Here are three routines to output a byte in decimal (the routine is easy to extend to 16-bit and larger numbers). This approach is faster than a divide-by-10 routine or a subtract-powers-of-10 routine. It is essentially a subtract 2^A * 10^B routine, but it uses only a short table (one entry for each power of 10). Enter with the number to output in the accumulator. Each character is passed in the accumulator to OUTPUT, which must preserve X and Y.

- OUTDEC8Z outputs leading zeros
- OUTDEC8S outputs spaces instead of leading zeros
- OUTDEC8 does not output leading zeros

```
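OUTDEC8Z
    LDX #2          ; X = digit position: 2 = hundreds, 1 = tens, 0 = ones
    LDY #$4C        ; seed for .B: 2 iterations, $30 already merged in
.1  STY .B
    LSR             ; LSR + ROL below leave A unchanged with carry clear
.2  ROL             ; on later passes carry is clear, so this acts as ASL
    BCS .3          ; a bit was shifted out of A, so the subtraction is required
    CMP .A,X
    BCC .4          ; A < table entry: digit bit is 0
.3  SBC .A,X
    SEC             ; digit bit is 1 (SBC clears carry when entered via BCS)
.4  ROL .B          ; rotate the digit bit into .B
    BCC .2          ; loop until the seed's marker bit falls out
    TAY             ; save the remainder
    LDA .B          ; ASCII digit
    JSR OUTPUT
    TYA             ; restore the remainder
    LDY #$13        ; seed for .B: 4 iterations, $30 already merged in
    DEX
    BPL .1
    RTS
.A  DB 128,160,200  ; 8*16, 80*2, 200*1
.B  DS 1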
```

```
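OUTDEC8
    LDX #1
    STX .C          ; .C = 1: positions below 1 (the ones digit) always print
    INX             ; X = 2: start with the hundreds digit
    LDY #$40        ; seed for .B: 2 iterations, no $30 merged in
.1  STY .B
    LSR             ; LSR + ROL below leave A unchanged with carry clear
.2  ROL             ; on later passes carry is clear, so this acts as ASL
    BCS .3          ; a bit was shifted out of A, so the subtraction is required
    CMP .A,X
    BCC .4          ; A < table entry: digit bit is 0
.3  SBC .A,X
    SEC             ; digit bit is 1 (SBC clears carry when entered via BCS)
.4  ROL .B          ; rotate the digit bit into .B
    BCC .2          ; loop until the seed's marker bit falls out
    TAY             ; save the remainder
    CPX .C          ; carry clear when X < .C (digit must be printed)
    LDA .B          ; digit value 0-9; sets Z, leaves carry alone
    BCC .5          ; X < .C: always print
    BEQ .6          ; leading zero: skip it
    STX .C          ; non-zero digit: remember its position
.5  EOR #$30        ; convert to ASCII
    JSR OUTPUT
.6  TYA             ; restore the remainder
    LDY #$10        ; seed for .B: 4 iterations, no $30 merged in
    DEX
    BPL .1
    RTS
.A  DB 128,160,200  ; 8*16, 80*2, 200*1
.B  DS 1
.C  DS 1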
```

```
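OUTDEC8S
    LDX #1
    STX .C          ; .C = 1: positions below 1 (the ones digit) always print
    INX             ; X = 2: start with the hundreds digit
    LDY #$40        ; seed for .B: 2 iterations, no $30 merged in
.1  STY .B
    LSR             ; LSR + ROL below leave A unchanged with carry clear
.2  ROL             ; on later passes carry is clear, so this acts as ASL
    BCS .3          ; a bit was shifted out of A, so the subtraction is required
    CMP .A,X
    BCC .4          ; A < table entry: digit bit is 0
.3  SBC .A,X
    SEC             ; digit bit is 1 (SBC clears carry when entered via BCS)
.4  ROL .B          ; rotate the digit bit into .B
    BCC .2          ; loop until the seed's marker bit falls out
    TAY             ; save the remainder
    CPX .C          ; carry clear when X < .C (digit must be printed)
    LDA .B          ; digit value 0-9; sets Z, leaves carry alone
    BCC .5          ; X < .C: always print
    BEQ .6          ; leading zero: print a space instead
    STX .C          ; non-zero digit: remember its position
.5  EOR #$10        ; digit: $10 here and $20 below combine to $30
.6  EOR #$20        ; leading zero: 0 becomes $20, a space
    JSR OUTPUT
    TYA             ; restore the remainder
    LDY #$10        ; seed for .B: 4 iterations, no $30 merged in
    DEX
    BPL .1
    RTS
.A  DB 128,160,200  ; 8*16, 80*2, 200*1
.B  DS 1
.C  DS 1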
```

## How it works

The basic algorithm (in C-ish pseudocode) is this:

```
digit = 0
if (a >= 200) { a = a - 200; digit = digit + 2 }
a = a*1; if (a >= 100) { a = a - 100; digit = digit + 1 }
output character (digit ^ $30)
digit = 0
if (a >= 80) { a = a - 80; digit = digit + 8 }
a = a*1; if (a >= 40) { a = a - 40; digit = digit + 4 }
a = a*1; if (a >= 20) { a = a - 20; digit = digit + 2 }
a = a*1; if (a >= 10) { a = a - 10; digit = digit + 1 }
output character (digit ^ $30)
digit = 0
if (a >= 8) { a = a - 8; digit = digit + 8 }
a = a*1; if (a >= 4) { a = a - 4; digit = digit + 4 }
a = a*1; if (a >= 2) { a = a - 2; digit = digit + 2 }
a = a*1; if (a >= 1) { a = a - 1; digit = digit + 1 }
output character (digit ^ $30)
```

Instead of a chain of IF statements with decreasing constants for comparison and subtraction, the accumulator is shifted left between steps. In other words, replace each a=a*1 statement with a=a*2 and work out what you would compare to and subtract in that case: within a digit the constant stays the same, because the accumulator doubles where the constant used to halve. Likewise, instead of adding 8, 4, 2, or 1 to digit, the bits are rotated left into digit (i.e. .B). .B serves two purposes: (1) it controls whether the loop runs 2 or 4 times, and (2) it collects the digit bits. The loop ends when the highest 1 bit of the value initially stored in .B is rotated out into the carry. In OUTDEC8Z, the Y register values ($4C and $13) were chosen so that the $30 is pre-XOR'ed in. When the routine gets to the tens digit, the accumulator has already been shifted left once, so 160 is used instead of 80 for comparison and subtraction. Likewise, when it gets to the ones digit, the accumulator has already been shifted left four times, so 128 is used instead of 8.
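The shifted version can be checked with a small behavioral model. The following Python sketch is not part of the original routines; the names `TABLE`, `SEEDS`, and `outdec8z` are made up for illustration. It mimics OUTDEC8Z with an 8-bit accumulator, the three-entry table, and the .B seed values, assuming the 6502 behaves as described in this section.

```python
TABLE = (128, 160, 200)               # .A: 8*16, 80*2, 200*1
SEEDS = {2: 0x4C, 1: 0x13, 0: 0x13}   # value stored in .B per digit position


def outdec8z(value):
    """Return the three characters OUTDEC8Z would emit (illustrative model)."""
    a = value & 0xFF
    chars = []
    for x in (2, 1, 0):                   # hundreds, tens, ones
        b = SEEDS[x]
        carry = 0                         # LSR + ROL: A unchanged, carry clear
        while True:
            if carry or a >= TABLE[x]:    # BCS .3 / CMP .A,X
                a = (a - TABLE[x]) & 0xFF
                bit = 1
            else:
                bit = 0
            done = b >> 7                 # bit rotated out of .B by ROL .B
            b = ((b << 1) | bit) & 0xFF
            if done:
                break
            carry = a >> 7                # ROL with carry clear acts as ASL
            a = (a << 1) & 0xFF
        chars.append(chr(b))              # .B now holds $30 | digit
    return "".join(chars)


print(outdec8z(128))
```

Running the model over all 256 byte values and comparing against ordinary decimal formatting is a quick way to confirm that the scaled constants 200, 160, and 128 are the right ones.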

Other sneakiness:

The LSR is used as a 1-byte substitute for a JMP to the CMP instruction; the ROL will undo the LSR and the BCS .3 won't be taken. In subsequent iterations of the loop, the ROL is the same as an ASL, since the branch to .2 is taken with the carry clear (BCC .2).
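The claim that the ROL undoes the LSR can be verified exhaustively. This is a minimal Python sketch with hypothetical helper names (`lsr`, `rol`, `lsr_then_rol`), modelling only the accumulator and the carry flag.

```python
def lsr(a):
    """Model of 6502 LSR A: returns (new A, carry out)."""
    return a >> 1, a & 1


def rol(a, carry):
    """Model of 6502 ROL A: returns (new A, carry out)."""
    return ((a << 1) | carry) & 0xFF, a >> 7


def lsr_then_rol(a):
    """Apply LSR followed by ROL, as on the first pass through .2."""
    shifted, carry = lsr(a)
    return rol(shifted, carry)


# Every byte comes back unchanged with the carry clear.
print(all(lsr_then_rol(a) == (a, 0) for a in range(256)))
```

Because the LSR always leaves bit 7 clear, the ROL can never shift a 1 into the carry, which is why the BCS .3 is not taken on the first pass.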

The purpose of BCS .3 is to make sure that the subtraction takes place if shifting left produced a result greater than 255. Example: OUTDEC8Z is called with A=$80. The first time at .2, the ROL undoes the LSR, leaving A=$80 with the carry clear, so the BCS is not taken; the CMP clears the carry ($80 is less than 200), the BCC .4 is taken, and A remains $80. The second time at .2, the ROL (acting as an ASL) makes A=$00 and sets the carry, because the true result is $100. Without the BCS .3, the CMP would clear the carry (since $00 is less than 200) and the subtraction would be skipped, even though $100 is greater than 200. The UM/MOD bug described in Division (32-bit) in the source code repository deals with the same phenomenon, i.e. shifting out a bit into the carry that must subsequently be accounted for (another implementation is shown in the FIG-Forth errata).
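The A=$80 example can be replayed in a few lines. This Python sketch uses hypothetical helpers (`asl`, `needs_subtract`) and only models the decision of whether to subtract; the `honor_carry` flag is an illustration device standing in for the presence or absence of the BCS .3.

```python
def asl(a):
    """Model of 6502 ASL A: returns (8-bit result, carry out)."""
    return (a << 1) & 0xFF, (a >> 7) & 1


def needs_subtract(a, carry, limit, honor_carry=True):
    """Decide whether `limit` must be subtracted after a shift.

    honor_carry=False models dropping the BCS .3 and trusting CMP alone.
    """
    if honor_carry and carry:
        return True
    return a >= limit


a, carry = asl(0x80)                                      # A=$00, carry set
print(a, carry)
print(needs_subtract(a, carry, 200))                      # with BCS .3
print(needs_subtract(a, carry, 200, honor_carry=False))   # without BCS .3
```

With the carry honored, the model subtracts as it should; with the carry ignored, the comparison sees only the truncated $00 and the hundreds digit of 128 would come out as 0.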

The SEC might seem unnecessary at first blush: if the BCC .4 isn't taken, the SBC will leave the carry set, since the carry was set by the CMP (it's the same subtraction in both instances). However, if the BCS .3 is taken, the SBC clears the carry. Again, consider the earlier example: the second time at .2, the BCS .3 is taken, and the SBC makes A=$38 and clears the carry. A one must still be shifted into .B, so the SEC sets the carry again.
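Both cases can be reproduced with a model of SBC in binary mode, which computes A - M - (1 - C) and leaves the carry set only when no borrow occurred. The function name `sbc` below is a hypothetical helper for this Python sketch, not part of the routines.

```python
def sbc(a, m, carry):
    """Model of 6502 SBC (binary mode): returns (8-bit result, carry out)."""
    total = a - m - (1 - carry)
    return total & 0xFF, 1 if total >= 0 else 0


# Reached by falling through CMP/BCC: A >= 200, carry already set by CMP.
print(sbc(220, 200, 1))   # carry stays set, so SEC changes nothing

# Reached via BCS .3 in the A=$80 example: A=$00 with carry set by the shift.
print(sbc(0x00, 200, 1))  # result $38, carry cleared, so SEC is required
```

In the second case the borrow is expected, since the accumulator holds only the low eight bits of $100; the SEC restores the 1 that belongs in the digit.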

In OUTDEC8 and OUTDEC8S, .C keeps track of when to output zeros: a zero digit is output only when the current digit position (X) is less than .C. .C is initialized to 1, so a zero in the ones position (X=0) is always output. When a non-zero digit (these are always output) is encountered, .C is set to the current digit position, so any subsequent zeros are output. Since it is necessary to test whether a digit is zero, the $30 is not pre-XOR'ed in. OUTDEC8 applies it with EOR #$30; in OUTDEC8S it comes from the EOR #$10 followed by the EOR #$20, while a suppressed zero passes through only the EOR #$20 and becomes a space ($20).
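The .C bookkeeping can be sketched separately from the digit extraction. In this Python model, `emit` and its `pad` parameter are hypothetical: `pad=None` stands for OUTDEC8 (leading zeros dropped) and `pad=" "` for OUTDEC8S (leading zeros replaced by spaces). The digits are assumed to have been computed already.

```python
def emit(digits, pad=None):
    """Model the .C logic for digits given as (hundreds, tens, ones)."""
    c = 1                                 # .C starts at 1
    out = []
    for x, d in zip((2, 1, 0), digits):
        if x < c:                         # CPX .C / BCC .5: always print
            out.append(chr(d ^ 0x30))
        elif d == 0:                      # BEQ .6: leading zero
            if pad is not None:
                out.append(pad)
        else:                             # non-zero digit: STX .C, then print
            c = x
            out.append(chr(d ^ 0x30))
    return "".join(out)


print(emit((1, 0, 5)))
print(repr(emit((0, 0, 7), pad=" ")))
```

Note that a value of zero still produces a single "0", because the ones position is always below the initial value of .C.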