Introduction and 65C02 code
Token threading is a technique of representing code (or code sequences) in a compact form, at the expense of some speed. Bytes are (usually) read sequentially, with each byte representing some sort of action (e.g. multiply) and in some cases followed by data. The tokenized code is often position independent so that it can be loaded at, and executed from, any address. Token threading (or something very similar to it) has been used to implement interpreters for languages such as BASIC.
The 6502 cannot, of course, execute the tokens directly, so it is necessary to write a routine which interprets the token. On a 65C02, the JMP (abs,X) instruction is helpful for this.
INTERPRET
.1 JSR .2
JMP .1
.2 JSR GET
ASL
TAX
BCS .3 ; if token >= 128, use the second half of the table
JMP (JUMP_TABLE,X)
.3 JMP (JUMP_TABLE+256,X)
JUMP_TABLE
DW TOKEN_00_HANDLER
DW TOKEN_01_HANDLER
;etc.
DW TOKEN_FE_HANDLER
DW TOKEN_FF_HANDLER
You do not need to implement all 256 tokens, of course.
One thing to note is the fact that the JSR .2 is the first instruction. (In many interpreters the JSR is right before the JMP (abs,X) rather than at the beginning.) This allows a token handler to return with RTS. (It's often useful for one token handler to JSR to another; e.g. a "calculate offset of 2-D array index" token handler might wish to call a multiply token handler.) It also allows token handlers to "return" with a JMP to INTERPRET.2 (note INTERPRET.2 not INTERPRET) in those cases where a little extra speed is desired.
GET simply gets the next token, advancing the Interpretation Pointer (IP) as it goes. The (zp) addressing mode available on the 65C02 is helpful for this.
GET
INC IP
BNE .1
INC IP+1
.1 LDA (IP)
RTS
Notice that IP is incremented before the LDA. This is because it is often useful to return with the N and Z flags reflecting the value of the byte just read. If the LDA were before the increment, the INC instruction(s) would overwrite the N and Z flags. (However, an ORA #0 instruction could be placed before the RTS to update the N and Z flags.)
Also worth noting is that you are not limited to 256 tokens. One way to do this is for one of the token handlers (e.g. token $00) to simply get the next byte and treat that as token, giving you 256 more tokens. The token $00 handler would look like INTERPRET, but with it's own, separate, jump table. Token $00 can be thought of as a prefix byte, or an escape byte, in this instance.
Another way is to get the next two bytes and simply jump to that address, like so:
TOKEN_00_HANDLER
JSR GET ;get hi byte
PHA
JSR GET ;get lo byte
PHA
RTS ;"jump" to address+1 since we're using RTS rather than JMP
Of course, the most commonly used tokens should be single byte tokens, as these will be faster and take less space.
If 128 or fewer tokens are needed, INTERPRET can be simplified by using only even numbered tokens (which eliminates the ASL and the BCS instructions).
INTERPRET
.1 JSR .2
JMP .1
.2 JSR GET
TAX
JMP (TABLE,X)
It's also possible to gain a little extra speed by inlining GET in INTERPRET rather than calling it via JSR.
NMOS 6502 code
On the NMOS 6502, GET can use (zp),Y addressing, like so:
GET
LDY #0
INC IP
BNE .1
INC IP+1
.1 LDA (IP),Y
RTS
Without JMP (abs,X) available, a common way to make a jump table is to use a pair of PHA instructions with an RTS. In this case, the address in the table must be one less than the address to jump to, since that is how RTS works. It's also worthwhile to split the table into two pieces so that the ASL and BCS instructions are not necessary. Also, the Y register is used here, since it was used in GET and that way the X register is unused.
(Assembly syntax note: > is used as the "hi byte of" operator)
INTERPRET
.1 JSR .2
JMP .1
.2 JSR GET
TAY
LDA JUMP_TABLE_HI,Y
PHA
LDA JUMP_TABLE_LO,Y
PHA
RTS
TABLE_HI
DB >TOKEN_00_HANDLER-1
DB >TOKEN_01_HANDLER-1
;etc.
DB >TOKEN_FE_HANDLER-1
DB >TOKEN_FF_HANDLER-1
TABLE_LO
DB TOKEN_00_HANDLER-1
DB TOKEN_01_HANDLER-1
;etc.
DB TOKEN_FE_HANDLER-1
DB TOKEN_FF_HANDLER-1
Self-modifying code
The 6502 implementation can be sped up by using self-modifying code. In this case, the jump table must be aligned on a page boundary. Note the similarity to the 65C02 code.
INTERPRET
.1 JSR .2
JMP .1
.2 JSR GET
ASL
BCS .4
STA .3+1
.3 JMP (JUMP_TABLE)
.4 STA .5+1
.5 JMP (JUMP_TABLE+256)
If 128 or fewer tokens are needed, then the same sort of optimization that was made for the 65C02 code can be made here also, again by using only even numbered tokens. However, the jump table must still be aligned on a page boundary.
INTERPRET
.1 JSR .2
JMP .1
.2 JSR GET
STA .3+1
.3 JMP (JUMP_TABLE)
To avoid using the Y register, GET can also use self-modifying code. Since both of the self-modifying versions of INTERPRET use only the accumulator, this has the benefit of using neither the X nor the Y register.
GET
INC IP+1
BNE .1
INC IP+2
.1
IP = *+1
LDA $FFFF
RTS