The following 6502 code has several entry points:
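OUTBYTE   PHA             ; save the byte
          LSR             ; move the high nibble into the low nibble
          LSR
          LSR
          LSR
          JSR OUTHEXDIG   ; output the high nibble
          PLA             ; restore the byte
OUTNIBBLE AND #$0F        ; keep only the low nibble
OUTHEXDIG CMP #$0A        ; C = 1 for the digits A-F
          BCC OUTDECDIG
          ADC #$66        ; adds $67, since C = 1 here
OUTDECDIG EOR #$30        ; convert to ASCII
          JMP OUTCHAR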
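As a worked example, the trace below follows a single call through the routine above. It is only a sketch: it assumes that decimal mode is off (D = 0) and that OUTCHAR prints the ASCII character in the accumulator and returns with RTS. The value $3C is an arbitrary choice.

```asm
          LDA #$3C
          JSR OUTBYTE     ; prints "3C"
;
; High nibble: the four LSRs leave $03 in the accumulator
;   CMP #$0A              ; $03 < $0A, so C = 0 and the BCC is taken
;   EOR #$30              ; $03 EOR $30 = $33 = "3"
;
; Low nibble: PLA restores $3C and AND #$0F leaves $0C
;   CMP #$0A              ; $0C >= $0A, so C = 1 and the BCC is not taken
;   ADC #$66              ; $0C + $66 + C = $73
;   EOR #$30              ; $73 EOR $30 = $43 = "C"
```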
On the 65816, the OUTCHAR routine might be written so that it can be called with either value of the m flag, e.g.
OUTCHAR   PHP             ; save the m flag
          SEP #$20        ; 8-bit accumulator
;
; output the character
;
          PLP             ; restore the m flag
          RTL
It might seem like it is necessary to do this for all of the other entry points of the first routine, e.g.
OUTDECDIG PHP
          SEP #$20
          EOR #$30
          JSL OUTCHAR
          PLP
          RTL
and so on. However, this is not necessary, and the following code can be used instead:
OUTWORD   PHA
          XBA             ; swap the high and low bytes
          JSL OUTBYTE     ; output the high byte
          PLA
OUTBYTE   PHA
          LSR
          LSR
          LSR
          LSR
          JSL OUTNIBBLE   ; output the high nibble
          PLA
OUTNIBBLE AND CONST000F
OUTHEXDIG CMP CONST000A
          BCC OUTDECDIG
          ADC CONST0066
OUTDECDIG EOR CONST0030
          JMP OUTCHAR     ; OUTCHAR returns with RTL
CONST000A DW $000A
CONST000F DW $000F
CONST0030 DW $0030
CONST0066 DW $0066
When the m flag is 1 (8-bit accumulator), the AND CONST000F uses only the first byte of the DW assembler directive, which is $0F. Likewise for the CMP, ADC and EOR. Note that long (24-bit) addressing mode is available for all four instructions, so they need not depend on the value of the data bank register.
When the m flag is 0 (16-bit accumulator), the AND CONST000F uses both bytes of the DW assembler directive, i.e. the 16-bit value $000F. Likewise for the CMP, ADC and EOR instructions.
Thus, the result is code that is independent of the value of the m flag. This technique can also be used with the x flag.
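Applied to the x flag, the same idea might look like the sketch below, which outputs eight spaces. The names OUTSPACES, SPLOOP and CONST0008 are made up for this illustration. The sketch assumes an OUTSPACE routine that outputs one space, returns with RTL, works with either value of the m flag, and preserves the X register.

```asm
OUTSPACES LDX CONST0008   ; x = 1: X = $08, x = 0: X = $0008
SPLOOP    JSL OUTSPACE    ; assumed to preserve X
          DEX
          BNE SPLOOP      ; X reaches zero after 8 passes at either width
          RTL
CONST0008 DW $0008
```

Because the count only runs from 8 down to 0, DEX and BNE behave the same whether the index registers are 8 or 16 bits wide.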
Caveats
Bear in mind that there are some things that won't quite behave the same for both values of the m (or x) flag. For example:
          CLC
          LDA CONST7F
          ADC CONST01
          BVS LABEL
branches when m is 1 (8-bit accumulator), since $7F + $01 = $80 sets the V flag, but not when m is 0 (16-bit accumulator), since $007F + $0001 = $0080 does not overflow.
          CLC
          LDA CONSTFF
          ADC CONST01
          BEQ LABEL
branches when m is 1 (8-bit accumulator), since $FF + $01 wraps to $00, but not when m is 0 (16-bit accumulator), since $00FF + $0001 = $0100 is nonzero.
          LDA CONST80
          ASL
          BEQ LABEL
branches when m is 1 (8-bit accumulator), since bit 7 is shifted out and leaves $00, but not when m is 0 (16-bit accumulator), since $0080 becomes $0100.
          LDA CONSTFF
          BMI LABEL
branches when m is 1 (8-bit accumulator), since the N flag reflects bit 7 of $FF, but not when m is 0 (16-bit accumulator), since the N flag then reflects bit 15 of $00FF. However, bit 7 can be tested by using BIT rather than looking at the N flag. For example, the previous example can be written as:
          LDA CONSTFF
          BIT CONST80
          BNE LABEL
which branches with either value of the m flag.
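The same approach can cover the low-byte zero test from the second example. The sketch below assumes that CONSTFF is defined as DW $00FF and CONST01 as DW $0001. BIT sets the Z flag from the AND of the accumulator and the operand, so masking with $00FF examines only bits 0 to 7 at either width.

```asm
          CLC
          LDA CONSTFF
          ADC CONST01     ; m = 1: A = $00, m = 0: A = $0100
          BIT CONSTFF     ; Z = 1 when (A AND $FF) is zero; m = 0 uses $00FF
          BEQ LABEL       ; taken with either value of the m flag
```

BIT also changes the N and V flags, so this only helps when those flags are not needed afterwards.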
A good rule of thumb is that if code involves a flag, it is worth double-checking that it will work for both values of the m flag. Beyond a certain point, it is simpler to just use a PHP followed by a SEP #$20 (or REP #$20) to force the m flag to a known value, then use a PLP when finished.
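For instance, the overflow test from the first caveat could be forced to 8 bits as sketched below. The labels OVERFLOW and NOOVERFLOW are hypothetical, and the sketch assumes native mode with decimal mode off. Note that PLP restores every flag, including V, so the branch has to come before the PLP.

```asm
          PHP             ; save the caller's flags, including m
          SEP #$20        ; force m = 1 (8-bit accumulator)
          CLC
          LDA CONST7F
          ADC CONST01     ; always 8-bit here: $7F + $01 = $80, V = 1
          BVS OVERFLOW    ; test V before PLP overwrites it
          PLP             ; restore the caller's m flag
          JMP NOOVERFLOW
OVERFLOW  PLP             ; restore the caller's m flag on this path too
;
; overflow handling continues here
;
```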
Optimizations
If OUTCHAR ignores the upper byte, and it's likely it will, then the upper byte of EOR CONST0030 does not need to be zero. So the EOR CONST0030 can be replaced with:
          EOR #$30        ; assemble an 8-bit immediate value
          NOP             ; upper byte for 16-bit immediate data
If this code is executed when the m flag is 0 (16-bit accumulator), that two-instruction sequence becomes the single instruction EOR #$EA30. This is smaller (since the DW $0030 is no longer needed) and faster. When m is 1 (8-bit accumulator), the immediate addressing takes 2 cycles and the NOP takes 2 cycles, for a total of 4 cycles, as compared to the EOR CONST0030, which takes 4 cycles for absolute addressing or 5 cycles for long addressing. When m is 0 (16-bit accumulator), the EOR #$EA30 takes 3 cycles, as compared to the EOR CONST0030, which takes 5 cycles for absolute addressing or 6 cycles for long addressing.
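To see why this works, it can help to look at the three bytes that are assembled. The opcode values below are the standard ones ($49 for EOR immediate, $EA for NOP); the layout itself is only an illustration.

```asm
; Bytes in memory:  $49 $30 $EA
;
; m = 1 (8-bit accumulator):
;   $49 $30       EOR #$30
;   $EA           NOP
;
; m = 0 (16-bit accumulator):
;   $49 $30 $EA   EOR #$EA30
```

The assembler has to be told to assemble the immediate operand as 8 bits; how that is done depends on the assembler.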
Likewise the ADC CONST0066 can be replaced with:
          ADC #$66        ; assemble an 8-bit immediate value
          NOP             ; upper byte for 16-bit immediate data
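Note, however, that the high byte of the CMP CONST000A operand must be zero, since a 16-bit CMP compares both bytes. Consequently, the high byte of the AND CONST000F operand must be zero as well, so that the high byte of the accumulator is cleared before the comparison.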
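On the other hand, if an ORA #$00 followed by a NOP is inserted before the CMP, at the label OUTHEXDIG itself (so that callers entering at OUTHEXDIG also perform the ORA), then AND-NOP and CMP-NOP sequences can be used. When the m flag is 0, the ORA becomes ORA #$EA00, which, given a high byte that is zero or has just been masked by AND #$EA0F, sets the high byte of the accumulator to $EA to match the CMP #$EA0A operand. This is smaller (because it eliminates two DWs). It is slower when the m flag is 1 (8-bit accumulator), but faster when the m flag is 0 (16-bit accumulator).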
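The variant described above might be assembled as in the sketch below. It assumes that the assembler is emitting 8-bit immediates, that OUTCHAR ignores the high byte, and that callers entering at OUTHEXDIG with a 16-bit accumulator pass a high byte of zero.

```asm
OUTNIBBLE AND #$0F        ; m = 0: AND #$EA0F
          NOP
OUTHEXDIG ORA #$00        ; m = 0: ORA #$EA00, high byte becomes $EA
          NOP
          CMP #$0A        ; m = 0: CMP #$EA0A
          NOP             ; NOP leaves the carry from CMP intact
          BCC OUTDECDIG
          ADC #$66        ; m = 0: ADC #$EA66
          NOP
OUTDECDIG EOR #$30        ; m = 0: EOR #$EA30
          NOP
          JMP OUTCHAR
```

With a 16-bit accumulator holding $EA0B at the CMP, for example, the carry is set, the ADC gives $D472, and the EOR gives $3E42, whose low byte $42 is the character "B".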
Generally speaking, it is likely that BIT, CMP, CPX, and CPY immediate will need the high byte to be zero (or at least to be a known value), but other instructions are candidates for the NOP optimization.
Another example where NOP can be used (again, assuming OUTCHAR ignores the high byte):
OUTSPACE  LDA #$20        ; assemble an 8-bit immediate value
          NOP             ; upper byte for 16-bit immediate data
          JMP OUTCHAR