MVN and MVP are very useful instructions, but they have a few limitations. First, they cannot span banks (i.e. they wrap at bank boundaries). Second, the source and destination banks are immediate data, meaning that if the instruction is in ROM, the source and destination banks must be known at assemble (or compile) time. Third, a single instruction can move at most 64k bytes of data. However, moving more than 64k bytes of data should generally be avoided if at all possible, because it is so time consuming.
The routines below address these limitations.
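To make the first limitation concrete, here is a minimal Python model of a single MVN. It is only a sketch: the function name `mvn`, the sparse dictionary used as memory, and the sample byte values are all assumptions made for illustration. It shows that the 16-bit X and Y registers wrap from $FFFF to $0000 while the banks stay fixed.

```python
def mvn(mem, src_bank, dst_bank, x, y, a):
    """Model one MVN: copy a+1 bytes forward; X and Y wrap inside their banks."""
    for _ in range(a + 1):
        mem[(dst_bank << 16) | y] = mem.get((src_bank << 16) | x, 0)
        x = (x + 1) & 0xFFFF   # 16-bit index: $FFFF wraps to $0000, bank unchanged
        y = (y + 1) & 0xFFFF
    return x, y, 0xFFFF        # A holds $FFFF when the instruction finishes

# Hypothetical memory contents: sparse dict keyed by 24-bit address
mem = {0x01FFFF: 0xAA, 0x010000: 0xBB, 0x020000: 0xCC}
x, y, a = mvn(mem, 0x01, 0x03, 0xFFFF, 0x1000, 1)   # move 2 bytes from $01:FFFF
print(hex(mem[0x031000]), hex(mem[0x031001]))        # second byte came from $01:0000
```

The second byte is fetched from $01:0000 rather than $02:0000, which is the wrap that the routines in this article work around.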
Without self-modifying code
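FROM and TO are 24-bit (3-byte) pointers and must be on the direct page, because they are accessed with [FROM],Y and [TO],Y. SIZEL (the low 16 bits of SIZE) and SIZEH (the high byte of SIZE) do not need to be on the direct page, nor do they need to be consecutive memory locations. Note that the routines may change the bank bytes of the pointers (FROM+2 and TO+2).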
When SIZE is zero, no bytes are moved.
- Input:
- FROM = source start address
- TO = destination start address
- SIZE = number of bytes to move
MOVEDOWN PHP
SEP #$10
LDY #$00
LDX SIZEH
REP #$30
BEQ MD2
MD1 LDA [FROM],Y
STA [TO],Y
INY
INY
BNE MD1
SEP #$20
INC FROM+2
INC TO+2
REP #$20
DEX
BNE MD1
MD2 LDA SIZEL
LSR
BEQ MD4
TAX
MD3 LDA [FROM],Y
STA [TO],Y
INY
INY
DEX
BNE MD3
MD4 BCC MD5
SEP #$20
LDA [FROM],Y
STA [TO],Y
MD5 PLP
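The listing above copies in three phases: SIZEH full 64k blocks, then SIZEL/2 words, then one odd byte if the LSR left the carry set. The following Python sketch mirrors those phases on a flat byte array. It is a model for checking the arithmetic, not a translation of the routine; the function name and the flat `bytearray` memory are assumptions.

```python
def movedown(mem, src, dst, size):
    """Forward copy in the same three phases as the MOVEDOWN listing (a model only)."""
    size_h, size_l = size >> 16, size & 0xFFFF
    for _ in range(size_h):                  # MD1: one 64k block per SIZEH count
        for y in range(0, 0x10000, 2):       # Y advances by 2, one word per pass
            lo, hi = mem[src + y], mem[src + y + 1]
            mem[dst + y], mem[dst + y + 1] = lo, hi
        src += 0x10000                       # INC FROM+2
        dst += 0x10000                       # INC TO+2
    y = 0
    for _ in range(size_l >> 1):             # MD3: SIZEL/2 words
        lo, hi = mem[src + y], mem[src + y + 1]
        mem[dst + y], mem[dst + y + 1] = lo, hi
        y += 2
    if size_l & 1:                           # carry from LSR: one odd byte remains
        mem[dst + y] = mem[src + y]

demo = bytearray(range(32))
movedown(demo, 0, 16, 5)                     # two words plus one odd byte
print(list(demo[16:21]))
```

The three phases always add up to SIZE bytes: SIZEH * $10000 + 2 * (SIZEL / 2) + (SIZEL AND 1).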
- Input:
- FROM = source end address
- TO = destination end address
- SIZE = number of bytes to move
MOVEUP PHP
SEP #$10
LDY #$00
LDX SIZEH
REP #$30
BEQ MU3
MU1 SEP #$20
DEC FROM+2
DEC TO+2
REP #$20
MU2 DEY
LDA [FROM],Y
STA [TO],Y
DEY
BNE MU2
DEX
BNE MU1
MU3 LDA SIZEL
LSR
BEQ MU5
TAX
SEP #$20
DEC FROM+2
DEC TO+2
REP #$20
MU4 DEY
LDA [FROM],Y
STA [TO],Y
DEY
DEX
BNE MU4
MU5 BCC MU6
SEP #$20
LDA [FROM],Y
STA [TO],Y
MU6 PLP
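MOVEUP takes end addresses, and reading the listing suggests these are the addresses of the last byte of each block (inclusive), since the final odd byte is read at [FROM] itself. The sketch below, with hypothetical helper names, shows the conversion from a start address and a byte count, and a backward copy using that convention.

```python
def end_address(start, count):
    """Address of the last byte of a block of `count` bytes beginning at `start`."""
    return start + count - 1

def moveup(mem, src_end, dst_end, count):
    """Backward copy: the last byte is copied first (a model, not the routine)."""
    for i in range(count):
        mem[dst_end - i] = mem[src_end - i]

mem = bytearray(range(16))
moveup(mem, end_address(2, 4), end_address(10, 4), 4)   # copy bytes 2..5 to 10..13
print(list(mem[10:14]))
```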
With self-modifying code
These routines are faster than the non-self-modifying routines (MVN and MVP take only 7 cycles per byte, which is faster per byte than the loops using [zp],Y addressing), but they use self-modifying code. However, only one instruction is modified (MVN or MVP), so only that instruction needs to be in RAM, and it can be called via a JSR or JSL without significantly affecting performance. Note that for moves of less than 64k bytes, MVN (or MVP) is executed 3 times at most, so executing a JSL-RTL pair 3 times costs only 42 additional cycles (total, not per byte). For moves of more than 64k bytes, the MVN instruction will be far more time consuming than the JSR-RTS (or JSL-RTL) pair.
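The 42-cycle figure can be reproduced with a little arithmetic. This sketch assumes the usual 65816 timings of 8 cycles for JSL, 6 for RTL, 6 for JSR absolute and 6 for RTS, together with the 7 cycles per byte quoted above; the function names are made up for illustration.

```python
JSL_CYCLES, RTL_CYCLES = 8, 6     # long call and return
JSR_CYCLES, RTS_CYCLES = 6, 6     # absolute call and return
MOVE_CYCLES_PER_BYTE = 7          # MVN or MVP

def call_overhead(calls, long_call=True):
    """Total cycles spent on call/return pairs around the MVN or MVP."""
    pair = JSL_CYCLES + RTL_CYCLES if long_call else JSR_CYCLES + RTS_CYCLES
    return calls * pair

def overhead_fraction(byte_count, calls=3, long_call=True):
    """Call overhead relative to the cycles spent moving the data itself."""
    return call_overhead(calls, long_call) / (byte_count * MOVE_CYCLES_PER_BYTE)

print(call_overhead(3))           # three JSL-RTL pairs
print(overhead_fraction(1024))    # share of a 1k move
```

For a 1k move, the three call/return pairs amount to well under one percent of the time spent in the block move itself.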
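FROML and TOL hold the 16-bit addresses within the bank, FROMH and TOH hold the bank bytes, and SIZEL and SIZEH hold the low 16 bits and the high byte of SIZE. None of these need to be on the direct page, nor do they need to be consecutive memory locations.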
Note that the SIZE parameter differs slightly from that of the non-self-modifying routines: here it is the number of bytes to move minus one.
If SIZEL and SIZEH use direct page or long addressing rather than absolute addressing, then the PHB can be moved to the beginning of the routine and the PLB can be moved to the end of the routine. However, note that the performance improvement won't be all that significant, for the same reasons that calling MVN (or MVP) via JSR (or JSL) is not a significant performance penalty.
One byte (and one total cycle if SIZEH is zero) can be saved by replacing the SEP #$21 with SEP #$60 and replacing the first EOR #$80 SBC #$01 with:
;
BEQ SKIP
CLV
SKIP
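Both forms are meant to leave V set exactly when SIZEH is zero. The Python sketch below models an 8-bit binary-mode SBC (decimal mode is assumed to be off) and checks the EOR #$80 / SBC #$01 form against the BEQ / CLV form for every possible SIZEH. The helper names are hypothetical.

```python
def sbc8(a, operand, carry):
    """8-bit binary SBC: returns (result, carry_out, overflow)."""
    value = operand ^ 0xFF                   # SBC adds the one's complement
    total = a + value + carry
    result = total & 0xFF
    overflow = ((a ^ result) & (value ^ result) & 0x80) != 0
    return result, total > 0xFF, overflow

def v_from_eor_sbc(size_h):
    """V after LDA SIZEH / EOR #$80 / SBC #$01 with carry set by SEP #$21."""
    _, _, v = sbc8(size_h ^ 0x80, 0x01, 1)
    return v

def v_from_beq_clv(size_h):
    """V after SEP #$60 / LDA SIZEH / BEQ SKIP / CLV."""
    v = True                                 # SEP #$60 sets V
    if size_h != 0:                          # BEQ not taken, so CLV runs
        v = False
    return v

print(all(v_from_eor_sbc(h) == v_from_beq_clv(h) for h in range(256)))
```

The subtraction overflows only for $80 - 1, and SIZEH EOR #$80 equals $80 only when SIZEH is zero.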
MOVEDOWN works by setting A (the accumulator) so that MVN terminates at the bank boundary. After $10000-X bytes have been moved, X will be at the bank boundary, and after $10000-Y bytes have been moved, Y will be at the bank boundary. Since A is one less than the number of bytes to move, A must be the minimum of $FFFF-X, $FFFF-Y and SIZE (SIZE is the minimum when neither X nor Y will reach a bank boundary because there aren't that many bytes left to move). SIZE is then updated to reflect the number of bytes left to move. Note that when a bank boundary is reached, X or Y or both will be zero; at that point, X and Y are each tested for zero to determine whether to increment the source bank, the destination bank, or both. Also note that when X is zero, $FFFF-X is $FFFF, so $FFFF-Y must be the minimum of $FFFF-X and $FFFF-Y, and when Y is zero, $FFFF-Y is $FFFF, so $FFFF-X must be the minimum of $FFFF-X and $FFFF-Y. (If both are zero, it doesn't matter which you choose as the minimum, since $FFFF-Y = $FFFF-X.)
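The chunking rule described above can be sketched in Python as follows. This is a model of the length calculation only, using flat 24-bit addresses; the function name and the example addresses are assumptions, and `size` follows the routine's convention of bytes to move minus one.

```python
def movedown_chunks(src, dst, size):
    """List (src, dst, length) for each MVN; size = number of bytes to move - 1."""
    chunks = []
    while size >= 0:
        x, y = src & 0xFFFF, dst & 0xFFFF
        a = min(0xFFFF - x, 0xFFFF - y, size)   # value loaded into A for this MVN
        chunks.append((src, dst, a + 1))        # MVN moves A+1 bytes
        src += a + 1                            # carries into the bank when X wraps
        dst += a + 1
        size -= a + 1                           # goes to -1 after the last chunk
    return chunks

for s, d, n in movedown_chunks(0x01FFF0, 0x02FFF8, 48 - 1):
    print(f"${s:06X} -> ${d:06X}: {n} bytes")
```

In this example a 48-byte move is split into three MVN instructions of 8, 8 and 32 bytes: the first stops when Y reaches the destination bank boundary, the second when X reaches the source bank boundary, and the third moves the remainder.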
- Input:
- FROM = source start address
- TO = destination start address
- SIZE = number of bytes to move - 1
MOVEDOWN PHP
SEP #$21 ; 8-bit accumulator, set carry
LDA FROMH
STA MD7+2
LDA TOH
STA MD7+1
LDA SIZEH
EOR #$80 ; set V if SIZEH is zero, clear V otherwise
SBC #$01
REP #$30
LDX FROML
LDY TOL
TYA
CMP FROML
BCC MD3 ; if Y < X then $FFFF-X < $FFFF-Y
BRA MD4
MD1 STA SIZEH
;
; Note: 8-bit accumulator, 16-bit index registers here
; Make sure the assembler assembles the correct width
;
EOR #$80 ; set V if SIZEH is zero, clear V otherwise
SBC #$01
CPX #$0000
BNE MD2 ; if X is not zero, then Y must be
INC MD7+2
REP #$20
TYA
BNE MD4
SEP #$20
MD2 INC MD7+1
REP #$20
MD3 TXA
MD4 EOR #$FFFF ; A XOR $FFFF = $FFFF - A
;
; If SIZEH is nonzero, SIZE can't be the minimum because
; SIZE is greater than $FFFF
;
BVC MD5 ; branch if SIZEH is nonzero
CMP SIZEL
BCC MD6
LDA SIZEL
MD5 CLC
MD6 PHA
PHB
MD7 MVN 0,0 ; this instruction is self-modified
PLB
PLA
EOR #$FFFF ; A XOR $FFFF = $FFFF - A = -1 - A
ADC SIZEL
STA SIZEL ; SIZEL = SIZEL - 1 - A
SEP #$20
LDA SIZEH ; update high byte of SIZE
SBC #$00
BCS MD1
PLP
MOVEUP works by setting A (the accumulator) so that MVP terminates at the bank boundary. After X+1 bytes have been moved, X will be at the bank boundary, and after Y+1 bytes have been moved, Y will be at the bank boundary. Since A is one less than the number of bytes to move, A must be the minimum of X, Y and SIZE (SIZE is the minimum when neither X nor Y will reach a bank boundary because there aren't that many bytes left to move). SIZE is then updated to reflect the number of bytes left to move. Note that when a bank boundary is reached, X or Y or both will be $FFFF; at that point, X and Y are each tested for $FFFF to determine whether to decrement the source bank, the destination bank, or both. Also note that when X is $FFFF, Y must be the minimum of X and Y, and when Y is $FFFF, X must be the minimum of X and Y. (If both are $FFFF, it doesn't matter which you choose as the minimum.)
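The same rule, mirrored for MVP, can be sketched in Python. As before, this models only the length calculation on flat 24-bit addresses; the function name and the example addresses are assumptions, and `size` is the number of bytes to move minus one.

```python
def moveup_chunks(src_end, dst_end, size):
    """List (src_end, dst_end, length) for each MVP; size = bytes to move - 1."""
    chunks = []
    while size >= 0:
        x, y = src_end & 0xFFFF, dst_end & 0xFFFF
        a = min(x, y, size)                     # value loaded into A for this MVP
        chunks.append((src_end, dst_end, a + 1))
        src_end -= a + 1                        # borrows from the bank when X wraps
        dst_end -= a + 1
        size -= a + 1                           # goes to -1 after the last chunk
    return chunks

for s, d, n in moveup_chunks(0x020007, 0x03000F, 48 - 1):
    print(f"${s:06X} -> ${d:06X}: {n} bytes")
```

Here the 48-byte move again takes three instructions of 8, 8 and 32 bytes: X wraps to $FFFF after the first chunk and Y wraps to $FFFF after the second.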
- Input:
- FROM = source end address
- TO = destination end address
- SIZE = number of bytes to move - 1
MOVEUP PHP
SEP #$21 ; 8-bit accumulator, set carry
LDA FROMH
STA MU7+2
LDA TOH
STA MU7+1
LDA SIZEH
EOR #$80 ; set V if SIZEH is zero, clear V otherwise
SBC #$01
REP #$30
LDX FROML
LDY TOL
TYA
CMP FROML
BCS MU3 ; if Y >= X then X is the minimum of X and Y
BRA MU4
MU1 STA SIZEH
;
; Note: 8-bit accumulator, 16-bit index registers here
; Make sure the assembler assembles the correct width
;
EOR #$80 ; set V if SIZEH is zero, clear V otherwise
SBC #$01
CPX #$FFFF
BNE MU2 ; if X is not $FFFF, then Y must be
DEC MU7+2
REP #$20
TYA
CPY #$FFFF
BNE MU4
SEP #$20
MU2 DEC MU7+1
REP #$20
MU3 TXA
;
; If SIZEH is nonzero, SIZE can't be the minimum because
; SIZE is greater than $FFFF
;
MU4 BVC MU5 ; branch if SIZEH is nonzero
CMP SIZEL
BCC MU6
LDA SIZEL
MU5 CLC
MU6 PHA
PHB
MU7 MVP 0,0 ; this instruction is self-modified
PLB
PLA
EOR #$FFFF ; A XOR $FFFF = $FFFF - A = -1 - A
ADC SIZEL
STA SIZEL ; SIZEL = SIZEL - 1 - A
SEP #$20
LDA SIZEH ; update high byte of SIZE
SBC #$00
BCS MU1
PLP