Software 65816 Block Fill
;
;  This routine fills a block with a value 
; 
;  Uses 65816 native mode 
;
;  Start with fill value in Accumulator (8 bit Acc and Index on entry)
;  fills up to 65,536 locations
;
;  This example fills $00FA00 - $00FDE7 (1000 bytes) with the value passed in A
;
;

          CLC               ; Clear carry flag to set Native Mode
          XCE               ; swap Carry and Emulation bits
          JSR  FILL         ; call fill routine
          ...

FILL      PHB               ; save DBR from calling program (MVN overwrites DBR)
          LDX  #$00         ; DBR changed to $00
          PHX               ; put on stack
          PLB               ; move to DBR
          STA  $FA00        ; set first byte of source to our fill value
          REP  #$30         ; set 16 bit Acc and Index for MVN instruction
          LDA  #$03E6       ; Acc = # bytes to move minus 1, this example moves 999 bytes
          LDX  #$FA00       ; source pointer
          LDY  #$FA01       ; destination pointer
          MVN  $00,$00      ; Move source to dest 
          SEP  #$30         ; restore 8 bit Acc and Index
          PLB               ; restore DBR 
          RTS               ; return

This is a simple block fill. You can modify the code above to allow more flexibility, such as using different source and destination banks (the upper address byte) or passing all required addresses via the stack. Because the banks are encoded as operand bytes of the MVN instruction itself, supporting arbitrary banks requires the routine to reside in RAM so that self-modifying code can change the MVN operands.

I'll provide an example soon.
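
In the meantime, one way to make the bank a parameter might look like the sketch below. It is not the original routine: the name FILLB, the MOVE label, and the convention of passing the bank in X are assumptions made for illustration. It must execute from RAM, be called from its own bank with 8 bit Acc and Index, and the assembler is assumed to emit 16-bit immediates after the REP. Since both MVN bank operands receive the same value, the order in which a particular assembler emits them does not matter here.

; Sketch only: FILLB, MOVE and the "bank in X" convention are hypothetical
STARTADR =   $FA00
ENDADR   =   $FDE7

FILLB     PHB               ; save DBR from calling program (MVN overwrites DBR)
          PHK               ; copy the program bank into DBR so that the
          PLB               ;   absolute stores below reach this code
          STX  MOVE+1       ; patch first MVN bank operand
          STX  MOVE+2       ; patch second MVN bank operand
          PHX               ; now point DBR at the target bank
          PLB
          STA  STARTADR     ; set first byte of the block to our fill value
          REP  #$30         ; set 16 bit Acc and Index for MVN instruction
          LDA  #ENDADR-STARTADR-1 ; # bytes to move minus 1
          LDX  #STARTADR    ; source pointer
          LDY  #STARTADR+1  ; destination pointer
MOVE      MVN  $00,$00      ; both bank bytes were patched above
          SEP  #$30         ; restore 8 bit Acc and Index
          PLB               ; restore DBR
          RTS               ; return

A caller could then fill the same address range in another bank, for example:

          LDA  #$00         ; fill value
          LDX  #$7E         ; target bank (arbitrary example value)
          JSR  FILLB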

Formulas and macros

It's convenient to let the assembler calculate the correct register values. LDA through MVN of the example above could then be written as:

STARTADR =   $FA00
ENDADR   =   $FDE7
BANK     =   $00

         LDX #STARTADR
         LDY #STARTADR+1
         LDA #ENDADR-STARTADR-1
         MVN BANK,BANK

Alternatively, ENDADR can be defined as STARTADR+999. Wrapped in a PHB before the MVN and a PLB after it, this sequence makes a great macro. The LDY can also be replaced with TXY and INY, which is one byte shorter but one cycle slower.
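
As a check on the arithmetic, here is the same sequence with ENDADR defined relative to STARTADR and the TXY/INY substitution applied. The values are those of the example at the top of the page; the byte and cycle counts in the comments assume 16-bit index registers.

STARTADR =   $FA00
ENDADR   =   STARTADR+999       ; = $FDE7, a 1000-byte block
BANK     =   $00

         LDX #STARTADR          ; 3 bytes, 3 cycles
         TXY                    ; 1 byte, 2 cycles
         INY                    ; 1 byte, 2 cycles, Y = STARTADR+1
         LDA #ENDADR-STARTADR-1 ; = 999-1 = 998 = $03E6
         MVN BANK,BANK

TXY and INY together take 2 bytes and 4 cycles, against 3 bytes and 3 cycles for LDY with a 16-bit immediate.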

Speed and space considerations

For memory fills where the address and length are known ahead of time (i.e. when assembling), note that a loop may be just as convenient as using MVN. For example, the loop in:

; Assume A and X are 16 bits wide
;
     LDA #$ABAB   ; fill byte is $AB
     LDX #$2000   ; 8k buffer
LOOP DEX          ; 2 cycles
     DEX          ; 2 cycles
     STA BUFFER,X ; 6 cycles using long,X addressing
     BNE LOOP     ; 3 cycles (when the branch is taken)

takes 13 cycles, but stores 2 bytes each time through the loop, so it takes 6.5 cycles per byte, and the code is 14 bytes long. (Note that STA does not affect the Z flag, so the BNE tests the result of the second DEX.) In comparison,

; Assume A, X, and Y are 16 bits wide and the low byte of A is $AB
;
STA BUFFER  ; long addressing
LDX #BUFFER
TXY
INY
LDA #$1FFE
MVN BANK,BANK

is 15 bytes long, not counting the LDA that loads the fill byte or a PHB and PLB pair, and the MVN takes 7 cycles per byte.

More speed can be had by (partially) unrolling the loop. For example:

; Assume A and X are 16 bits wide
;
     LDA #$ABAB         ; fill byte is $AB
     LDX #$1000         ; half of the 8k buffer (each pass stores to both halves)
LOOP DEX                ; 2 cycles
     DEX                ; 2 cycles
     STA BUFFER,X       ; 6 cycles using long,X addressing
     STA BUFFER+$1000,X ; 6 cycles using long,X addressing
     BNE LOOP           ; 3 cycles (when the branch is taken)

is 18 bytes long, but the loop takes only 4.75 cycles per byte (19 cycles / 4 bytes).
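
Working the per-byte figures through for the whole 8k buffer gives a rough sense of scale. The totals below are only an estimate: they count the loop bodies and the MVN itself, not the setup instructions, and they rely on the cycle counts quoted above. The "- 1" reflects the final BNE, which is not taken and so costs 2 cycles instead of 3.

; Simple loop:    4096 passes x 13 cycles - 1 = 53,247 cycles
; MVN:            8191 bytes  x  7 cycles     = 57,337 cycles
; Unrolled loop:  2048 passes x 19 cycles - 1 = 38,911 cycles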
