Indeed, but special care has to be taken beyond 256 potential overflows, since that would of course imply a result wider than 16 bits.
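Here is where the limit comes from. Adding bytes one at a time into an 8-bit accumulator produces at most one carry per addition, and the carry count always equals the true sum divided by 256. The hypothetical helper below is Python and not part of the original listing. It counts carries that way. With every operand at $FF, 257 bytes (256 additions) still give at most 255 carries. 258 bytes give 256, which no longer fits in one byte and means the sum has passed 65535.

```python
def carries_when_summing(values):
    """Add bytes into an 8-bit accumulator; return (carry count, final accumulator)."""
    acc = 0
    carries = 0
    for v in values:
        acc += v
        if acc > 0xFF:      # at most one carry per 8-bit addition
            carries += 1
            acc &= 0xFF
    return carries, acc


def max_carries(n_operands):
    """Worst case: every operand is $FF."""
    return carries_when_summing([0xFF] * n_operands)[0]


print(max_carries(257), max_carries(258))
```

The carry count plus the accumulator always reconstructs the full sum (`carries * 256 + acc`). Only the width of the carry counter runs out.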
We're all terrible at reading each other's comments.
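.const zp0=$fb
.const zp1=$fc
.const zp2=$fd

.pc=$1000
start:
        sei
!:      bit $d011       // wait for raster line >= 256
        bpl !-          // No BLs! (no badline can steal cycles there)
        lda zp1
        sta b+1         // self-modify: low byte of the operand at b: = zp1 (tab is page-aligned)
        lda zp2
        sta c+1         // low byte of the operand at c: = zp2
        lda #$00
        sta $dd0f       // stop CIA2 timer B
        sta $dd06       // timer B latch lo = $00
        lda #$01
        sta $dd07       // latch hi = $01; with the timer stopped this also loads the counter ($0100)
        sta $dd0f       // start timer B (counts down once per cycle)
        ldy zp0         // 3
b:      ldx tab,y       // 4/5 (+1 on page crossing, i.e. on carry)
c:      ldy tab,x       // 4/5
        // Here we could add more ldx/ldy
lo:     ldx $dd06       // 4 => Wait=15/17
        lda carrytab,x  // HI/LO in A/Y
        sty $63
        sta $62
        cli
        jmp $bdd1       // BASIC ROM: print the 16-bit value in $62 (hi) / $63 (lo)

.align $0100
tab:      .fill 512,i&$ff
carrytab: .fill 256,[$f3-i]&$ff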
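You can check that the chained lookups really add with a small model. The Python sketch below is not the author's code, and `indexed_load` and `add3_by_lookup` are hypothetical names. It mimics the self-modified `ldx tab,y` / `ldy tab,x` pair. `tab` is page-aligned and 512 bytes long, so the patched low byte plus the index is the true offset into it. The byte loaded is therefore the sum mod 256, and a page crossing happens exactly when that addition carries.

```python
# .fill 512,i&$ff : page-aligned table where TAB[i] == i mod 256
TAB = [i & 0xFF for i in range(512)]


def indexed_load(patched_lo, index):
    """Model an absolute-indexed read from the page-aligned table 'tab'.

    The self-modifying code stores patched_lo into the operand's low byte, so
    the effective offset into TAB is patched_lo + index. Returns the loaded
    byte and whether the access crossed a page (one extra 6502 cycle).
    """
    offset = patched_lo + index
    return TAB[offset], offset > 0xFF


def add3_by_lookup(zp0, zp1, zp2):
    """Chain the lookups as b: and c: do; return (low byte, page crossings)."""
    x, crossed_b = indexed_load(zp1, zp0)  # b: ldx tab,y  (Y = zp0, low byte patched to zp1)
    y, crossed_c = indexed_load(zp2, x)    # c: ldy tab,x  (low byte patched to zp2)
    return y, int(crossed_b) + int(crossed_c)


print(add3_by_lookup(200, 100, 50))
```

For example, 200 + 100 + 50 = 350 = $15E. The model returns the low byte $5E (94) and one page crossing, which is the one carry.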
The indexed addressing does the adds (the page-aligned table wraps mod 256), and the timer does the carries (each page crossing costs one extra cycle before the timer is read), in parallel.
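The split can be modelled in a few lines. The sketch below is Python, and all names in it are hypothetical. The cycle costs come from the listing's comments. The timer low byte read with no page crossing is treated as $F3, the value implied by `carrytab`. The real figure depends on CIA start-up latency, so it is taken from the post rather than derived. Each carry adds one cycle and makes the timer read one lower. `carrytab` then turns that reading back into the high byte.

```python
# Cycles before the timer read, from the listing's comments:
# ldy zp0 (3) + ldx tab,y (4, +1 on page cross) + ldy tab,x (4, +1) + ldx $dd06 (4)
BASE_CYCLES = 3 + 4 + 4 + 4

# Timer B low byte read with no page crossing. Assumed from carrytab = [$f3-i]&$ff
# in the post; the exact value depends on CIA timing and is not derived here.
NO_CARRY_READ = 0xF3

# .fill 256,[$f3-i]&$ff
CARRYTAB = [(NO_CARRY_READ - i) & 0xFF for i in range(256)]


def page_crossings(zp0, zp1, zp2):
    """Extra cycles spent by the two indexed loads (one per 8-bit carry)."""
    first = zp0 + zp1 > 0xFF
    second = ((zp0 + zp1) & 0xFF) + zp2 > 0xFF
    return int(first) + int(second)


def timer_read(extra_cycles):
    """Timer B counts down once per cycle, so each extra cycle lowers the read by one."""
    return (NO_CARRY_READ - extra_cycles) & 0xFF


def sum3(zp0, zp1, zp2):
    """Return (hi, lo, cycles before the read) the way the listing produces them."""
    lo = (zp0 + zp1 + zp2) & 0xFF          # what the chained table lookups leave in Y
    extra = page_crossings(zp0, zp1, zp2)   # what the timer "sees"
    hi = CARRYTAB[timer_read(extra)]        # lda carrytab,x
    return hi, lo, BASE_CYCLES + extra


hi, lo, cycles = sum3(0xFF, 0xFF, 0xFF)
print(hi, lo, cycles)  # prints: 2 253 17
```

With three $FF operands, both loads cross a page. The read therefore comes after 17 cycles instead of 15, and `carrytab` yields 2, matching 765 = $2FD.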