[CSDb] - User Forums - Fast way to rotate a char?

2017-01-04 08:32

Rudi
Account closed

Registered: May 2010
Posts: 125

Fast way to rotate a char?

I'm not talking about rol or ror, but about swapping bits so that the char is rotated 90 degrees:

Example:

a char (and the bits can be random):

10110010 byte 1..
11010110 byte 2.. etc..
00111001
01010110
11011010
10110101
00110011
10110100

after "rotation" (rows and columns are swapped):

Is it possible to use lookup tables for this, or would the lookup table be too big?
Or is there some other lookup table for getting and setting bits?

-Rudi
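
For reference, here is a minimal Python sketch of what is being asked for. It is not a C64 routine, only a model to check faster versions against. It assumes bit 7 of each byte is the leftmost pixel, and the function names are made up for illustration. Since the post mentions both "rotated 90 degrees" and "rows and columns are swapped", both a transpose and a clockwise rotation are shown.

```python
def transpose_char(rows):
    """Swap rows and columns of an 8x8 char given as 8 bytes.

    Assumption: bit 7 of each byte is the leftmost pixel.
    """
    out = [0] * 8
    for r in range(8):
        for c in range(8):
            if rows[r] & (0x80 >> c):      # pixel at row r, column c
                out[c] |= 0x80 >> r        # moves to row c, column r
    return out


def rotate_char_cw(rows):
    """Rotate 90 degrees clockwise: pixel (r, c) moves to (c, 7 - r)."""
    out = [0] * 8
    for r in range(8):
        for c in range(8):
            if rows[r] & (0x80 >> c):
                out[c] |= 0x80 >> (7 - r)
    return out


# The example char from the post.
char = [0b10110010, 0b11010110, 0b00111001, 0b01010110,
        0b11011010, 0b10110101, 0b00110011, 0b10110100]

for row in rotate_char_cw(char):
    print(format(row, "08b"))
```

Transposing twice, or rotating four times, should give back the original char, which makes a handy sanity check.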

... 105 posts hidden ...

2017-01-11 10:39

Rastah Bar

Registered: Oct 2012
Posts: 336

I'm now at 326 cycles.

2017-01-11 13:52

Rudi
Account closed

Registered: May 2010
Posts: 125

Looking at the 4x4 rotation, a recipe that I have made looks like this:

1. rol the higher 4 bytes 4 times.
2. swap the lower nybbles of byte0->byte4, byte1->byte5 etc.
3. rol the lower 4 bytes 4 times.

Done.
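
The recipe can be modelled in Python to see where the four 4x4 blocks end up. This is only a sketch under assumptions: "rol 4 times" is treated as a plain 8-bit rotate by four (a nybble swap, as a lookup table would do it), "higher 4 bytes" is read as bytes 0..3 and "lower 4 bytes" as bytes 4..7. The function names are hypothetical.

```python
def swap_nybbles(b):
    """Rotate a byte by four bits, i.e. exchange its two nybbles."""
    return ((b << 4) | (b >> 4)) & 0xFF


def rotate_blocks(rows):
    """Apply the three-step recipe to 8 bytes and return the new 8 bytes."""
    rows = list(rows)
    # Step 1: nybble-swap bytes 0..3 (assumed to be the "higher" four bytes).
    for i in range(4):
        rows[i] = swap_nybbles(rows[i])
    # Step 2: exchange the low nybbles of byte i and byte i+4.
    for i in range(4):
        a, b = rows[i], rows[i + 4]
        rows[i] = (a & 0xF0) | (b & 0x0F)
        rows[i + 4] = (b & 0xF0) | (a & 0x0F)
    # Step 3: nybble-swap bytes 4..7 (assumed to be the "lower" four bytes).
    for i in range(4, 8):
        rows[i] = swap_nybbles(rows[i])
    return rows


# Label the four 4x4 blocks A B / C D with one hex digit each.
before = [0xAB] * 4 + [0xCD] * 4
after = rotate_blocks(before)
print([format(b, "02X") for b in after])
```

Under this reading the blocks move from A B / C D to B D / A C, a counter-clockwise block rotation. Reading "higher" and "lower" the other way round gives C A / D B, the clockwise one. Either way only the blocks move; the bits inside each 4x4 block still have to be rotated by the other stages.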

1 and 3 can be done with lookup tables. 2 is trickier.
So what no. 2 needs is a fast way to swap the lo-nybbles of two bytes, but that seems to be difficult. One would have to do it in 14 cycles or so. Impossible.

Someone gave me this xor-swap algorithm which takes 27 cycles:

LDA byte1
AND #$0f
EOR byte2
STA byte2
AND #$0f
EOR byte1
STA byte1
AND #$0f
EOR byte2
STA byte2
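
To see why the sequence works, here is a line-by-line Python model of it (a sketch for checking the logic, with a made-up function name). The variable a stands in for the accumulator.

```python
def xor_swap_low_nybbles(byte1, byte2):
    """Model of the 10-instruction xor swap; returns the new (byte1, byte2)."""
    a = byte1            # LDA byte1
    a &= 0x0F            # AND #$0f
    a ^= byte2           # EOR byte2
    byte2 = a            # STA byte2   high nybble of byte2, low1 ^ low2
    a &= 0x0F            # AND #$0f
    a ^= byte1           # EOR byte1
    byte1 = a            # STA byte1   high nybble of byte1, low2
    a &= 0x0F            # AND #$0f
    a ^= byte2           # EOR byte2
    byte2 = a            # STA byte2   high nybble of byte2, low1
    return byte1, byte2


print([hex(v) for v in xor_swap_low_nybbles(0x12, 0x34)])
```

The high nybbles never change because every EOR is preceded by an AND #$0f that clears the high nybble of the accumulator.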

I also made this, but it takes one cycle more than the former:

LDX byte1
LDY byte2
LDA lowCleared,x
ORA andTab,y
STA byte1
LDA lowCleared,y
ORA andTab,x
STA byte2
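
The table contents are not given in the post. A plausible reading, used in this Python sketch as an assumption, is that lowCleared holds each value with its low nybble cleared and andTab holds each value's low nybble only. The function name is hypothetical.

```python
# Assumed table contents (256 bytes each, not spelled out in the post).
lowCleared = [v & 0xF0 for v in range(256)]   # keeps the high nybble
andTab = [v & 0x0F for v in range(256)]       # keeps the low nybble


def table_swap_low_nybbles(byte1, byte2):
    """Model of the two-table routine; returns the new (byte1, byte2)."""
    x, y = byte1, byte2                  # LDX byte1 / LDY byte2
    new1 = lowCleared[x] | andTab[y]     # LDA lowCleared,x / ORA andTab,y
    new2 = lowCleared[y] | andTab[x]     # LDA lowCleared,y / ORA andTab,x
    return new1, new2                    # STA byte1 / STA byte2


print([hex(v) for v in table_swap_low_nybbles(0x12, 0x34)])
```

Because both originals stay in X and Y, neither result depends on the other store, at the price of 512 bytes of tables.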

I guess this won't help at all, because 27*4 = 108 cycles. That already exceeds the limit from the 312 version, where each rotator section takes 104 cycles.

2017-01-11 14:24

Rastah Bar

Registered: Oct 2012
Posts: 336

Quoting Rudi

Someone gave me this xor-swap algorithm which takes 27 cycles:
LDA byte1
AND #$0f
EOR byte2
STA byte2
AND #$0f
EOR byte1
STA byte1
AND #$0f
EOR byte2
STA byte2

26 cycles:

lax byte1
and #$f0
ldy byte2
ora grabLowNybble,y  ;This table performs AND #$0f
sta byte1  ;Now low nybble of byte2 is in byte1
tya
and #$f0
ora grabLowNybble,x  ;byte1 was kept in X
sta byte2
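
The three cycle counts quoted so far (27, 28 and 26) can be re-added with a small Python tally. It assumes byte1 and byte2 live in zero page and that the tables sit at absolute addresses where indexing never crosses a page boundary. The per-instruction counts are the standard 6502 timings; lax is an undocumented opcode that loads A and X together.

```python
# Cycles per (mnemonic, addressing mode) under the assumptions above.
CYCLES = {
    ("lda", "zp"): 3, ("ldx", "zp"): 3, ("ldy", "zp"): 3, ("lax", "zp"): 3,
    ("sta", "zp"): 3, ("eor", "zp"): 3,
    ("and", "imm"): 2, ("tya", "imp"): 2,
    ("lda", "abs,x"): 4, ("lda", "abs,y"): 4,
    ("ora", "abs,x"): 4, ("ora", "abs,y"): 4,
}


def total(routine):
    """Sum the cycles of a list of (mnemonic, mode) pairs."""
    return sum(CYCLES[op] for op in routine)


xor_swap = [("lda", "zp")] + [("and", "imm"), ("eor", "zp"), ("sta", "zp")] * 3

two_tables = [("ldx", "zp"), ("ldy", "zp"),
              ("lda", "abs,x"), ("ora", "abs,y"), ("sta", "zp"),
              ("lda", "abs,y"), ("ora", "abs,x"), ("sta", "zp")]

lax_version = [("lax", "zp"), ("and", "imm"), ("ldy", "zp"),
               ("ora", "abs,y"), ("sta", "zp"),
               ("tya", "imp"), ("and", "imm"),
               ("ora", "abs,x"), ("sta", "zp")]

print(total(xor_swap), total(two_tables), total(lax_version))
```

With byte1 and byte2 outside zero page, every load and store of them costs one cycle more, so the totals would grow accordingly.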

2017-01-11 14:43

Oswald

Registered: Apr 2002
Posts: 5017

you just need a 64k table

lda byte1byte2
sta result

:)

2017-01-11 14:52

Rudi
Account closed

Registered: May 2010
Posts: 125

Colorbar: nice
Oswald: hah yeah, like that's gonna happen. :P

2017-01-11 16:52

Rastah Bar

Registered: Oct 2012
Posts: 336

Flip disk ...
Rotate monitor clockwise ...

2017-01-11 18:45

Oswald

Registered: Apr 2002
Posts: 5017

del.

2017-01-11 18:55

Rastah Bar

Registered: Oct 2012
Posts: 336

Quote: Flip disk ...
Rotate monitor clockwise ...

Girls They Want to Have Fun

2017-01-12 09:31

Rastah Bar

Registered: Oct 2012
Posts: 336

Quoting Rudi

2. swap lower nybbles of byte0->byte4, byte1->byte5 etc.

That helped. If all bits from block 'a' in my post above are put entirely in destination0, and the bits of the 4x2 block above that in destination1, nybbles have to be swapped at the end, but the code

sta selfmod:+1
and #$0f
sta destination0
selfmod:
lda moveHighNybbleToLowNybble,x
sta destination1

simplifies to just 'STA destination' and with this I can reduce the cycle count to 312, I think.

2017-01-13 09:05

ChristopherJam

Registered: Aug 2004
Posts: 1378

I'm not managing to get below 308 cycles (source and destination non zero page).

However, my second stage is only 23 cycles, and less when stages are chained. Perhaps this can help optimise one of the routines above?

#define swap1(s1,s2,m,d1,d2) \
    .(        :\
    lda s1    :\
    eor s2    :\
    sta ztb   :\
    and#m     :\
    eor s2    :\
    sta d1    :\
    eor ztb   :\
    sta d2    :\
    .)
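
What the macro computes is easier to see in a Python model (a sketch with the same parameter names, where a plays the accumulator). It appears to exchange exactly the bits selected by the mask m between the two sources, using one temporary and no tables.

```python
def swap1(s1, s2, m):
    """Model of the swap1 macro; returns the values stored to (d1, d2)."""
    a = s1 ^ s2          # lda s1 : eor s2
    ztb = a              # sta ztb   remember s1 ^ s2
    a &= m               # and #m
    a ^= s2              # eor s2
    d1 = a               # sta d1    bits of s1 where m is set, else bits of s2
    a ^= ztb             # eor ztb
    d2 = a               # sta d2    bits of s2 where m is set, else bits of s1
    return d1, d2


print([hex(v) for v in swap1(0x12, 0x34, 0x0F)])
```

With m = $0f this is a low-nybble swap whose results land in the opposite destinations: d1 gets the high nybble of s2 with the low nybble of s1, and d2 gets the high nybble of s1 with the low nybble of s2. If every operand is in zero page, the eight instructions add up to 3+3+3+2+3+3+3+3 = 23 cycles, which matches the figure quoted above.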

(also - I could get down to 300 by moving the code to zero page, but then you'd have to jmp there and back, and couldn't unroll for multiple source/destinations - one char takes most of the page)