CSDb User Forums


2017-01-04 08:32
Rudi
Account closed

Registered: May 2010
Posts: 125
Fast way to rotate a char?

I'm not talking about rol or ror, but about swapping bits so that the char is rotated 90 degrees:

Example:

a char (and the bits can be random):
10110010 byte 1..
11010110 byte 2.. etc..
00111001
01010110
11011010
10110101
00110011
10110100
after "rotation" (rows and columns are swapped):
11001101
01011000
10100111
11111111
00101000
01010101
11011010
00100110
Is it possible to use lookup tables for this, or would the lookup table be too big?
Or is there some other lookup table for getting and setting bits?
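For reference, the example above is a plain 8x8 bit transpose: output row n collects bit n (counted from the left) of every input row, top row first. A minimal, unoptimized Python sketch that can be used to check faster routines against; the name `transpose_char` is made up for illustration and is not from the thread:

```python
def transpose_char(rows):
    """Return the 8x8 bit transpose of eight bytes (rows[0] is the top row)."""
    out = []
    for col in range(8):                  # col 0 is the leftmost (most significant) bit
        byte = 0
        for row in range(8):
            bit = (rows[row] >> (7 - col)) & 1
            byte = (byte << 1) | bit      # the top row ends up in the leftmost bit
        out.append(byte)
    return out


# The char from the post above, top row first.
example = [0b10110010, 0b11010110, 0b00111001, 0b01010110,
           0b11011010, 0b10110101, 0b00110011, 0b10110100]

for value in transpose_char(example):
    print(f"{value:08b}")
```

Applying the function twice gives back the original char, since swapping rows and columns is its own inverse.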

-Rudi
 
... 105 posts hidden ...
 
2017-01-14 03:56
ChristopherJam

Registered: Aug 2004
Posts: 1370
OK, in that case mine is 292 cycles if source and dest are in ZP, 308 cycles when all in mem.

Here's a diagram of how the bits are shuffled at each stage:
Each block shows two input bytes for a swap macro in the top half and the two outputs in the lower half. The digits in the left border of each box are indexes into the input/output arrays for that stage.
Note that each input pair for the last four blocks contains one byte that's a shuffled version of an output byte from the previous stage, marked with a *.
The shuffle is there so I can then do an Axis-style swap (which needs half the bits to be correctly located within the byte); it's performed by storing the result of the previous stage into the low byte of an LDY absolute that references the shuffle table.

 +-------------------------+  +-------------------------+  +-------------------------+  +-------------------------+
 0 a0 b0 c0 d0 e0 f0 g0 h0 |  1 a1 b1 c1 d1 e1 f1 g1 h1 |  2 a2 b2 c2 d2 e2 f2 g2 h2 |  3 a3 b3 c3 d3 e3 f3 g3 h3 |
 4 a4 b4 c4 d4 e4 f4 g4 h4 |  5 a5 b5 c5 d5 e5 f5 g5 h5 |  6 a6 b6 c6 d6 e6 f6 g6 h6 |  7 a7 b7 c7 d7 e7 f7 g7 h7 |
 +-------------------------+  +-------------------------+  +-------------------------+  +-------------------------+
 0 a0 b0 c0 d0 a4 b4 c4 d4 |  1 c1 d1 a1 b1 c5 d5 a5 b5 |  2 c2 d2 a2 b2 c6 d6 a6 b6 |  3 a3 b3 c3 d3 a7 b7 c7 d7 |
 4 e0 f0 g0 h0 e4 f4 g4 h4 |  5 g1 h1 e1 f1 g5 h5 e5 f5 |  6 g2 h2 e2 f2 g6 h6 e6 f6 |  7 e3 f3 g3 h3 e7 f7 g7 h7 |
 +-------------------------+  +-------------------------+  +-------------------------+  +-------------------------+
                                                             
                                                             
 +-------------------------+  +-------------------------+  +-------------------------+  +-------------------------+
 0 a0 b0 c0 d0 a4 b4 c4 d4 |  1 c1 d1 a1 b1 c5 d5 a5 b5 |  4 e0 f0 g0 h0 e4 f4 g4 h4 |  5 g1 h1 e1 f1 g5 h5 e5 f5 |
 2 c2 d2 a2 b2 c6 d6 a6 b6 |  3 a3 b3 c3 d3 a7 b7 c7 d7 |  6 g2 h2 e2 f2 g6 h6 e6 f6 |  7 e3 f3 g3 h3 e7 f7 g7 h7 |
 +-------------------------+  +-------------------------+  +-------------------------+  +-------------------------+
 0 a0 b0 a2 b2 a4 b4 a6 b6 |  1 c1 d1 c3 d3 c5 d5 c7 d7 |  4 e0 f0 e2 f2 e4 f4 e6 f6 |  5 g1 h1 g3 h3 g5 h5 g7 h7 |
 2 c2 d2 c0 d0 c6 d6 c4 d4 |  3 a3 b3 a1 b1 a7 b7 a5 b5 |  6 g2 h2 g0 h0 g6 h6 g4 h4 |  7 e3 f3 e1 f1 e7 f7 e5 f5 |
 +-------------------------+  +-------------------------+  +-------------------------+  +-------------------------+
                                                             
                                                             
 +-------------------------+  +-------------------------+  +-------------------------+  +-------------------------+
 0 a0 b0 a2 b2 a4 b4 a6 b6 |  1 c1 d1 c3 d3 c5 d5 c7 d7 |  4 e0 f0 e2 f2 e4 f4 e6 f6 |  6*g0 h0 g2 h2 g4 h4 g6 h6 |
 3*a1 b1 a3 b3 a5 b5 a7 b7 |  2*c0 d0 c2 d2 c4 d4 c6 d6 |  7*e1 f1 e3 f3 e5 f5 e7 f7 |  5 g1 h1 g3 h3 g5 h5 g7 h7 |
 +-------------------------+  +-------------------------+  +-------------------------+  +-------------------------+
 0 a0 a1 a2 a3 a4 a5 a6 a7 |  3 d0 d1 d2 d3 d4 d5 d6 d7 |  4 e0 e1 e2 e3 e4 e5 e6 e7 |  6 g0 g1 g2 g3 g4 g5 g6 g7 |
 1 b0 b1 b2 b3 b4 b5 b6 b7 |  2 c0 c1 c2 c3 c4 c5 c6 c7 |  5 f0 f1 f2 f3 f4 f5 f6 f7 |  7 h0 h1 h2 h3 h4 h5 h6 h7 |
 +-------------------------+  +-------------------------+  +-------------------------+  +-------------------------+
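The three stages in the diagram can be modelled in Python as block swaps of width 4, 2 and 1. This is only a simplified sketch of the general idea: it leaves out the shuffle tables and the byte reordering marked with * above, so the intermediate bytes differ from the diagram in some boxes, and the function name is invented for illustration.

```python
def transpose_staged(rows):
    """8x8 bit transpose in three pairwise-swap stages (rows[0] is the top row)."""
    r = list(rows)
    # Stage 1: exchange nibbles between rows i and i+4.
    for i in (0, 1, 2, 3):
        x, y = r[i], r[i + 4]
        r[i] = (x & 0xF0) | (y >> 4)
        r[i + 4] = ((x & 0x0F) << 4) | (y & 0x0F)
    # Stage 2: exchange bit pairs between rows i and i+2.
    for i in (0, 1, 4, 5):
        x, y = r[i], r[i + 2]
        r[i] = (x & 0xCC) | ((y & 0xCC) >> 2)
        r[i + 2] = ((x & 0x33) << 2) | (y & 0x33)
    # Stage 3: exchange single bits between rows i and i+1.
    for i in (0, 2, 4, 6):
        x, y = r[i], r[i + 1]
        r[i] = (x & 0xAA) | ((y & 0xAA) >> 1)
        r[i + 1] = ((x & 0x55) << 1) | (y & 0x55)
    return r


# The char from the first post, top row first.
char = [0b10110010, 0b11010110, 0b00111001, 0b01010110,
        0b11011010, 0b10110101, 0b00110011, 0b10110100]
for value in transpose_staged(char):
    print(f"{value:08b}")
```

Each stage touches every byte exactly once, which is why the per-pair cost (the 20-30 cycles discussed in this thread) dominates the total.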


2017-01-15 10:29
Rastah Bar

Registered: Oct 2012
Posts: 336
Quoting ChristopherJam
OK, in that case mine is 292 cycles if source and dest are in ZP, 308 cycles when all in mem.
Quote:

Very neat!
Quote:

 +-------------------------+  +-------------------------+  +-------------------------+  +-------------------------+
 0 a0 b0 c0 d0 e0 f0 g0 h0 |  1 a1 b1 c1 d1 e1 f1 g1 h1 |  2 a2 b2 c2 d2 e2 f2 g2 h2 |  3 a3 b3 c3 d3 e3 f3 g3 h3 |
 4 a4 b4 c4 d4 e4 f4 g4 h4 |  5 a5 b5 c5 d5 e5 f5 g5 h5 |  6 a6 b6 c6 d6 e6 f6 g6 h6 |  7 a7 b7 c7 d7 e7 f7 g7 h7 |
 +-------------------------+  +-------------------------+  +-------------------------+  +-------------------------+
 0 a0 b0 c0 d0 a4 b4 c4 d4 |  1 c1 d1 a1 b1 c5 d5 a5 b5 |  2 c2 d2 a2 b2 c6 d6 a6 b6 |  3 a3 b3 c3 d3 a7 b7 c7 d7 |
 4 e0 f0 g0 h0 e4 f4 g4 h4 |  5 g1 h1 e1 f1 g5 h5 e5 f5 |  6 g2 h2 e2 f2 g6 h6 e6 f6 |  7 e3 f3 g3 h3 e7 f7 g7 h7 |
 +-------------------------+  +-------------------------+  +-------------------------+  +-------------------------+

I would implement the code for the second and third byte pairs like this:
lax s1
lda shuffle1,x
and #$f0
ldy s2
ora merge1,y
sta tmp1
tya
lda shuffle2,y
and #$0f
ora merge2,x
sta tmp2
------------------+
36 cycles (s1 and s2 in mem, tmp1 and tmp2 in zp)

For the first and 4th byte pairs the "LDA shuffle"s can be omitted (28 cycles).
Is there a more efficient way to do it?
2017-01-15 13:37
Oswald

Registered: Apr 2002
Posts: 5017
Why don't the shuffle tables include the ANDs?
2017-01-15 17:29
Rastah Bar

Registered: Oct 2012
Posts: 336
Yes, you are right, the tables should include the ANDs. So that makes it 32 cycles for the 2nd and 3rd byte pairs.
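To illustrate folding the ANDs into the tables, here is a small Python model. The table contents are an assumption reconstructed from the second box of the diagram (c1 d1 a1 b1 c5 d5 a5 b5 / g1 h1 e1 f1 g5 h5 e5 f5); the thread never lists them, and the `_hi` / `_lo` names are made up.

```python
def swap_pairs(nibble):
    """Exchange the two bit pairs of a 4-bit value: abcd -> cdab."""
    return ((nibble << 2) | (nibble >> 2)) & 0x0F


# Assumed table contents, derived from the diagram rather than from posted code.
shuffle1 = [(swap_pairs(v >> 4) << 4) | swap_pairs(v & 0x0F) for v in range(256)]
shuffle2 = shuffle1
merge1 = [swap_pairs(v >> 4) for v in range(256)]          # high nibble, moved low
merge2 = [swap_pairs(v & 0x0F) << 4 for v in range(256)]   # low nibble, moved high

# Folded variants: the AND is applied once, when the table is built.
shuffle1_hi = [v & 0xF0 for v in shuffle1]
shuffle2_lo = [v & 0x0F for v in shuffle2]


def pair_with_ands(s1, s2):
    """Table lookups followed by and #$f0 / and #$0f, as in the posted code."""
    return ((shuffle1[s1] & 0xF0) | merge1[s2],
            (shuffle2[s2] & 0x0F) | merge2[s1])


def pair_folded(s1, s2):
    """Same result with the masks baked into the tables."""
    return (shuffle1_hi[s1] | merge1[s2],
            shuffle2_lo[s2] | merge2[s1])


tmp1, tmp2 = pair_folded(0b10011100, 0b01110010)
print(f"{tmp1:08b} {tmp2:08b}")
```

Both functions return the same pair for every input, so the two `and #` instructions (2 cycles each) can simply be dropped from the routine.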
2017-01-16 08:04
Axis/Oxyron

Registered: Apr 2007
Posts: 91
Quoting Color Bar

tya
lda shuffle2,y


WOOT? ;o)
2017-01-16 08:16
Pex Mahoney Tufvesson

Registered: Sep 2003
Posts: 50
> WOOT? ;o)

It's a new way of doing nothing; it's just a little too complicated for you Axis, so go back and play with your 300fps million-dots 6510 dot spheres. :P

---
Have a noise night!
http://mahoney.c64.org
2017-01-16 09:03
Rastah Bar

Registered: Oct 2012
Posts: 336
Quote: Quoting Color Bar

tya
lda shuffle2,y


WOOT? ;o)


Lol. I needed some sleep. Honestly, this stuff is keeping me awake at night.

Pex: Good that you contribute something to this thread too.
2017-01-16 11:30
ChristopherJam

Registered: Aug 2004
Posts: 1370
Haha, you guys.

Um, my second and third shuffles from the first phase are pretty brute-force on the tables front:

    ldx s1
    ldy s2
    lda sb_t2,x
    ora sb_t3,y
    sta d1
    lda sb_t0,x
    ora sb_t1,y
    sta d2


30 cycles with mem source, zp destination..
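The cycle counts quoted in the last few posts can be re-derived with a small Python tally. It assumes standard 6502 timings with no page crossing on the indexed table reads; the addressing-mode keys are just labels chosen for this sketch.

```python
# Cycles per addressing mode, assuming indexed reads never cross a page.
CYCLES = {"imm": 2, "implied": 2, "zp": 3, "abs": 4, "abs_idx": 4}

rastah = [
    ("lax s1", "abs"), ("lda shuffle1,x", "abs_idx"), ("and #$f0", "imm"),
    ("ldy s2", "abs"), ("ora merge1,y", "abs_idx"), ("sta tmp1", "zp"),
    ("tya", "implied"), ("lda shuffle2,y", "abs_idx"), ("and #$0f", "imm"),
    ("ora merge2,x", "abs_idx"), ("sta tmp2", "zp"),
]

christopherjam = [
    ("ldx s1", "abs"), ("ldy s2", "abs"),
    ("lda sb_t2,x", "abs_idx"), ("ora sb_t3,y", "abs_idx"), ("sta d1", "zp"),
    ("lda sb_t0,x", "abs_idx"), ("ora sb_t1,y", "abs_idx"), ("sta d2", "zp"),
]


def cycles(code):
    """Sum the cycle cost of a list of (instruction, addressing mode) pairs."""
    return sum(CYCLES[mode] for _, mode in code)


without_ands = [op for op in rastah if not op[0].startswith("and")]
without_tya = [op for op in without_ands if op[0] != "tya"]

print(cycles(rastah), cycles(without_ands), cycles(without_tya), cycles(christopherjam))
```

With the ANDs folded into the tables and the redundant tya removed, the two routines come out at the same cost.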
2017-02-01 21:22
Rudi
Account closed

Registered: May 2010
Posts: 125
nothing new here i guess..
2019-11-11 22:22
Krill

Registered: Apr 2002
Posts: 2821
Quoting Rudi
nothing new here i guess..
Almost 3 years later...

Pretty much what White Flame had in mind, i guess:
Quoting White Flame
Here's another idea which I think fails, but might be salvageable.

So the basic LSR, ROR accumulation grabs a bit and stores a bit, taking 2 instructions per bit. However, if done in-place just like many basic shift-add multiplication routines, ROR both sets a final bit as well as reads the next bit to place. This means that in dream land, the flip can be done in about 64 RORs. If fully in zp, that would be 320 cycles, but only 128 bytes of code with no tables.

However, the tactic I took doesn't seem to have a nice clean loop of RORs linking source bit locations to final bit locations. Various CMP #80s and other byte-masking & merging seems to be required, which would likely bloat it back up to 400+ cycles. But maybe by shuffling it around differently, an arrangement could be made that's both fast and short.
So (everything in zeropage):
               ; from:
         ; row7: a7 a6 a5 a4 a3 a2 a1 a0
         ; row6: b7 b6 b5 b4 b3 b2 b1 b0
         ; row5: c7 c6 c5 c4 c3 c2 c1 c0
         ; row4: d7 d6 d5 d4 d3 d2 d1 d0
         ; row3: e7 e6 e5 e4 e3 e2 e1 e0
         ; row2: f7 f6 f5 f4 f3 f2 f1 f0
         ; row1: g7 g6 g5 g4 g3 g2 g1 g0
         ; row0: h7 h6 h5 h4 h3 h2 h1 h0

               ; to:
dest7   .byte 0; a7 b7 c7 d7 e7 f7 g7 h7
dest6   .byte 0; a6 b6 c6 d6 e6 f6 g6 h6
dest5   .byte 0; a5 b5 c5 d5 e5 f5 g5 h5
dest4   .byte 0; a4 b4 c4 d4 e4 f4 g4 h4
dest3   .byte 0; a3 b3 c3 d3 e3 f3 g3 h3
dest2   .byte 0; a2 b2 c2 d2 e2 f2 g2 h2
dest1   .byte 0; a1 b1 c1 d1 e1 f1 g1 h1
dest0   .byte 0; a0 b0 c0 d0 e0 f0 g0 h0

transpose
                 ; cc  c    bits
row7 = * + 1
        lda #0   ; 2,       a7 a6 a5 a4 a3 a2 a1 a0
        asl      ; 2, a7 <- a6 a5 a4 a3 a2 a1 a0 00
        rol      ; 2, a6 <- a5 a4 a3 a2 a1 a0 00 a7
        rol row6 ; 5, b7 <- b6 b5 b4 b3 b2 b1 b0 a6
        rol      ; 2, a5 <- a4 a3 a2 a1 a0 00 a7 b7
        rol row5 ; 5, c7 <- c6 c5 c4 c3 c2 c1 c0 a5
        rol      ; 2, a4 <- a3 a2 a1 a0 00 a7 b7 c7
        rol row4 ; 5, d7 <- d6 d5 d4 d3 d2 d1 d0 a4
        rol      ; 2, a3 <- a2 a1 a0 00 a7 b7 c7 d7
        rol row3 ; 5, e7 <- e6 e5 e4 e3 e2 e1 e0 a3
        rol      ; 2, a2 <- a1 a0 00 a7 b7 c7 d7 e7
        rol row2 ; 5, f7 <- f6 f5 f4 f3 f2 f1 f0 a2
        rol      ; 2, a1 <- a0 00 a7 b7 c7 d7 e7 f7
        rol row1 ; 5, g7 <- g6 g5 g4 g3 g2 g1 g0 a1
        rol      ; 2, a0 <- 00 a7 b7 c7 d7 e7 f7 g7
        rol row0 ; 5, h7 <- h6 h5 h4 h3 h2 h1 h0 a0
        rol      ; 2, 00 <- a7 b7 c7 d7 e7 f7 g7 h7
        sta dest7; 3,       a7 b7 c7 d7 e7 f7 g7 h7
              ; = 58
       ;clc
row6 = * + 1
        lda #0   ; 2,       b6 b5 b4 b3 b2 b1 b0 a6
        and #$fe ; 2,       b6 b5 b4 b3 b2 b1 b0 00
        adc row6 ; 3, b6 <- b5 b4 b3 b2 b1 b0 00 a6
        rol      ; 2, b5 <- b4 b3 b2 b1 b0 00 a6 b6
        rol row5 ; 5, c6 <- c5 c4 c3 c2 c1 c0 a5 b5
        rol      ; 2, b4 <- b3 b2 b1 b0 00 a6 b6 c6
        rol row4 ; 5, d6 <- d5 d4 d3 d2 d1 d0 a4 b4
        rol      ; 2, b3 <- b2 b1 b0 00 a6 b6 c6 d6
        rol row3 ; 5, e6 <- e5 e4 e3 e2 e1 e0 a3 b3
        rol      ; 2, b2 <- b1 b0 00 a6 b6 c6 d6 e6
        rol row2 ; 5, f6 <- f5 f4 f3 f2 f1 f0 a2 b2
        rol      ; 2, b1 <- b0 00 a6 b6 c6 d6 e6 f6
        rol row1 ; 5, g6 <- g5 g4 g3 g2 g1 g0 a1 b1
        rol      ; 2, b0 <- 00 a6 b6 c6 d6 e6 f6 g6
        rol row0 ; 5, h6 <- h5 h4 h3 h2 h1 h0 a0 b0
        rol      ; 2, 00 <- a6 b6 c6 d6 e6 f6 g6 h6
        sta dest6; 3,       a6 b6 c6 d6 e6 f6 g6 h6
              ; = 54
       ;clc
row5 = * + 1
        lda #0   ; 2,       c5 c4 c3 c2 c1 c0 a5 b5
        and #$fc ; 2,       c5 c4 c3 c2 c1 c0 00 00
        adc row5 ; 3, c5 <- c4 c3 c2 c1 c0 00 a5 b5
        rol      ; 2, c4 <- c3 c2 c1 c0 00 a5 b5 c5
        rol row4 ; 5, d5 <- d4 d3 d2 d1 d0 a4 b4 c4
        rol      ; 2, c3 <- c2 c1 c0 00 a5 b5 c5 d5
        rol row3 ; 5, e5 <- e4 e3 e2 e1 e0 a3 b3 c3
        rol      ; 2, c2 <- c1 c0 00 a5 b5 c5 d5 e5
        rol row2 ; 5, f5 <- f4 f3 f2 f1 f0 a2 b2 c2
        rol      ; 2, c1 <- c0 00 a5 b5 c5 d5 e5 f5
        rol row1 ; 5, g5 <- g4 g3 g2 g1 g0 a1 b1 c1
        rol      ; 2, c0 <- 00 a5 b5 c5 d5 e5 f5 g5
        rol row0 ; 5, h5 <- h4 h3 h2 h1 h0 a0 b0 c0
        rol      ; 2, 00 <- a5 b5 c5 d5 e5 f5 g5 h5
        sta dest5; 3,       a5 b5 c5 d5 e5 f5 g5 h5
              ; = 47
       ;clc
row4 = * + 1
        lda #0   ; 2,       d4 d3 d2 d1 d0 a4 b4 c4
        and #$f8 ; 2,       d4 d3 d2 d1 d0 00 00 00
        adc row4 ; 3, d4 <- d3 d2 d1 d0 00 a4 b4 c4
        rol      ; 2, d3 <- d2 d1 d0 00 a4 b4 c4 d4
        rol row3 ; 5, e4 <- e3 e2 e1 e0 a3 b3 c3 d3
        rol      ; 2, d2 <- d1 d0 00 a4 b4 c4 d4 e4
        rol row2 ; 5, f4 <- f3 f2 f1 f0 a2 b2 c2 d2
        rol      ; 2, d1 <- d0 00 a4 b4 c4 d4 e4 f4
        rol row1 ; 5, g4 <- g3 g2 g1 g0 a1 b1 c1 d1
        rol      ; 2, d0 <- 00 a4 b4 c4 d4 e4 f4 g4
        rol row0 ; 5, h4 <- h3 h2 h1 h0 a0 b0 c0 d0
        rol      ; 2, 00 <- a4 b4 c4 d4 e4 f4 g4 h4
        sta dest4; 3,       a4 b4 c4 d4 e4 f4 g4 h4
              ; = 40
       ;clc
row3 = * + 1
        lda #0   ; 2,       e3 e2 e1 e0 a3 b3 c3 d3
        and #$f0 ; 2,       e3 e2 e1 e0 00 00 00 00
        adc row3 ; 3, e3 <- e2 e1 e0 00 a3 b3 c3 d3
        rol      ; 2, e2 <- e1 e0 00 a3 b3 c3 d3 e3
        rol row2 ; 5, f3 <- f2 f1 f0 a2 b2 c2 d2 e2
        rol      ; 2, e1 <- e0 00 a3 b3 c3 d3 e3 f3
        rol row1 ; 5, g3 <- g2 g1 g0 a1 b1 c1 d1 e1
        rol      ; 2, e0 <- 00 a3 b3 c3 d3 e3 f3 g3
        rol row0 ; 5, h3 <- h2 h1 h0 a0 b0 c0 d0 e0
        rol      ; 2, 00 <- a3 b3 c3 d3 e3 f3 g3 h3
        sta dest3; 3,       a3 b3 c3 d3 e3 f3 g3 h3
              ; = 33
row2 = * + 1
        lda #0   ; 2,       f2 f1 f0 a2 b2 c2 d2 e2
        asl      ; 2, f2 <- f1 f0 a2 b2 c2 d2 e2 00
        adc #$80 ; 2, f1 <- ?? f0 a2 b2 c2 d2 e2 f2
        rol row1 ; 5, g2 <- g1 g0 a1 b1 c1 d1 e1 f1
        rol      ; 2, ?? <- f0 a2 b2 c2 d2 e2 f2 g2
        cmp #$80 ; 2, f0 <- f0 a2 b2 c2 d2 e2 f2 g2
        rol row0 ; 5, h2 <- h1 h0 a0 b0 c0 d0 e0 f0
        rol      ; 2, f0 <- a2 b2 c2 d2 e2 f2 g2 h2
        sta dest2; 3,       a2 b2 c2 d2 e2 f2 g2 h2
              ; = 25
row1 = * + 1
        lda #0   ; 2,       g1 g0 a1 b1 c1 d1 e1 f1
        asl      ; 2, g1 <- g0 a1 b1 c1 d1 e1 f1 00
        adc #$80 ; 2, g0 <- ?? a1 b1 c1 d1 e1 f1 g1
        rol row0 ; 5, h1 <- h0 a0 b0 c0 d0 e0 f0 g0
        rol      ; 2, ?? <- a1 b1 c1 d1 e1 f1 g1 h1
        sta dest1; 3,       a1 b1 c1 d1 e1 f1 g1 h1
              ; = 16
row0 = * + 1
        lda #0   ; 2,       h0 a0 b0 c0 d0 e0 f0 g0
        cmp #$80 ; 2, h0 <- h0 a0 b0 c0 d0 e0 f0 g0
        rol      ; 2, h0 <- a0 b0 c0 d0 e0 f0 g0 h0
        sta dest0; 3,       a0 b0 c0 d0 e0 f0 g0 h0
               ; = 9
        rts    ; = 9 + 16 + 25 + 33 + 40 + 47 + 54 + 58 = 282
Thus...
Quoting ChristopherJam
Problem with bitshifter is it only deals with one bit at a time.

c2p exchanges four bitpairs in only 20-30 cycles. The expensive part is moving bits to matching positions within the two bytes they're being exchanged between...
Turns out both approaches are pretty much in the same ballpark. =)
(And this approach apparently coming out just ever so slightly faster. \=D/)

Might be possible to squeeze out a few more cycles here and there, so feel free.

I have a hunch that the general problem can't be solved in fewer than 280-ish cycles, though.
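For anyone who wants to check the listing above without an emulator, here is a rough Python model of the same instruction sequence. It follows the listing block by block under the assumption that ADC runs in binary mode, and models the self-modified `lda #0` operands as a dict; the function names are invented for this sketch.

```python
def rol(value, carry):
    """6502 ROL on one byte: returns (rotated value, carry out)."""
    return ((value << 1) | carry) & 0xFF, value >> 7


def adc(a, operand, carry):
    """6502 ADC in binary mode: returns (low byte of the sum, carry out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def rol_chain_transpose(rows):
    """rows[0] is row7 (top) ... rows[7] is row0; returns [dest7, ..., dest0]."""
    row = {7 - i: b & 0xFF for i, b in enumerate(rows)}
    dest = {}
    c = 0
    for n in range(7, -1, -1):
        a = row[n]                             # lda #imm (operand patched in place)
        if n == 7:
            a, c = rol(a, 0)                   # asl
            a, c = rol(a, c)                   # rol
        elif n >= 3:
            mask = (0xFF << (7 - n)) & 0xFF    # $fe, $fc, $f8, $f0
            a, c = adc(a & mask, row[n], c)    # and #mask / adc row_n
            a, c = rol(a, c)                   # rol
        elif n >= 1:
            a, c = rol(a, 0)                   # asl
            a, c = adc(a, 0x80, c)             # adc #$80
        else:
            c = a >> 7                         # cmp #$80
            a, c = rol(a, c)                   # rol
        for m in range(n - 1, -1, -1):
            if n <= 2 and m < n - 1:
                c = a >> 7                     # cmp #$80 (only in the row2 block)
            row[m], c = rol(row[m], c)         # rol row_m
            a, c = rol(a, c)                   # rol
        dest[n] = a                            # sta dest_n
    return [dest[n] for n in range(7, -1, -1)]


# The char from the first post, top row first.
char = [0b10110010, 0b11010110, 0b00111001, 0b01010110,
        0b11011010, 0b10110101, 0b00110011, 0b10110100]
for value in rol_chain_transpose(char):
    print(f"{value:08b}")
```

Like the assembly routine, the model consumes its input: the `row` values are rotated in place while the destination bytes are built.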
All times are CET.