| | Rudi Account closed
Registered: May 2010 Posts: 125 |
Fast way to rotate a char?
I'm not talking about ROL or ROR, but about swapping bits so that they are rotated 90 degrees:
Example:
a char (and the bits can be random):
10110010 byte 1..
11010110 byte 2.. etc..
00111001
01010110
11011010
10110101
00110011
10110100
after "rotation" (rows and columns are swapped):
11001101
01011000
10100111
11111111
00101000
01010101
11011010
00100110
Is it possible to use lookup tables for this, or would that lookup table be too big?
Or some other lookup table for getting and setting bits?
-Rudi |
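For reference, the operation asked for here is an 8x8 bit-matrix transpose. A minimal Python sketch of it, checked against the example bytes from the post, might look like this. The name `transpose_char` is made up for illustration, and it assumes bit 7 of each byte is the leftmost column:

```python
def transpose_char(rows):
    """Swap rows and columns of an 8x8 bit matrix stored as 8 bytes.

    Assumption: bit 7 of each byte is the leftmost column.
    """
    out = []
    for col in range(8):
        byte = 0
        for row in range(8):
            bit = (rows[row] >> (7 - col)) & 1  # column `col` of this row
            byte = (byte << 1) | bit            # first row ends up in bit 7
        out.append(byte)
    return out


# The example bytes from the post above.
before = [0b10110010, 0b11010110, 0b00111001, 0b01010110,
          0b11011010, 0b10110101, 0b00110011, 0b10110100]
after = [0b11001101, 0b01011000, 0b10100111, 0b11111111,
         0b00101000, 0b01010101, 0b11011010, 0b00100110]

assert transpose_char(before) == after
assert transpose_char(after) == before  # transposing twice restores the char
```

This is only a model for checking results; it says nothing about how fast any 6502 version can be.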
| | Rudi Account closed
Registered: May 2010 Posts: 125 |
nothing new here i guess.. |
| | Krill
Registered: Apr 2002 Posts: 2839 |
Quoting Rudi: "nothing new here i guess.."
Almost 3 years later...
Pretty much what White Flame had in mind, i guess:
Quoting White Flame: "Here's another idea which I think fails, but might be salvageable.
So the basic LSR, ROR accumulation grabs a bit and stores a bit, taking 2 instructions per bit. However, if done in-place just like many basic shift-add multiplication routines, ROR both sets a final bit as well as reads the next bit to place. This means that in dream land, the flip can be done in about 64 RORs. If fully in zp, that would be 320 cycles, but only 128 bytes of code with no tables.
However, the tactic I took doesn't seem to have a nice clean loop of RORs linking source bit locations to final bit locations. Various CMP #80s and other byte-masking & merging seems to be required, which would likely bloat it back up to 400+ cycles. But maybe by shuffling it around differently, an arrangement could be made that's both fast and short."
So (everything in zeropage):
; from:
; row7: a7 a6 a5 a4 a3 a2 a1 a0
; row6: b7 b6 b5 b4 b3 b2 b1 b0
; row5: c7 c6 c5 c4 c3 c2 c1 c0
; row4: d7 d6 d5 d4 d3 d2 d1 d0
; row3: e7 e6 e5 e4 e3 e2 e1 e0
; row2: f7 f6 f5 f4 f3 f2 f1 f0
; row1: g7 g6 g5 g4 g3 g2 g1 g0
; row0: h7 h6 h5 h4 h3 h2 h1 h0
; to:
dest7 .byte 0; a7 b7 c7 d7 e7 f7 g7 h7
dest6 .byte 0; a6 b6 c6 d6 e6 f6 g6 h6
dest5 .byte 0; a5 b5 c5 d5 e5 f5 g5 h5
dest4 .byte 0; a4 b4 c4 d4 e4 f4 g4 h4
dest3 .byte 0; a3 b3 c3 d3 e3 f3 g3 h3
dest2 .byte 0; a2 b2 c2 d2 e2 f2 g2 h2
dest1 .byte 0; a1 b1 c1 d1 e1 f1 g1 h1
dest0 .byte 0; a0 b0 c0 d0 e0 f0 g0 h0
transpose
; columns below: cycles, carry <- bits of the byte just shifted
row7 = * + 1
lda #0 ; 2, a7 a6 a5 a4 a3 a2 a1 a0
asl ; 2, a7 <- a6 a5 a4 a3 a2 a1 a0 00
rol ; 2, a6 <- a5 a4 a3 a2 a1 a0 00 a7
rol row6 ; 5, b7 <- b6 b5 b4 b3 b2 b1 b0 a6
rol ; 2, a5 <- a4 a3 a2 a1 a0 00 a7 b7
rol row5 ; 5, c7 <- c6 c5 c4 c3 c2 c1 c0 a5
rol ; 2, a4 <- a3 a2 a1 a0 00 a7 b7 c7
rol row4 ; 5, d7 <- d6 d5 d4 d3 d2 d1 d0 a4
rol ; 2, a3 <- a2 a1 a0 00 a7 b7 c7 d7
rol row3 ; 5, e7 <- e6 e5 e4 e3 e2 e1 e0 a3
rol ; 2, a2 <- a1 a0 00 a7 b7 c7 d7 e7
rol row2 ; 5, f7 <- f6 f5 f4 f3 f2 f1 f0 a2
rol ; 2, a1 <- a0 00 a7 b7 c7 d7 e7 f7
rol row1 ; 5, g7 <- g6 g5 g4 g3 g2 g1 g0 a1
rol ; 2, a0 <- 00 a7 b7 c7 d7 e7 f7 g7
rol row0 ; 5, h7 <- h6 h5 h4 h3 h2 h1 h0 a0
rol ; 2, 00 <- a7 b7 c7 d7 e7 f7 g7 h7
sta dest7; 3, a7 b7 c7 d7 e7 f7 g7 h7
; = 58
;clc
row6 = * + 1
lda #0 ; 2, b6 b5 b4 b3 b2 b1 b0 a6
and #$fe ; 2, b6 b5 b4 b3 b2 b1 b0 00
adc row6 ; 3, b6 <- b5 b4 b3 b2 b1 b0 00 a6
rol ; 2, b5 <- b4 b3 b2 b1 b0 00 a6 b6
rol row5 ; 5, c6 <- c5 c4 c3 c2 c1 c0 a5 b5
rol ; 2, b4 <- b3 b2 b1 b0 00 a6 b6 c6
rol row4 ; 5, d6 <- d5 d4 d3 d2 d1 d0 a4 b4
rol ; 2, b3 <- b2 b1 b0 00 a6 b6 c6 d6
rol row3 ; 5, e6 <- e5 e4 e3 e2 e1 e0 a3 b3
rol ; 2, b2 <- b1 b0 00 a6 b6 c6 d6 e6
rol row2 ; 5, f6 <- f5 f4 f3 f2 f1 f0 a2 b2
rol ; 2, b1 <- b0 00 a6 b6 c6 d6 e6 f6
rol row1 ; 5, g6 <- g5 g4 g3 g2 g1 g0 a1 b1
rol ; 2, b0 <- 00 a6 b6 c6 d6 e6 f6 g6
rol row0 ; 5, h6 <- h5 h4 h3 h2 h1 h0 a0 b0
rol ; 2, 00 <- a6 b6 c6 d6 e6 f6 g6 h6
sta dest6; 3, a6 b6 c6 d6 e6 f6 g6 h6
; = 54
;clc
row5 = * + 1
lda #0 ; 2, c5 c4 c3 c2 c1 c0 a5 b5
and #$fc ; 2, c5 c4 c3 c2 c1 c0 00 00
adc row5 ; 3, c5 <- c4 c3 c2 c1 c0 00 a5 b5
rol ; 2, c4 <- c3 c2 c1 c0 00 a5 b5 c5
rol row4 ; 5, d5 <- d4 d3 d2 d1 d0 a4 b4 c4
rol ; 2, c3 <- c2 c1 c0 00 a5 b5 c5 d5
rol row3 ; 5, e5 <- e4 e3 e2 e1 e0 a3 b3 c3
rol ; 2, c2 <- c1 c0 00 a5 b5 c5 d5 e5
rol row2 ; 5, f5 <- f4 f3 f2 f1 f0 a2 b2 c2
rol ; 2, c1 <- c0 00 a5 b5 c5 d5 e5 f5
rol row1 ; 5, g5 <- g4 g3 g2 g1 g0 a1 b1 c1
rol ; 2, c0 <- 00 a5 b5 c5 d5 e5 f5 g5
rol row0 ; 5, h5 <- h4 h3 h2 h1 h0 a0 b0 c0
rol ; 2, 00 <- a5 b5 c5 d5 e5 f5 g5 h5
sta dest5; 3, a5 b5 c5 d5 e5 f5 g5 h5
; = 47
;clc
row4 = * + 1
lda #0 ; 2, d4 d3 d2 d1 d0 a4 b4 c4
and #$f8 ; 2, d4 d3 d2 d1 d0 00 00 00
adc row4 ; 3, d4 <- d3 d2 d1 d0 00 a4 b4 c4
rol ; 2, d3 <- d2 d1 d0 00 a4 b4 c4 d4
rol row3 ; 5, e4 <- e3 e2 e1 e0 a3 b3 c3 d3
rol ; 2, d2 <- d1 d0 00 a4 b4 c4 d4 e4
rol row2 ; 5, f4 <- f3 f2 f1 f0 a2 b2 c2 d2
rol ; 2, d1 <- d0 00 a4 b4 c4 d4 e4 f4
rol row1 ; 5, g4 <- g3 g2 g1 g0 a1 b1 c1 d1
rol ; 2, d0 <- 00 a4 b4 c4 d4 e4 f4 g4
rol row0 ; 5, h4 <- h3 h2 h1 h0 a0 b0 c0 d0
rol ; 2, 00 <- a4 b4 c4 d4 e4 f4 g4 h4
sta dest4; 3, a4 b4 c4 d4 e4 f4 g4 h4
; = 40
;clc
row3 = * + 1
lda #0 ; 2, e3 e2 e1 e0 a3 b3 c3 d3
and #$f0 ; 2, e3 e2 e1 e0 00 00 00 00
adc row3 ; 3, e3 <- e2 e1 e0 00 a3 b3 c3 d3
rol ; 2, e2 <- e1 e0 00 a3 b3 c3 d3 e3
rol row2 ; 5, f3 <- f2 f1 f0 a2 b2 c2 d2 e2
rol ; 2, e1 <- e0 00 a3 b3 c3 d3 e3 f3
rol row1 ; 5, g3 <- g2 g1 g0 a1 b1 c1 d1 e1
rol ; 2, e0 <- 00 a3 b3 c3 d3 e3 f3 g3
rol row0 ; 5, h3 <- h2 h1 h0 a0 b0 c0 d0 e0
rol ; 2, 00 <- a3 b3 c3 d3 e3 f3 g3 h3
sta dest3; 3, a3 b3 c3 d3 e3 f3 g3 h3
; = 33
row2 = * + 1
lda #0 ; 2, f2 f1 f0 a2 b2 c2 d2 e2
asl ; 2, f2 <- f1 f0 a2 b2 c2 d2 e2 00
adc #$80 ; 2, f1 <- ?? f0 a2 b2 c2 d2 e2 f2
rol row1 ; 5, g2 <- g1 g0 a1 b1 c1 d1 e1 f1
rol ; 2, ?? <- f0 a2 b2 c2 d2 e2 f2 g2
cmp #$80 ; 2, f0 <- f0 a2 b2 c2 d2 e2 f2 g2
rol row0 ; 5, h2 <- h1 h0 a0 b0 c0 d0 e0 f0
rol ; 2, f0 <- a2 b2 c2 d2 e2 f2 g2 h2
sta dest2; 3, a2 b2 c2 d2 e2 f2 g2 h2
; = 25
row1 = * + 1
lda #0 ; 2, g1 g0 a1 b1 c1 d1 e1 f1
asl ; 2, g1 <- g0 a1 b1 c1 d1 e1 f1 00
adc #$80 ; 2, g0 <- ?? a1 b1 c1 d1 e1 f1 g1
rol row0 ; 5, h1 <- h0 a0 b0 c0 d0 e0 f0 g0
rol ; 2, ?? <- a1 b1 c1 d1 e1 f1 g1 h1
sta dest1; 3, a1 b1 c1 d1 e1 f1 g1 h1
; = 16
row0 = * + 1
lda #0 ; 2, h0 a0 b0 c0 d0 e0 f0 g0
cmp #$80 ; 2, h0 <- h0 a0 b0 c0 d0 e0 f0 g0
rol ; 2, h0 <- a0 b0 c0 d0 e0 f0 g0 h0
sta dest0; 3, a0 b0 c0 d0 e0 f0 g0 h0
; = 9
rts ; = 9 + 16 + 25 + 33 + 40 + 47 + 54 + 58 = 282
Thus...
Quoting ChristopherJam: "Problem with bitshifter is it only deals with one bit at a time.
c2p exchanges four bitpairs in only 20-30 cycles. The expensive part is moving bits to matching positions within the two bytes they're being exchanged between..."
Turns out both approaches are pretty much in the same ballpark. =)
(And this approach apparently coming out just ever so slightly faster. \=D/)
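For anyone who wants to tinker with the ROL-chain routine, here is a rough Python model of it. It is a sketch, not an emulator: it tracks only the accumulator, the carry flag and the eight row bytes, and it assumes binary mode (decimal flag clear) for ADC. The names `Cpu`, `transpose_rol_chain` and `transpose_reference` are invented for this example.

```python
import random


def transpose_reference(rows):
    """dest[j] bit i == rows[i] bit j, matching the routine's from/to comments."""
    return [sum(((rows[i] >> j) & 1) << i for i in range(8)) for j in range(8)]


class Cpu:
    """Only the 6502 state the routine touches: accumulator and carry."""

    def __init__(self):
        self.a = 0
        self.c = 0

    def rol(self, value):
        """Return `value` rotated left through carry, updating carry."""
        result = ((value << 1) | self.c) & 0xFF
        self.c = value >> 7
        return result

    def asl_a(self):
        self.c = self.a >> 7
        self.a = (self.a << 1) & 0xFF

    def adc(self, value):
        total = self.a + value + self.c
        self.a = total & 0xFF
        self.c = total >> 8

    def cmp(self, value):
        self.c = 1 if self.a >= value else 0


def transpose_rol_chain(rows):
    """Replay the routine; rows[7] plays row7, ..., rows[0] plays row0."""
    row = list(rows)
    dest = [0] * 8
    cpu = Cpu()

    def rol_pairs(first):
        # "rol rowN" followed by "rol" (accumulator), for N = first down to 0
        for n in range(first, -1, -1):
            row[n] = cpu.rol(row[n])
            cpu.a = cpu.rol(cpu.a)

    # dest7: lda / asl / rol, then seven "rol rowN, rol" pairs
    cpu.a = row[7]
    cpu.asl_a()
    cpu.a = cpu.rol(cpu.a)
    rol_pairs(6)
    dest[7] = cpu.a

    # dest6..dest3: lda / and #mask / adc rowK / rol, then the pairs
    for k, mask in ((6, 0xFE), (5, 0xFC), (4, 0xF8), (3, 0xF0)):
        cpu.a = row[k] & mask
        cpu.adc(row[k])          # carry is 0 here, hence the commented-out clc
        cpu.a = cpu.rol(cpu.a)
        rol_pairs(k - 1)
        dest[k] = cpu.a

    # dest2: asl / adc #$80 / rol row1 / rol / cmp #$80 / rol row0 / rol
    cpu.a = row[2]
    cpu.asl_a()
    cpu.adc(0x80)
    row[1] = cpu.rol(row[1])
    cpu.a = cpu.rol(cpu.a)
    cpu.cmp(0x80)
    row[0] = cpu.rol(row[0])
    cpu.a = cpu.rol(cpu.a)
    dest[2] = cpu.a

    # dest1: asl / adc #$80 / rol row0 / rol
    cpu.a = row[1]
    cpu.asl_a()
    cpu.adc(0x80)
    row[0] = cpu.rol(row[0])
    cpu.a = cpu.rol(cpu.a)
    dest[1] = cpu.a

    # dest0: cmp #$80 / rol
    cpu.a = row[0]
    cpu.cmp(0x80)
    cpu.a = cpu.rol(cpu.a)
    dest[0] = cpu.a
    return dest


rng = random.Random(0)
for _ in range(1000):
    rows = [rng.randrange(256) for _ in range(8)]
    assert transpose_rol_chain(rows) == transpose_reference(rows)

# Per-block cycle counts as annotated in the listing (RTS not included).
assert sum([58, 54, 47, 40, 33, 25, 16, 9]) == 282
```

Any reshuffled variant of the routine can be replayed the same way and compared against `transpose_reference` before counting cycles on real hardware.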
Might be possible to squeeze out a few more cycles here and there, so feel free.
I have a hunch that the general problem can't be solved in fewer than 280-ish cycles, though. |
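For comparison, the exchange idea from the ChristopherJam quote can be sketched in Python as the generic three-stage, swap-based transpose. This is not ChristopherJam's 6502 code, which is not quoted in this part of the thread, and it carries no cycle counts. The helper names `exchange` and `transpose_by_exchange` are invented here, and bit 7 is assumed to be the leftmost column.

```python
def exchange(rows, i, j, shift, mask):
    """Swap the `mask` bits of rows[i] with the `mask << shift` bits of rows[j]."""
    t = ((rows[j] >> shift) ^ rows[i]) & mask
    rows[i] ^= t
    rows[j] ^= t << shift


def transpose_by_exchange(rows):
    """8x8 transpose in three exchange stages (bit 7 = leftmost column)."""
    rows = list(rows)
    for i in (0, 1, 2, 3):      # 4x4 blocks: rows i and i+4 trade nibbles
        exchange(rows, i, i + 4, 4, 0x0F)
    for i in (0, 1, 4, 5):      # 2x2 blocks: rows i and i+2 trade bit pairs
        exchange(rows, i, i + 2, 2, 0x33)
    for i in (0, 2, 4, 6):      # single bits: rows i and i+1 trade alternate bits
        exchange(rows, i, i + 1, 1, 0x55)
    return rows


# Example bytes from the first post of the thread.
before = [0b10110010, 0b11010110, 0b00111001, 0b01010110,
          0b11011010, 0b10110101, 0b00110011, 0b10110100]
after = [0b11001101, 0b01011000, 0b10100111, 0b11111111,
         0b00101000, 0b01010101, 0b11011010, 0b00100110]

assert transpose_by_exchange(before) == after
```

Each `exchange` call moves several bits at once, which is the advantage the quote points at. The shifts needed to line the bits up are the part it describes as expensive on the 6502.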
| | JackAsser
Registered: Jun 2002 Posts: 1989 |
@Krill: Nice!!!
So, now what about the shortest code to rotate a char? (or a reference to the message number if it's already stated here) |
| | Krill
Registered: Apr 2002 Posts: 2839 |
Jackasser: Shortest? Going fully academic, eh? =D
I guess it would be the naïve approach, in a nested loop (untested):
SOURCE = $02
DEST = $0a
ldy #7 ; 2
- ldx #7 ; 2
- lsr SOURCE,x; 2
ror ; 1
dex ; 1
bpl - ; 2
sax DEST,y ; 2
dey ; 1
bpl -- ; 2
15 bytes.
But i'd rather have a faster than 280-ish cycles approach. =) |
| | Oswald
Registered: Apr 2002 Posts: 5017 |
sax for giggles ? |
| | Krill
Registered: Apr 2002 Posts: 2839 |
SAX because STA zp,Y does not exist. |
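A small Python model of the nested loop, for anyone who wants to check the bit order without an assembler. The loop itself was posted as untested, so this models only what the listing says. SAX is an undocumented opcode that stores A AND X. After the inner loop X has wrapped to $FF, so the store writes A unchanged. The function name is made up for this sketch.

```python
def transpose_nested_loop(source):
    """Model of the LSR/ROR double loop; the 6502 version shifts SOURCE to zero."""
    src = list(source)   # SOURCE .. SOURCE+7 (copied so the caller's list survives)
    dest = [0] * 8       # DEST .. DEST+7
    a = 0                # accumulator; whatever it held is shifted out after 8 RORs
    for y in range(7, -1, -1):
        for x in range(7, -1, -1):
            carry = src[x] & 1            # lsr SOURCE,x: bit 0 goes to carry
            src[x] >>= 1
            a = (carry << 7) | (a >> 1)   # ror: carry enters at bit 7
        x_reg = 0xFF                      # X after dex/bpl falls through
        dest[y] = a & x_reg               # sax DEST,y stores A AND X
    return dest


# Example bytes from the first post of the thread.
before = [0b10110010, 0b11010110, 0b00111001, 0b01010110,
          0b11011010, 0b10110101, 0b00110011, 0b10110100]
after = [0b11001101, 0b01011000, 0b10100111, 0b11111111,
         0b00101000, 0b01010101, 0b11011010, 0b00100110]

assert transpose_nested_loop(before) == after
```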
| | Oswald
Registered: Apr 2002 Posts: 5017 |
*clapping* |