| |
Rudi Account closed
Registered: May 2010 Posts: 125 |
Fast way to rotate a char?
I'm not talking about ROL or ROR, but about swapping bits so that they are rotated 90 degrees:
Example:
a char (and the bits can be random):
10110010 byte 1..
11010110 byte 2.. etc..
00111001
01010110
11011010
10110101
00110011
10110100
after "rotation" (rows and columns are swapped):
11001101
01011000
10100111
11111111
00101000
01010101
11011010
00100110
Is it possible to use lookup tables for this, or would the lookup table be too big?
Or is there some other lookup table for getting and setting bits?
-Rudi |
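For reference, the operation asked about is an 8x8 bit transpose: output byte n collects bit n (counted from the left) of every input byte. The following is a minimal Python sketch of that definition, not a fast 6502 routine; the function name `transpose_bits` is hypothetical. It is only meant as a slow reference against which faster variants can be checked.

```python
def transpose_bits(rows):
    """Return the 8x8 bit transpose of eight byte values.

    rows[0] is the top row and bit 7 is the leftmost column,
    matching the layout of the example in the post.
    """
    out = []
    for col in range(8):
        byte = 0
        for row in rows:
            # Pick the bit in column `col` of this row and append it
            # to the output byte, top row first.
            byte = (byte << 1) | ((row >> (7 - col)) & 1)
        out.append(byte)
    return out
```

Applied to the eight bytes of the example, this yields the eight "after rotation" bytes listed in the post.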
|
... 105 posts hidden ... |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Quoting Axis/Oxyron
Christopher: That 3x EOR thing is exactly what we did on Amiga back in the day. Didn't expect this to have an advantage on 6502. But where is the shifting taking place? Or is this only in one of the three passes, and the other passes correct the bit order with a table lookup?
Sweet. Yes, the code above is only used in the second of the three passes; first and third are very similar to yours, only with a bit shuffle on one of the input bytes in each pair on the third pass. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Quoting Rudi
If anyone can get below 299 cycles then pls explain what method you use :p
Could we make the rules the same as for Axis' contribution before we compare cycle counts?
-neither input nor output bytes on zero page
-zero page intermediate results are fine
-code is not relocated to zero page either.
In practice it's unlikely there'd be enough space in ZP for source and destination charsets, and copying the data in and out would add at least an extra 102 cycles per char. |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
Good idea. I am down to 327 cycles now according to these rules. |
| |
Rudi Account closed
Registered: May 2010 Posts: 125 |
No, I don't think that's a good idea. But feel free to restrict yourselves to your own rules. I totally don't know what you are talking about anyway... |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
Quote: No, I don't think that's a good idea. But feel free to restrict yourselves to your own rules. I totally don't know what you are talking about anyway...
OK, I will keep mentioning both cases (all input and output bytes either in ZP or in Mem).
One version uses 301 cycles if all are in ZP, another version takes 327 cycles when all are in Mem. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
OK, in that case mine is 292 cycles if source and dest are in ZP, 308 cycles when all in mem.
Here's a diagram of how the bits are shuffled at each stage:
Each block shows the two input bytes for a swap macro in the top half and the two outputs in the lower half. The digits in the left border of each box are indexes into the input/output arrays for that stage.
Note that each input pair for the last four blocks contains one byte that's a shuffled version of an output byte from the previous stage, marked with a *
The shuffle is there so I can then do an Axis-style swap (which needs half the bits to be correctly located within the byte), and it's performed by storing the result of the previous stage into the low byte of the operand of an LDY absolute that references the shuffle table.
+-------------------------+ +-------------------------+ +-------------------------+ +-------------------------+
0 a0 b0 c0 d0 e0 f0 g0 h0 | 1 a1 b1 c1 d1 e1 f1 g1 h1 | 2 a2 b2 c2 d2 e2 f2 g2 h2 | 3 a3 b3 c3 d3 e3 f3 g3 h3 |
4 a4 b4 c4 d4 e4 f4 g4 h4 | 5 a5 b5 c5 d5 e5 f5 g5 h5 | 6 a6 b6 c6 d6 e6 f6 g6 h6 | 7 a7 b7 c7 d7 e7 f7 g7 h7 |
+-------------------------+ +-------------------------+ +-------------------------+ +-------------------------+
0 a0 b0 c0 d0 a4 b4 c4 d4 | 1 c1 d1 a1 b1 c5 d5 a5 b5 | 2 c2 d2 a2 b2 c6 d6 a6 b6 | 3 a3 b3 c3 d3 a7 b7 c7 d7 |
4 e0 f0 g0 h0 e4 f4 g4 h4 | 5 g1 h1 e1 f1 g5 h5 e5 f5 | 6 g2 h2 e2 f2 g6 h6 e6 f6 | 7 e3 f3 g3 h3 e7 f7 g7 h7 |
+-------------------------+ +-------------------------+ +-------------------------+ +-------------------------+
+-------------------------+ +-------------------------+ +-------------------------+ +-------------------------+
0 a0 b0 c0 d0 a4 b4 c4 d4 | 1 c1 d1 a1 b1 c5 d5 a5 b5 | 4 e0 f0 g0 h0 e4 f4 g4 h4 | 5 g1 h1 e1 f1 g5 h5 e5 f5 |
2 c2 d2 a2 b2 c6 d6 a6 b6 | 3 a3 b3 c3 d3 a7 b7 c7 d7 | 6 g2 h2 e2 f2 g6 h6 e6 f6 | 7 e3 f3 g3 h3 e7 f7 g7 h7 |
+-------------------------+ +-------------------------+ +-------------------------+ +-------------------------+
0 a0 b0 a2 b2 a4 b4 a6 b6 | 1 c1 d1 c3 d3 c5 d5 c7 d7 | 4 e0 f0 e2 f2 e4 f4 e6 f6 | 5 g1 h1 g3 h3 g5 h5 g7 h7 |
2 c2 d2 c0 d0 c6 d6 c4 d4 | 3 a3 b3 a1 b1 a7 b7 a5 b5 | 6 g2 h2 g0 h0 g6 h6 g4 h4 | 7 e3 f3 e1 f1 e7 f7 e5 f5 |
+-------------------------+ +-------------------------+ +-------------------------+ +-------------------------+
+-------------------------+ +-------------------------+ +-------------------------+ +-------------------------+
0 a0 b0 a2 b2 a4 b4 a6 b6 | 1 c1 d1 c3 d3 c5 d5 c7 d7 | 4 e0 f0 e2 f2 e4 f4 e6 f6 | 6*g0 h0 g2 h2 g4 h4 g6 h6 |
3*a1 b1 a3 b3 a5 b5 a7 b7 | 2*c0 d0 c2 d2 c4 d4 c6 d6 | 7*e1 f1 e3 f3 e5 f5 e7 f7 | 5 g1 h1 g3 h3 g5 h5 g7 h7 |
+-------------------------+ +-------------------------+ +-------------------------+ +-------------------------+
0 a0 a1 a2 a3 a4 a5 a6 a7 | 3 d0 d1 d2 d3 d4 d5 d6 d7 | 4 e0 e1 e2 e3 e4 e5 e6 e7 | 6 g0 g1 g2 g3 g4 g5 g6 g7 |
1 b0 b1 b2 b3 b4 b5 b6 b7 | 2 c0 c1 c2 c3 c4 c5 c6 c7 | 5 f0 f1 f2 f3 f4 f5 f6 f7 | 7 h0 h1 h2 h3 h4 h5 h6 h7 |
+-------------------------+ +-------------------------+ +-------------------------+ +-------------------------+
|
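The overall idea of the diagram (three passes over byte pairs, each exchanging 4-bit, then 2-bit, then 1-bit groups with a masked EOR) can be modelled in Python. This is a sketch of the generic textbook form only: it uses plain shifts, so it does not reproduce the pair ordering, the table-driven bit shuffle, or the self-modifying LDY shown above, and it says nothing about cycle counts. The names `delta_swap` and `transpose8x8` are hypothetical.

```python
def delta_swap(rows, i, j, shift, mask):
    """Exchange the `mask` bits of rows[i] with the bits of rows[j]
    that sit `shift` positions further left, using the EOR trick."""
    t = (rows[i] ^ (rows[j] >> shift)) & mask
    rows[i] ^= t
    rows[j] ^= t << shift


def transpose8x8(rows):
    """Transpose an 8x8 bit matrix given as eight byte values
    (rows[0] is the top row, bit 7 the leftmost column)."""
    r = list(rows)
    for i in (0, 1, 2, 3):      # pass 1: swap 4x4 blocks between rows i and i+4
        delta_swap(r, i, i + 4, 4, 0x0F)
    for i in (0, 1, 4, 5):      # pass 2: swap 2x2 blocks between rows i and i+2
        delta_swap(r, i, i + 2, 2, 0x33)
    for i in (0, 2, 4, 6):      # pass 3: swap single bits between rows i and i+1
        delta_swap(r, i, i + 1, 1, 0x55)
    return r
```

Each pass touches four byte pairs, which matches the four blocks per stage in the diagram; the 6502 versions discussed here replace the shifts with table lookups.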
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
Quoting ChristopherJam
OK, in that case mine is 292 cycles if source and dest are in ZP, 308 cycles when all in mem.
Very neat!
Quote:
+-------------------------+ +-------------------------+ +-------------------------+ +-------------------------+
0 a0 b0 c0 d0 e0 f0 g0 h0 | 1 a1 b1 c1 d1 e1 f1 g1 h1 | 2 a2 b2 c2 d2 e2 f2 g2 h2 | 3 a3 b3 c3 d3 e3 f3 g3 h3 |
4 a4 b4 c4 d4 e4 f4 g4 h4 | 5 a5 b5 c5 d5 e5 f5 g5 h5 | 6 a6 b6 c6 d6 e6 f6 g6 h6 | 7 a7 b7 c7 d7 e7 f7 g7 h7 |
+-------------------------+ +-------------------------+ +-------------------------+ +-------------------------+
0 a0 b0 c0 d0 a4 b4 c4 d4 | 1 c1 d1 a1 b1 c5 d5 a5 b5 | 2 c2 d2 a2 b2 c6 d6 a6 b6 | 3 a3 b3 c3 d3 a7 b7 c7 d7 |
4 e0 f0 g0 h0 e4 f4 g4 h4 | 5 g1 h1 e1 f1 g5 h5 e5 f5 | 6 g2 h2 e2 f2 g6 h6 e6 f6 | 7 e3 f3 g3 h3 e7 f7 g7 h7 |
+-------------------------+ +-------------------------+ +-------------------------+ +-------------------------+
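I would implement the code for the second and third byte pairs like this:
lax s1          ; 4  A = X = s1 (undocumented opcode)
lda shuffle1,x  ; 4
and #$f0        ; 2
ldy s2          ; 4
ora merge1,y    ; 4
sta tmp1        ; 3
tya             ; 2  (A is overwritten by the next LDA)
lda shuffle2,y  ; 4
and #$0f        ; 2
ora merge2,x    ; 4
sta tmp2        ; 3
------------------+
36 cycles (s1 and s2 in mem, tmp1 and tmp2 in zp, no page crossings on the indexed reads)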
For the first and 4th byte pairs the "LDA shuffle"s can be omitted (28 cycles).
Is there a more efficient way to do it? |
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
Why don't the shuffle tables include the ANDs? |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
Yes, you are right, the tables should include the ANDs. So that makes it 32 cycles for the 2nd and 3rd byte pairs. |
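As a rough illustration of what folding the ANDs into the tables means, here is a Python model of one first-pass byte pair. The table contents are an assumption reconstructed from the first-stage rows of the diagram (nibble exchange, plus a swap of the two bit pairs inside each nibble for the second and third byte pairs); only the names `merge1` and `merge2` come from the posted code, everything else is hypothetical.

```python
def swap_pairs(nibble):
    """Exchange the two 2-bit halves of a 4-bit value: abcd -> cdab."""
    return ((nibble & 0b0011) << 2) | (nibble >> 2)


def build_tables(shuffled):
    """Build four 256-entry tables with the nibble masks already applied.

    shuffled=False models the first and fourth byte pairs,
    shuffled=True the second and third (assumed from the diagram).
    """
    fix = swap_pairs if shuffled else (lambda n: n)
    keep_hi = [fix(v >> 4) << 4 for v in range(256)]    # shuffle1 with AND #$f0 folded in
    merge1 = [fix(v >> 4) for v in range(256)]          # high nibble of s2, moved low
    keep_lo = [fix(v & 0x0F) for v in range(256)]       # shuffle2 with AND #$0f folded in
    merge2 = [fix(v & 0x0F) << 4 for v in range(256)]   # low nibble of s1, moved high
    return keep_hi, merge1, keep_lo, merge2


def first_pass_pair(s1, s2, shuffled=False):
    """Return (tmp1, tmp2) for one byte pair using table lookups and ORs only."""
    keep_hi, merge1, keep_lo, merge2 = build_tables(shuffled)
    tmp1 = keep_hi[s1] | merge1[s2]
    tmp2 = keep_lo[s2] | merge2[s1]
    return tmp1, tmp2
```

With the masks baked into the tables, each output byte needs one lookup and one OR, which is where the two saved AND instructions come from.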
| |
Axis/Oxyron Account closed
Registered: Apr 2007 Posts: 91 |
Quoting Color Bar
tya
lda shuffle2,y
WOOT? ;o) |