CSDb User Forums


2017-01-04 08:32
Rudi
Account closed

Registered: May 2010
Posts: 125
Fast way to rotate a char?

I'm not talking about rol or ror, but about swapping bits so that the char is rotated 90 degrees:

Example:

a char (and the bits can be random):
10110010 byte 1..
11010110 byte 2.. etc..
00111001
01010110
11011010
10110101
00110011
10110100
after "rotation" (rows and columns are swapped):
11001101
01011000
10100111
11111111
00101000
01010101
11011010
00100110
Is it possible to use lookup tables for this, or would the lookup table be too big?
Or are there other lookup tables for getting and setting bits?

-Rudi
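
For reference, the operation in the example above is a transpose of the 8x8 bit matrix. A minimal, unoptimized C sketch of it follows; it is not taken from the thread, the function name `transpose8_naive` is made up for illustration, and it assumes the leftmost bit shown is the most significant bit of each byte.

```c
#include <stdint.h>

/* Transpose an 8x8 bit matrix stored as 8 bytes (one byte per row).
   Assumption: column 0 is the most significant bit of each byte.
   Bit c of source row r becomes bit r of destination row c. */
static void transpose8_naive(const uint8_t src[8], uint8_t dst[8])
{
    for (int c = 0; c < 8; c++) {
        uint8_t out = 0;
        for (int r = 0; r < 8; r++) {
            uint8_t bit = (uint8_t)((src[r] >> (7 - c)) & 1); /* pick column c of row r */
            out |= (uint8_t)(bit << (7 - r));                 /* place it in column r   */
        }
        dst[c] = out;
    }
}
```

Fed with the eight example bytes ($B2, $D6, $39, $56, $DA, $B5, $33, $B4), this should reproduce the "after" block of the post. It only serves as a reference to check faster variants against.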
 
... 105 posts hidden ...
 
2017-01-08 13:37
Rudi
Account closed

Registered: May 2010
Posts: 125
Quote: Rudi, except for the fact that you used EOR instead of ORA and didn't use LAX for the first read on zp (would be possible if you swap the x and y registers), this looks identical to the code I posted pretty early in this thread. Is there a special reason to use EOR?

The EOR was a consequence of the formulas I used for masking and swapping (I derived this from Kalms' tutorial):

Example for the 4x4 swapping:
tmp0 = byte0 & 0xf0; //xxxx----
tmp1 = byte1 & 0xf0; //xxxx----
tmp2 = byte2 & 0xf0; //xxxx----
tmp3 = byte3 & 0xf0; //xxxx----
tmp4 = byte4 & 0x0f; //----xxxx
tmp5 = byte5 & 0x0f; //----xxxx
tmp6 = byte6 & 0x0f; //----xxxx
tmp7 = byte7 & 0x0f; //----xxxx
data0 = byte0 << 4;
data1 = byte1 << 4;
data2 = byte2 << 4;
data3 = byte3 << 4;
data4 = byte4 >> 4;
data5 = byte5 >> 4;
data6 = byte6 >> 4;
data7 = byte7 >> 4;
data0 ^= tmp4;
data1 ^= tmp5;
data2 ^= tmp6;
data3 ^= tmp7;
data4 ^= tmp0;
data5 ^= tmp1;
data6 ^= tmp2;
data7 ^= tmp3;
Sorry for the long code..

EOR is used for the final XOR step, since I cannot use a lookup table for two different values (e.g. data0 and tmp4). The last eight operations above are done with the EOR instruction.
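
As a rough illustration of the masking-and-swapping idea carried through all three block sizes (4x4, 2x2, 1x1), here is a C sketch of a plain in-place transpose. It is not the code from this thread: the helper names `swap_blocks` and `transpose8` are invented, it assumes the most significant bit is the leftmost column, and it writes each result back to the row it came from, which may differ from the row order used in the posts here.

```c
#include <stdint.h>

/* Exchange the two off-diagonal blocks shared by rows lo and hi.
   keep  = mask of the columns that stay in row lo (0xF0, 0xCC or 0xAA),
   shift = block width in bits (4, 2 or 1). */
static void swap_blocks(uint8_t m[8], int lo, int hi, uint8_t keep, int shift)
{
    uint8_t a = m[lo], b = m[hi];
    uint8_t move = (uint8_t)~keep;              /* columns that change rows */
    m[lo] = (uint8_t)((a & keep) | ((b & keep) >> shift));
    m[hi] = (uint8_t)(((a & move) << shift) | (b & move));
}

/* In-place 8x8 bit transpose built from three rounds of block swaps. */
static void transpose8(uint8_t m[8])
{
    for (int r = 0; r < 4; r++)                 /* 4x4 blocks: rows r, r+4 */
        swap_blocks(m, r, r + 4, 0xF0, 4);
    for (int r = 0; r < 8; r++)                 /* 2x2 blocks: rows r, r+2 */
        if ((r & 2) == 0)
            swap_blocks(m, r, r + 2, 0xCC, 2);
    for (int r = 0; r < 8; r += 2)              /* 1x1 blocks: rows r, r+1 */
        swap_blocks(m, r, r + 1, 0xAA, 1);
}
```

Because the two masked halves never overlap, the final combine gives the same result with OR or with XOR, which is why either ORA or EOR can be used on the 6502 side.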

Some of the lookup tables I derived do the EOR, AND and shift at the same time:
shl2_eor_cc[i] = (i ^ (i & 0xcc)) << 2;
shr2_eor_33[i] = (i ^ (i & 0x33)) >> 2;
shl1_eor_aa[i] = (i ^ (i & 0xaa)) << 1;
shr1_eor_55[i] = (i ^ (i & 0x55)) >> 1;
I scratched my head over how your ORA worked. And since I didn't understand that Dreamass macro code, I wrote mine from scratch. But maybe I should look at your LAX method next.

Edit: Now I see that it doesn't really matter whether one uses ORA or EOR for this technique.
2017-01-08 13:46
Axis/Oxyron

Registered: Apr 2007
Posts: 91
But you know that:
(i ^ (i & 0xcc))

is the same as:
i & 0x33

;o)
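
The identity holds for any mask m: XORing i with its own masked bits clears exactly those bits, so i ^ (i & m) equals i & ~m. A small C sketch that checks this exhaustively and fills the four tables from the earlier post in the simplified form; the table names are the ones from that post, while `build_tables` and `identity_holds` are hypothetical helpers added here for illustration.

```c
#include <stdint.h>

static uint8_t shl2_eor_cc[256], shr2_eor_33[256];
static uint8_t shl1_eor_aa[256], shr1_eor_55[256];

/* Fill the four tables using the simplified masks:
   i ^ (i & 0xcc) is written as i & 0x33, and so on. */
static void build_tables(void)
{
    for (int i = 0; i < 256; i++) {
        shl2_eor_cc[i] = (uint8_t)((i & 0x33) << 2);
        shr2_eor_33[i] = (uint8_t)((i & 0xCC) >> 2);
        shl1_eor_aa[i] = (uint8_t)((i & 0x55) << 1);
        shr1_eor_55[i] = (uint8_t)((i & 0xAA) >> 1);
    }
}

/* Return 1 if i ^ (i & m) == i & ~m for every byte value i, else 0. */
static int identity_holds(uint8_t m)
{
    for (int i = 0; i < 256; i++)
        if ((uint8_t)(i ^ (i & m)) != (uint8_t)(i & ~m))
            return 0;
    return 1;
}
```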
2017-01-08 14:45
Rudi
Account closed

Registered: May 2010
Posts: 125
Quote: But you know that:
(i ^ (i & 0xcc))

is the same as:
i & 0x33

;o)


No, didn't think about that, hehe.

Btw, 312 cycles now (with LAX).
2017-01-08 14:54
Bitbreaker

Registered: Oct 2002
Posts: 499
Quoting Axis/Oxyron
But you know that:
(i ^ (i & 0xcc))

is the same as:
i & 0x33

;o)


Smells like the version of the table that shifts only 1 bit to the right could be substituted by some ASR magic?
Also, the "and maskX" looks like it could be folded into something; it is too static to be done that often :-)
2017-01-08 15:22
Rudi
Account closed

Registered: May 2010
Posts: 125
XAA might be something too.
2017-01-08 18:29
Rudi
Account closed

Registered: May 2010
Posts: 125
Here's a different approach to it:
ldx $82			;3	
xaa #$33		;2	a=(x & 0x33)
ldy $80			;3
eor shl2_eor_cc, y	;4*
sta $90			;3
lda shr2_eor_33, x	;4*
eor tab_cc, y		;4*
sta $92			;3
It uses the same number of cycles, though.

Bitbreaker: yes, one could probably optimize the 1x1 rotator with other illegal opcodes, since some of them do one shift.
2017-01-09 07:35
Bitbreaker

Registered: Oct 2002
Posts: 499
Besides that, it will produce rubbish, as XAA can add some unpredictable value to A before doing the TXA and AND parts :-)
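
To make the problem concrete, here is a tiny C model of XAA as it is commonly documented for the NMOS 6510: A = (A | magic) & X & imm. This is a sketch under that assumption only. The `magic` value is not fixed on real hardware (it is reported to vary between chips and operating conditions), so it is a plain parameter here, and `xaa_model` is a made-up name.

```c
#include <stdint.h>

/* Model of XAA #imm under the commonly documented formula.
   Assumption: `magic` stands for the unstable constant that real
   chips OR into A; the values passed in are purely illustrative. */
static uint8_t xaa_model(uint8_t a, uint8_t x, uint8_t imm, uint8_t magic)
{
    return (uint8_t)((a | magic) & x & imm);
}
```

In this model, `xaa_model(0x00, 0xFF, 0x33, 0xFF)` gives $33 as intended, but the same call with a magic value of $EE gives $22, because bits 0 and 4 then depend on whatever A held before. Only when A already has every bit of the immediate mask set does the magic value drop out of the result.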
2017-01-09 10:10
Rastah Bar

Registered: Oct 2012
Posts: 336
Quoting Color Bar
I may have found a method that takes 432 cycles....


If I merge columns 2 bits wide and 4 bits high into one byte and then extract the destination nybbles, I can reduce that to 354 cycles.
2017-01-09 11:52
Rudi
Account closed

Registered: May 2010
Posts: 125
Quote: Quoting Color Bar
I may have found a method that takes 432 cycles....


If I merge columns 2 bits wide and 4 bits high into one byte and then extract the destination nybbles, I can reduce that to 354 cycles.


Are you using the masking method?
2017-01-09 12:25
Axis/Oxyron

Registered: Apr 2007
Posts: 91
I just want to share some thoughts on my merges that didn't work out. Perhaps I'm just missing the last twist.

First idea was to make relative merges. I discussed that back in the 90s with some Amiga coders, and on the 68030-68060 it saves some cycles.
The idea is that the shifts of the inputs need not have the exact values, as long as the delta between the shifts of the 2 inputs stays correct. The disadvantage is that the last merge needs some rol/ror to compensate.

This resulted in something like this:

lda {src1}
ldy {src2}
and #$aa
ldx {bittab1},y
sax {dst1}
eor {src1} ;invert and #$aa to and #$55
ldx {bittab2},y
sax {dst2}

Unfortunately it only saves 1 cycle per merge, which is completely eaten up by the last merge, which loses 2 cycles for the correction.

Another idea was to interleave the temp-arrays with hi-byte pointers so that they can be used both as pointers for indirect y-indexing and as direct values. Code would look like this:

lax {src1}
and #{mask1}
ora ({src2}),y
sta {dst1}
lda ({src2}),y
ora {bittab2},x
sta {dst2}

It would also save 1 cycle per merge. But the unsolved problem is that the 2 uses of {src2} should point to 2 different tables. *grrr*

What definitely works is reordering the merges, so that the last 2 merges of a resolution don't need to store the tmp values into the zp and the first two of the next resolution don't need to read the tmp values.

so the last
sta {dst2}
and the first
lax {src1}
would merge into
tax.

Saves 4 times 3=12 cycles.