[CSDb] - User Forums - Fast way to rotate a char?

2017-01-04 08:32

Rudi
Account closed

Registered: May 2010
Posts: 125

Fast way to rotate a char?

I'm not talking about rol or ror, but about swapping bits so that the char is rotated 90 degrees:

Example:

a char (and the bits can be random):

10110010 byte 1..
11010110 byte 2.. etc..
00111001
01010110
11011010
10110101
00110011
10110100

after "rotation" (rows and columns are swapped):

Is it possible to use lookup tables for this, or would the lookup table be too big?
Or is there some other lookup table for getting and setting bits?

-Rudi
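
For reference, the operation being asked about can be written out bit by bit. The sketch below is a slow Python model, not C64 code. It assumes bit 7 is the leftmost pixel, and the names transpose and rotate_cw are made up for illustration. The post mentions both "rotated 90 degrees" and "rows and columns are swapped", so both readings are shown. A single lookup table over the whole char is not feasible, since an 8x8 char has 2^64 possible states.

```python
def transpose(rows):
    """Swap rows and columns: output pixel (y, x) is input pixel (x, y)."""
    out = []
    for y in range(8):
        b = 0
        for x in range(8):
            bit = (rows[x] >> (7 - y)) & 1   # input row x, column y
            b = (b << 1) | bit               # output columns fill left to right
        out.append(b)
    return out


def rotate_cw(rows):
    """Rotate 90 degrees clockwise: output pixel (y, x) is input pixel (7 - x, y)."""
    out = []
    for y in range(8):
        b = 0
        for x in range(8):
            bit = (rows[7 - x] >> (7 - y)) & 1
            b = (b << 1) | bit
        out.append(b)
    return out


# The example char from the post, one byte per row.
char = [0b10110010, 0b11010110, 0b00111001, 0b01010110,
        0b11011010, 0b10110101, 0b00110011, 0b10110100]

for row in rotate_cw(char):
    print(format(row, "08b"))
```

In this model a clockwise rotation is the same as transposing the rows read in reverse order, so a fast transposition is enough to get a fast rotation.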

... 105 posts hidden ...

2017-01-08 14:45

Rudi
Account closed

Registered: May 2010
Posts: 125

Quote: But you know that:
(i ^ (i & 0xcc))

is the same as:
i & 0x33

;o)

No, didn't think about that, hehe.

Btw, 312 cycles now (with LAX).
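
The identity quoted above is easy to check exhaustively over all byte values. The following is a small Python check, not C64 code; the helper names are hypothetical. It also covers the general form for any mask, which is the same trick as flipping an and #$aa result into an and #$55 result with a single eor.

```python
def clear_by_xor(i, mask):
    """Clear the masked bits by XORing the value with its masked part."""
    return i ^ (i & mask)


def keep_by_and(i, mask):
    """Keep only the bits outside the mask with a single AND."""
    return i & (mask ^ 0xFF)


# The specific case from the quote: mask 0xcc versus 0x33.
for i in range(256):
    assert (i ^ (i & 0xCC)) == (i & 0x33)

# The general case: holds for every 8-bit value and every 8-bit mask.
for mask in range(256):
    for i in range(256):
        assert clear_by_xor(i, mask) == keep_by_and(i, mask)
```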

2017-01-08 14:54

Bitbreaker

Registered: Oct 2002
Posts: 508

Quoting Axis/Oxyron

But you know that:
(i ^ (i & 0xcc))

is the same as:
i & 0x33

;o)

Smells like the version of the table that shifts only 1 bit to the right could be replaced by some asr magic?
Also, the and maskX looks like it could be folded into something else; it is too static to be done that often :-)
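
To make the asr suggestion concrete: the undocumented ASR (also called ALR) #imm opcode ANDs A with the immediate and then shifts the result right by one bit. The Python model below is a sketch; the table shr1_tab is a hypothetical stand-in for the 1-bit shift table, whose actual contents are not shown in this part of the thread.

```python
def asr(a, imm):
    """Model of ASR/ALR #imm: AND with the immediate, then LSR.

    Returns the new accumulator and the carry (the bit shifted out).
    """
    t = a & imm
    carry = t & 1
    return t >> 1, carry


# Hypothetical table: shift right by one bit and keep the 0x55 positions.
shr1_tab = [(i >> 1) & 0x55 for i in range(256)]

# ASR #$aa gives the same value as the table lookup for every input byte.
for i in range(256):
    value, _ = asr(i, 0xAA)
    assert value == shr1_tab[i]
```

ASR #imm takes 2 cycles against 4 for an absolute indexed table read, but it needs the value in A rather than in an index register, so whether it helps depends on the surrounding code.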

2017-01-08 15:22

Rudi
Account closed

Registered: May 2010
Posts: 125

XAA might be something too.

2017-01-08 18:29

Rudi
Account closed

Registered: May 2010
Posts: 125

Here's a different approach to it:

ldx $82			;3	
xaa #$33		;2	a=(x & 0x33)
ldy $80			;3
eor shl2_eor_cc, y	;4*
sta $90			;3
lda shr2_eor_33, x	;4*
eor tab_cc, y		;4*
sta $92			;3

It uses the same number of cycles, though.

Bitbreaker: yes, one could probably optimize the 1x1 rotator with other illegal opcodes, since some of them do one shift.

2017-01-09 07:35

Bitbreaker

Registered: Oct 2002
Posts: 508

Besides that, it will produce rubbish, as xaa can add some unpredictable value to A before doing the txa and the and part :-)
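
The instability can be illustrated with the commonly documented model of XAA (also called ANE), in which A is first ORed with a chip-dependent "magic" constant. This is a Python sketch of that model only; the magic values used below are illustrative assumptions, and real hardware may behave differently again.

```python
def xaa(a, x, imm, magic):
    """Commonly documented model of XAA #imm: A = (A | magic) & X & imm."""
    return (a | magic) & x & imm


x = 0xFF          # value loaded with ldx
wanted = x & 0x33  # what the comment a=(x & 0x33) expects

# With magic = 0xff the old contents of A do not matter.
assert xaa(0x00, x, 0x33, 0xFF) == wanted

# With magic = 0xee, bits 0 and 4 of the result depend on the old A.
assert xaa(0x00, x, 0x33, 0xEE) == 0x22

# Having A = 0xff beforehand makes the result independent of magic.
for magic in range(256):
    assert xaa(0xFF, x, 0x33, magic) == wanted
```

Under this model the snippet above only yields x & 0x33 reliably if A is known to hold $ff (or at least the bits that magic may leave clear) before the xaa, which costs extra cycles.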

2017-01-09 10:10

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quoting Color Bar

I may have found a method that takes 432 cycles....

If I merge columns of 2 bits wide and 4 bits high into one byte and then extract the destination nybbles I can reduce that to 354 cycles.

2017-01-09 11:52

Rudi
Account closed

Registered: May 2010
Posts: 125

Quote: Quoting Color Bar
I may have found a method that takes 432 cycles....

If I merge columns of 2 bits wide and 4 bits high into one byte and then extract the destination nybbles I can reduce that to 354 cycles.

Are you using the masking method?
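
The "masking method" is not defined in this part of the thread. Presumably it refers to the classic block-swap transposition of an 8x8 bit matrix, which matches the 0xcc/0x33 masks and the 4x4, 2x2 and 1x1 stages mentioned in other posts; that reading is an assumption. A Python sketch of that technique, with hypothetical names swap_stage and transpose_masked and bit 7 taken as the leftmost pixel:

```python
def swap_stage(rows, step, shift, mask_hi):
    """Swap the off-diagonal blocks of every (row i, row i + step) pair."""
    mask_lo = mask_hi ^ 0xFF
    out = list(rows)
    for i in range(8):
        if (i & step) == 0:
            a, b = rows[i], rows[i + step]
            # The upper row keeps its left block and takes b's left block on the right.
            out[i] = (a & mask_hi) | ((b >> shift) & mask_lo)
            # The lower row takes a's right block on the left and keeps its right block.
            out[i + step] = ((a << shift) & mask_hi) | (b & mask_lo)
    return out


def transpose_masked(rows):
    """Transpose an 8x8 bitmap in three stages: 4x4, 2x2, then 1x1 blocks."""
    rows = swap_stage(rows, 4, 4, 0xF0)
    rows = swap_stage(rows, 2, 2, 0xCC)
    rows = swap_stage(rows, 1, 1, 0xAA)
    return rows


# The example char from the first post, one byte per row.
char = [0b10110010, 0b11010110, 0b00111001, 0b01010110,
        0b11011010, 0b10110101, 0b00110011, 0b10110100]

for row in transpose_masked(char):
    print(format(row, "08b"))
```

Each stage consists of four merges of two bytes into two bytes, twelve merges in total. On the 6502 the shifts and masks inside a merge would typically come from 256-entry tables, which is where the cycle counts discussed in this thread come from.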

2017-01-09 12:25

Axis/Oxyron
Account closed

Registered: Apr 2007
Posts: 91

I just want to share some thoughts on my merges that didn't work out. Perhaps I'm just missing the last twist.

The first idea was to make relative merges. I discussed that back in the 90s with some Amiga coders, and on 68030-68060 it saves some cycles.
The idea is that the shifts of the inputs need not always have the exact values, as long as the delta between the shifts of the 2 inputs stays correct. The disadvantage is that the last merge needs some rol/ror to compensate.

This resulted in something like this:

lda {src1}
ldy {src2}
and #$aa
ldx {bittab1},y
sax {dst1}
eor {src1} ;invert and #$aa to and #$55
ldx {bittab2},y
sax {dst2}

Unfortunately it only saves 1 cycle per merge, and that gain is completely eaten up by the last merge, which loses 2 cycles for the correction.

Another idea was to interleave the temp-arrays with hi-byte pointers so that they can be used both as pointers for indirect y-indexing and as direct values. Code would look like this:

lax {src1}
and #{mask1}
ora ({src2}),y
sta {dst1}
lda ({src2}),y
ora {bittab2},x
sta {dst2}

This would also save 1 cycle per merge. But the unsolved problem is that the 2 uses of {src2} should point to 2 different tables. *grrr*

What definitely works is reordering the merges, so that the last 2 merges of a resolution don't need to store the tmp-values into the zp and the first two of the next resolution don't need to read the tmp-values.

so the last
sta {dst2}
and the first
lax {src1}
would merge into
tax.

This saves 4 times 3 = 12 cycles.

2017-01-09 14:27

Rudi
Account closed

Registered: May 2010
Posts: 125

Interesting ideas. Reordering merges was something I tried out but failed. Maybe I should look at it again.

2017-01-09 15:00

Rudi
Account closed

Registered: May 2010
Posts: 125

Ok, so I did merge 4x4 and 2x2, and it worked. Am now at 299 cycles. There is probably more that can be optimized, because now the code looks like a mess.
