[CSDb] - User Forums - Fast way to rotate a char?

You are not logged in - nap

CSDb User Forums

Forums > C64 Coding > Fast way to rotate a char?

2017-01-04 08:32

Rudi
Account closed

Registered: May 2010
Posts: 125

Fast way to rotate a char?

Im not talking about rol or ror, but swap bits so that they are rotated 90 degrees:

Example:

a char (and the bits can be random):

10110010 byte 1..
11010110 byte 2.. etc..
00111001
01010110
11011010
10110101
00110011
10110100

after "rotation" (rows and columns are swapped):

is it possible to use lookup tables for this or would that lookup table be too big?
or other lookuptable for getting and setting bits?

-Rudi

... 105 posts hidden. Click here to view all posts....

2017-01-09 15:18

White Flame

Registered: Sep 2002
Posts: 136

Here's another idea which I think fails, but might be salvageable.

So the basic LSR, ROR accumulation grabs a bit and stores a bit, taking 2 instructions per bit. However, if done in-place just like many basic shift-add multiplication routines, ROR both sets a final bit as well as reads the next bit to place. This means that in dream land, the flip can be done in about 64 RORs. If fully in zp, that would be 320 cycles, but only 128 bytes of code with no tables.

However, the tactic I took doesn't seem to have a nice clean loop of RORs linking source bit locations to final bit locations. Various CMP #80s and other byte-masking & merging seems to be required, which would likely bloat it back up to 400+ cycles. But maybe by shuffling it around differently, an arrangement could be made that's both fast and short.

2017-01-09 15:26

Rudi
Account closed

Registered: May 2010
Posts: 125

ok, ok. another idea is to use TXS and TSX (takes only 2 cycles each) for having X as temporary variable in optimizing. might require the lookup to switch from ,x and ,y to ,y and ,x however. I don't mind having a 4k+ big table for making it really fast. 32k+ table however thats overkill!

2017-01-09 15:44

Rastah Bar

Registered: Oct 2012
Posts: 336

Quote: ok, ok. another idea is to use TXS and TSX (takes only 2 cycles each) for having X as temporary variable in optimizing. might require the lookup to switch from ,x and ,y to ,y and ,x however. I don't mind having a 4k+ big table for making it really fast. 32k+ table however thats overkill!

I think my method needs 18 tables of 1 page = 4.5k. Using TXS/TSX will shave off 2 cycles in total.

2017-01-10 11:11

Rastah Bar

Registered: Oct 2012
Posts: 336

Quoting Color Bar

This has to be repeated for blocks 'b', 'c', 'd' (using the right lookup tables), but since source4 was saved in Y, that code can start with TAY instead of 'LDY source4' to save 1 cycle.

I meant that the 'LDY source4' can be omitted. The TAY doesn't make sense and should be omitted, so the code takes 342 cycli.

I can gain another 6 cycli by saving X. If the code

tax
lda moveHighNybbleToLowNybble,x

is replaced by

sta selfmod:+1
selfmod:
lda moveHighNybbleToLowNybble

then source7 is kept in X and the code for block 'b' can start with fetching the bits of source7 from a table and omit the 'LDX source7'. One cycle is gained that way. In that block source6 must then be saved in X as shown above, and in the following block source5. The code for the last block can stay as it was in the original post. That saves 3 cycles for half the matrix and 6 in total, and the algorithm takes 336 cycles.

2017-01-10 12:04

Oswald

Registered: Apr 2002
Posts: 5017

still awful lot of cycles. it seems, whatever the goal is, it is faster to construct the bits into final orientation right away.

2017-01-10 13:38

enthusi

Registered: May 2004
Posts: 675

I could still image some top-view char based game where rotating in 'realtime' might actually make sense.
But you would certainly need some very special case when this is favorable over replacing chars from a lookup.

2017-01-10 18:17

Rudi
Account closed

Registered: May 2010
Posts: 125

Still got 299 cycles after merging the 4x4 and 2x2 (195 cycles) plus 1x1 (104 cycles). Can't get it any lower than that at the moment. Im trying to look at a different method.

Anyway; Im wondering if the checkerboard pattern (AND-masking) approach with the shifts can be minimized.. Or changed to a more effective method. If it turns out that the checkerboard pattern is the most optimal then there's no other way than trying to optimize the actual assembly code.

2017-01-10 18:59

Rastah Bar

Registered: Oct 2012
Posts: 336

Pretty good. Below the magic 300 cycles barrier. :-)

2017-01-10 21:12

Copyfault

Registered: Dec 2001
Posts: 466

@Rudi: what do you mean with "merge 4x4 and 2x2"? I understand the 312-cycles version, though this is only valid if you stick to zp load/store. If you access absolute adresses (which will be necessary to read/write to charset data or bitmap resp.), this very same version ends up with 328 cycles -> identical to Axis' first routine.

By reordering the merges just like Axis pointed out this can be lowered to 328 - 12 = 316 (300 if you stick to zp load/store).

So how do you come down to 299 cycles???

2017-01-10 21:35

Rudi
Account closed

Registered: May 2010
Posts: 125

What I really meant was reordering merges, which I think Axis pointed on in one of his posts. Here I only managed to reorder the 2x2 rotation into 4x4 by removing some sta's and reusing the accumulator iirc. But unfortunately the code is so messy that I myself easily get tired after trying to optimize this. So, here comes a long list of code if you dont mind. I used tables for the shifts, and dot mind the eor-tag its the same as or. the next tag is the and value. Also used specific addresses in zp since this is just a test.

;-------------------------------
;4x4 & 2x2 rotation (195 cycles)
;-------------------------------
ldy $61
lax $65
and #$0f
eor shl4, y
sta $81
tya
and #$f0
eor shr4, x
sta $85

ldy $63
lda $67
and #$0f
eor shl4, y
ldy $81
tax
and #$33
eor shl2_eor_cc, y
sta $91
tya
and #$cc
eor shr2_eor_33, x
sta $93
lda $63
ldx $67
and #$f0
eor shr4, x
ldy $85
tax
and #$33
eor shl2_eor_cc, y
sta $95
tya
and #$cc
eor shr2_eor_33, x
sta $97

ldy $60
lax $64
and #$0f
eor shl4, y
sta $80
tya
and #$f0
eor shr4, x
sta $84


ldy $62
lax $66
and #$0f
eor shl4, y
ldy $80
and #$33
eor shl2_eor_cc, y
sta $90
tya
and #$cc
eor shr2_eor_33, x
sta $92
lda #$62
and #$f0
eor shr4, x
ldy $84
tax
and #$33
eor shl2_eor_cc, y
sta $94
tya
and #$cc
eor shr2_eor_33, x
sta $96

;-------------------------
;1x1 rotation (104 cycles)
;-------------------------
ldy $90
lax $91
and #$55
eor shl1_eor_aa, y
sta $70
tya
and #$aa
eor shr1_eor_55, x
sta $71

ldy $92
lax $93
and #$55
eor shl1_eor_aa, y
sta $72
tya
and #$aa 
eor shr1_eor_55, x
sta $73

ldy $94
lax $95
and #$55
eor shl1_eor_aa, y
sta $74
tya
and #$aa
eor shr1_eor_55, x
sta $75

ldy $96
lax $97
and #$55
eor shl1_eor_aa, y
sta $76
tya
and #$aa
eor shr1_eor_55, x
sta $77

I wanted to make the 4x4vs2x2 rotator shorter by using other method, but havent found a solution for this yet (the 1x1 rotator also, for that matter, by optimizing those two distinct sections).

The character is at $60-$67 and the result is at $70-$77.

Previous - 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 - Next

Refresh

Subscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search for in
All times are CET.

Search CSDb

Advanced

Users Online

zscs
pcollins/Quantum
Ghost/Quantum
Chico/Civitas
Didi/Laxity
Bieno/Commodore Plus
Guests online: 144

Top Demos

1 Next Level  (9.8)
2 Mojo  (9.7)
3 Coma Light 13  (9.7)
4 Edge of Disgrace  (9.6)
5 Comaland 100%  (9.6)
6 No Bounds  (9.6)
7 Uncensored  (9.6)
8 The Ghost  (9.6)
9 Wonderland XIV  (9.6)
10 Bromance  (9.6)

Top onefile Demos

1 It's More Fun to Com..  (9.8)
2 Party Elk 2  (9.7)
3 Cubic Dream  (9.6)
4 Copper Booze  (9.5)
5 Rainbow Connection  (9.5)
6 TRSAC, Gabber & Pebe..  (9.5)
7 Onscreen 5k  (9.5)
8 Wafer Demo  (9.5)
9 Dawnfall V1.1  (9.5)
10 Quadrants  (9.5)

Top Groups

1 Oxyron  (9.3)
2 Nostalgia  (9.3)
3 Booze Design  (9.3)
4 Censor Design  (9.3)
5 Crest  (9.3)

Top Crackers

1 Mr. Z  (9.9)
2 S!R  (9.9)
3 Mr Zero Page  (9.8)
4 Antitrack  (9.8)
5 OTD  (9.8)

Page generated in: 0.047 sec.