[CSDb] - User Forums - Compotime: Show me your (vector)balls


2013-05-24 11:28

Bitbreaker

Registered: Oct 2002
Posts: 510

Compotime: Show me your (vector)balls

After several comments arose claiming that such an Amiga ball can be filled faster, I now want to call a filler compo for our coders.

Requirements:

The vector must be rendered in hires, background is white, foreground is dark red.

There's a raster IRQ running that splits the screen at $2d and $f2 to set the background and border color to white and black, as seen in the screenshot. This means there is a free char line at the bottom, which is where the benchmark results are displayed with the system charset. Displaying the result as screencodes is enough for us coders, but hex or decimal values are okay too.

The animation is precalculated, so that only the power of your filler is measured. Therefore a data.bin is provided that contains all animation steps for all faces, with culling etc. already done.

The data structure may be altered to your needs, but not the animation itself. Obvious, isn't it?

The structure of data.bin is as follows:
byte x1 | $80
byte y1
byte x2
byte y2
byte x3
byte y3
byte x4 (optional, depending on if we have a triangle or quad)
byte y4 (optional, depending on if we have a triangle or quad)

As you can see, faces can have 3 or 4 vertices. The first vertex is marked with bit 7 set, which makes it possible to determine whether a face consists of 3 or 4 vertices and provides a break-out point for a finished frame, which is marked with the value $ff. If there are further questions about the data format, don't hesitate to contact Bitbreaker.
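The format described above can be read with a small positional parser. This is only a sketch in Python for illustration (not C64 code), and it rests on assumptions the post does not spell out: every x coordinate is below $7f, so bit 7 is free as a marker and a first-vertex byte can never collide with $ff, and the data is not truncated. The function name `parse_frames` is made up for this example.

```python
def parse_frames(data: bytes):
    """Split data.bin-style bytes into frames, each a list of faces.

    A face is a list of 3 or 4 (x, y) tuples. Assumes x < 0x7f so that
    bit 7 only ever marks the first vertex or the 0xff end-of-frame byte.
    """
    frames = []
    faces = []
    i = 0
    while i < len(data):
        if data[i] == 0xFF:
            # End-of-frame marker: close the current frame.
            frames.append(faces)
            faces = []
            i += 1
            continue
        if not (data[i] & 0x80):
            raise ValueError(f"expected first-vertex marker at offset {i}")
        # First three vertices are always present; strip bit 7 from x1.
        verts = [
            (data[i] & 0x7F, data[i + 1]),
            (data[i + 2], data[i + 3]),
            (data[i + 4], data[i + 5]),
        ]
        i += 6
        # If the next byte has bit 7 clear, it is x4 of a quad.
        # Otherwise it starts the next face or ends the frame.
        if i < len(data) and not (data[i] & 0x80):
            verts.append((data[i], data[i + 1]))
            i += 2
        faces.append(verts)
    if faces:
        # Tolerate a missing final 0xff by keeping the open frame.
        frames.append(faces)
    return frames
```

Note that the parser has to walk the bytes positionally: y values may well have bit 7 set, so only the byte in an x position can be tested for the marker.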

The filling must happen fullframe and fullsize, which means no interlacing or other cheap tricks that reduce resolution.

A counter for benchmarking must be implemented to count the frames until 256 frames have been displayed, it must be made visible in the bottom line.

The lowest value achieved counts (as there might be some jitter), so each entry must run in an endless loop.

The whole mem can be used, but every free byte of mem gives extra kudos.

Deadline is June 25th 0:00.

If the deadline is extended, severe drama is expected; if not, you are out. Also, I'll participate with an entry of my own, make a drama about it! :-)

Entries must be handed in to Bitbreaker and must not be released beforehand. They will all be released after the deadline, for maximum thrill and drama :-)

Each entry must be executable with RUN.

SO DO YOU HAVE THE BALLS?

... 166 posts hidden ...

2013-06-25 18:31

Bitbreaker

Registered: Oct 2002
Posts: 510

It is said that there is an entry from Metalvotze that is worth waiting for. So I'll upload all entries + results as soon as it's done.

2013-06-25 18:42

Bitbreaker

Registered: Oct 2002
Posts: 510

Quote: Haha.. quite funny this. Someone sets up a goal, a bunch of guys aim their weapons, and in the end they hit targets at greatly varying distances from the goal :D.

They are getting old and losing their eyesight! :-)

2013-06-26 06:21

Bitbreaker

Registered: Oct 2002
Posts: 510

Finally, the results are in, and thanks to Metalvotze, none of the serious competitors will be last :-)

So here come the results (executables):

place   handle          frames

god     bitbreaker      $24f
----------------------------
1.      christopherjam  $25f
2.      axis            $29c
3.      cruzer          $29e
4.      hcl             $2a4
5.      drago           $dead

As you see, I decided to stay out of the compo :-) Congrats to ChristopherJam for pushing the eor-filler so hard that it nearly reached my result, and a big thanks to all who participated! Now it is time to discuss and boast in detail, I guess? :-) Now show me your inballievable code!
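To put the hex counts from the results table into perspective, they can be converted into video frames per animation step. The sketch below assumes that the counter counts displayed video frames until 256 animation steps are done, as the rules describe, and that the machine refreshes at roughly 50 Hz (PAL); the exact refresh rate is an assumption, so the fps figure is only approximate. The $dead entry is left out because it is evidently not a measured value.

```python
# Frame counts copied from the results table (hex).
RESULTS = {
    "bitbreaker": 0x24F,
    "christopherjam": 0x25F,
    "axis": 0x29C,
    "cruzer": 0x29E,
    "hcl": 0x2A4,
}

ANIMATION_STEPS = 256      # from the compo rules
ASSUMED_REFRESH_HZ = 50.0  # assumption: PAL, approximately 50 Hz


def summarize(count: int):
    """Return (video frames per animation step, approximate animation fps)."""
    frames_per_step = count / ANIMATION_STEPS
    return frames_per_step, ASSUMED_REFRESH_HZ / frames_per_step


if __name__ == "__main__":
    for handle, count in RESULTS.items():
        per_step, fps = summarize(count)
        print(f"{handle:15s} {count:4d} frames  {per_step:.2f} per step  ~{fps:.1f} fps")
```

Under these assumptions the gap between the fastest and the slowest serious entry is 85 video frames over the whole 256-step animation.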

2013-06-26 07:37

Axis/Oxyron
Account closed

Registered: Apr 2007
Posts: 91

Congrats to Christopher. Great work dude!

After taking a look at the code of all entries, I have to say: "We are all bloody uncreative".
The code looks about 90% identical (eor filler, mostly unrolled, and slopetable lines), with only small differences in the details, like how the stored data is converted to slope spans.
I really hoped that someone would come up with a nice innovation, like special code for flat lines that gathers multiple pixels per store, or some trick to avoid the eor per line pixel.
I guess this has to wait until the next compo.
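For readers who have not met the term, the general idea of an eor filler can be sketched as follows. This is a plain Python illustration of the technique on a one-value-per-pixel grid, not the code of any entry: the real routines work on bitmap bytes with unrolled 6502 code and slope tables. The function names and the half-open edge convention are choices made for this example.

```python
def draw_edge(buf, x0, y0, x1, y1):
    """EOR exactly one pixel per x column of the edge into buf[y][x]."""
    if x0 == x1:
        return  # vertical edges add no toggles
    if x0 > x1:
        x0, y0, x1, y1 = x1, y1, x0, y0
    # Half-open range: a vertex shared by two edges is toggled only once.
    for x in range(x0, x1):
        y = y0 + (x - x0) * (y1 - y0) // (x1 - x0)
        buf[y][x] ^= 1


def eor_fill(buf):
    """Run an EOR accumulator down every column to turn outlines into fills."""
    height, width = len(buf), len(buf[0])
    out = [[0] * width for _ in range(height)]
    for x in range(width):
        state = 0
        for y in range(height):
            state ^= buf[y][x]  # each edge pixel flips inside/outside
            out[y][x] = state
    return out


def fill_polygon(vertices, width, height):
    """Draw all edges of one polygon into a fresh buffer and fill it."""
    buf = [[0] * width for _ in range(height)]
    for i, (x0, y0) in enumerate(vertices):
        x1, y1 = vertices[(i + 1) % len(vertices)]
        draw_edge(buf, x0, y0, x1, y1)
    return eor_fill(buf)
```

Because the edges are combined with EOR rather than stored, two edges that land on the same pixel cancel out, which is what keeps the column-wise accumulator consistent.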

2013-06-26 08:48

Bitbreaker

Registered: Oct 2002
Posts: 510

So that is how it is done in my case:

All coordinates are shifted 4 pixels to the right, so that only 15 columns have to be treated (thanks to THCM for that hint). Also, all faces are re-sorted so that they are drawn from right to left. Thus the left edge of a face never comes into contact with edges from other faces, and ora'ing with content already present in the buffer can be omitted on the left side.
The use of slope tables seems to be common practice, so there is not much to tell here. Funnily enough, one gets along with rather small tables, but the input data is bloated by that process. To save bytes, the format of data.bin was adapted quite a bit.
Stuff has been aligned to convenient places so that most of the jump pointers can be easily calculated with 2-cycle illegal opcodes. Also, a lot of code is squeezed into the zeropage and from $ff9c on, so that both code segments can be reached via normal branches (address wraparound).
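Why shifting the coordinates can save a whole column: one bitmap byte covers 8 pixels, so the number of byte columns an object touches depends on how its x range lines up with multiples of 8. A minimal sketch, using a hypothetical extent of x = 4..123 (the real extents of the animation are not stated in this thread) and made-up function names:

```python
def columns_touched(x_min: int, x_max: int, shift: int = 0) -> int:
    """Number of 8-pixel byte columns covered by [x_min, x_max] after shifting."""
    return (x_max + shift) // 8 - (x_min + shift) // 8 + 1


def best_shift(x_min: int, x_max: int) -> int:
    """Smallest shift in 0..7 that minimizes the number of touched columns."""
    return min(range(8), key=lambda s: columns_touched(x_min, x_max, s))


if __name__ == "__main__":
    # Hypothetical extent, chosen for illustration only.
    x_min, x_max = 4, 123
    for s in range(8):
        print(s, columns_touched(x_min, x_max, s))
```

With that hypothetical extent, the unshifted object straddles 16 columns, while a shift of 4 pixels aligns it to exactly 15.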

Here's the source; have fun digging through it.

2013-06-26 08:56

ChristopherJam

Registered: Aug 2004
Posts: 1423

Thanks, guys!

And yes, I was hoping to get away from using an eorfiller too, but my span fill attempts either took up way too much ram with tables, or were too slow, despite presorting the polys from left to right so I never needed to mask the right hand sides.

All of my slopes are taken from a single 384 byte table, but the offsets into it are all precomputed and my memory usage is pretty dire. I'm damn impressed that bitbreaker managed to get the best time in only 133 blocks!

Each edge is stored as two bytes for the address to jump into an eorfill routine, the low byte of a base pointer into the eorbuffer, and an offset into the slope table.

The eorfill routines plot up to 32 pixels, each taking its Y value from basepointer+slope[x*4+offset]. They plot two pixels at a time if the slope is low enough.

I too shifted four pixels left and four pixels up, so I only use 15x15 chars.

If I knew I could have gotten away with a single charset for display, that would have saved me a few kb to unroll some of the loops further.

Very impressed with the cleanness of some of the other entries.

2013-06-26 09:48

Axis/Oxyron
Account closed

Registered: Apr 2007
Posts: 91

My implementation is a pretty straightforward eor filler with slopetable lines.

Completely unrolled speedcode for clear and fill that only accesses the bytes that get touched in at least one frame of the animation.
I shifted the coordinates -3 in x. My prototype reported the fewest touched bytes in that position.
My address generation in the line routine is optimized down to 1 inc-zp every 16 pixels, because all the #$80 fiddling is encoded into my slopetables.
The rest is just classic code optimization of the line init.
I just realized I could have saved a lot by storing the line speedcode in multiple widths, so that I wouldn't need to patch an rts into the line code and restore it afterwards.

2013-06-26 10:16

ChristopherJam

Registered: Aug 2004
Posts: 1423

Sadly I only thought of including the $80 fiddling into the slope tables this morning, but I did have multiple speedcode fragments rather than doing RTS patching. All my fragments just JMP back into the main loop in zero page.

2013-06-26 11:39

Shadow
Account closed

Registered: Apr 2002
Posts: 355

Aw damnit, I missed the deadline with my joke entry.
Oh well, here it is anyway:

http://ag1976.com/tmp/amiga_petscii.prg

50 fps baby, oh yeah! :D

2013-06-26 11:58

chatGPZ

Registered: Dec 2001
Posts: 11507

so, as if anyone cares, of course i tried anyway :)

in: 263168 bytes (128 file(s)) out: 69281 bytes (1 file(s)-128 frame(s)) left: 26.33%

so that'd *almost* fit, using plain delta+RLE on the bitmaps. using screen+charset properly might actually make it fit, but since my little packer doesn't do that automatically i couldn't be arsed to test =P

oh, and it runs so fast that it looks totally crap, making the whole animation attempt somewhat pointless =) it was interesting to get some figures for code and data size of both attempts though, cheers =)
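As an illustration of what "plain delta+RLE" can mean, here is a minimal Python sketch: each frame is XORed against the previous one, and the mostly-zero result is stored as (run length, value) byte pairs. The actual format of the packer mentioned above is not described in the thread, so the pair layout, the 255-byte run limit and all function names are assumptions for this example.

```python
def xor_delta(prev: bytes, cur: bytes) -> bytes:
    """Byte-wise XOR of two equally sized frames; applying it twice restores cur."""
    return bytes(a ^ b for a, b in zip(prev, cur))


def rle_encode(data: bytes) -> bytes:
    """Encode as (run length, value) pairs with runs of at most 255 bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(packed: bytes) -> bytes:
    """Inverse of rle_encode."""
    out = bytearray()
    for i in range(0, len(packed), 2):
        out += bytes([packed[i + 1]]) * packed[i]
    return bytes(out)
```

A frame is packed as `rle_encode(xor_delta(prev, cur))` and restored with `xor_delta(prev, rle_decode(packed))`. The scheme only pays off when consecutive frames share long identical stretches.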