[CSDb] - User Forums - Compotime: Show me your (vector)balls

2013-05-24 11:28

Bitbreaker

Registered: Oct 2002
Posts: 510

Compotime: Show me your (vector)balls

After several comments arose that such an Amiga ball can be filled faster, i now want to call a filler compo for our coders.

Requirements:

The vector must be rendered in hires, background is white, foreground is dark red.

There's a raster IRQ running that splits the screen at $2d and $f2 to set the background and border color to white and black, as seen in the screenshot. This means there is one char line free at the bottom, which is where the benchmark results are displayed with the system charset. Displaying the result with screencodes is enough for us coders, but hex or decimal values are okay too.

The animation is precalculated so that only the power of your filler is measured. Therefore a data.bin is provided that contains all animation steps for all faces, with culling etc. already done.

The data structure may be altered to your needs, but not the animation itself. Obvious, isn't it?

The structure of data.bin is as follows:
byte x1 | $80
byte y1
byte x2
byte y2
byte x3
byte y3
byte x4 (optional, depending on if we have a triangle or quad)
byte y4 (optional, depending on if we have a triangle or quad)

As you can see, faces can have 3 or 4 vertices. The first vertex is marked with bit 7 set, which makes it possible to determine whether a face consists of 3 or 4 vertices. A finished frame is marked with the value $ff, which serves as the break-out point. If there are further questions about the data format, don't hesitate to contact Bitbreaker.
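The format above can be read back with a small parser. This is only a sketch in Python, not part of the compo rules, and it rests on two assumptions that the post does not state: every coordinate byte other than x1 has bit 7 clear, and x1 | $80 never equals $ff. The names `parse_frames` and `END_OF_FRAME` are made up for illustration.

```python
from typing import List, Tuple

Vertex = Tuple[int, int]
Face = List[Vertex]

END_OF_FRAME = 0xFF  # frame terminator described in the post


def parse_frames(data: bytes) -> List[List[Face]]:
    """Split raw data.bin bytes into frames, each a list of 3- or 4-vertex faces."""
    frames: List[List[Face]] = []
    faces: List[Face] = []
    i = 0
    while i < len(data):
        if data[i] == END_OF_FRAME:
            # $ff closes the current frame.
            frames.append(faces)
            faces = []
            i += 1
            continue
        # x1 carries the face marker in bit 7, so mask it off.
        face: Face = [(data[i] & 0x7F, data[i + 1])]
        face.append((data[i + 2], data[i + 3]))
        face.append((data[i + 4], data[i + 5]))
        i += 6
        # Assumption: a following byte with bit 7 clear is x4 of a quad;
        # a byte with bit 7 set starts the next face or ends the frame.
        if i < len(data) and not (data[i] & 0x80):
            face.append((data[i], data[i + 1]))
            i += 2
        faces.append(face)
    return frames
```

Faces that are not followed by a $ff terminator are dropped by this sketch, and truncated input raises an IndexError.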

The filling must happen fullframe and fullsize; that means no interlacing or other cheap tricks that reduce resolution.

A benchmark counter must be implemented that counts the screen frames that pass until 256 animation frames have been displayed. It must be made visible in the bottom line.

The lowest value achieved counts (as there might be some jitter); for that reason each entry must run in an endless loop.

The whole mem can be used, but every free byte of mem gives extra kudos.

Deadline is June 25th 0:00.

If the deadline is extended, a severe drama is expected, if not, you are out. Also i'll participate with my own entry, make a drama about it! :-)

Entries must be handed in to Bitbreaker and must not be released beforehand. They will all be released after the deadline, for maximum thrill and drama :-)

Each entry must be executable with RUN.

SO DO YOU HAVE THE BALLS?

... 166 posts hidden ...

2013-06-26 12:12

Shadow
Account closed

Registered: Apr 2002
Posts: 355

Groepaz: Yeah, I noticed that on my PETSCII animated version as well - the animation is really not well suited for 50fps display, it just turns into a pink blur... Running VICE at 50% speed actually makes it look better!

2013-06-26 14:17

BYB

Registered: Jan 2011
Posts: 20

Where to download? I would like to see all the other versions too. Actually i only saw the petscii one, really nice work and idea. :) Ah, i found the competition entries up there :)

2013-06-26 14:29

Shadow
Account closed

Registered: Apr 2002
Posts: 355

Bitbreaker posted a link to the executables in the same post with the results (#134)

2013-06-26 18:09

HCL

Registered: Feb 2003
Posts: 728

Big congratulations to ChristopherJam who is the true winner of this compo, and also the only entry with correct double buffering!! No wonder you had memory problems if you managed to do that.

Omg, i ended up on last position :(. This definitely ends my era as a 1337 coder.. but i still claim that i once was.

..So here come a few excuses. I wasted ~2 weeks on a huffman-packed animation of the line-buffer, which turned out a lot slower than i expected. Well, at least i *tried* something other than an eor-filler, but later went back to implement something like Cruzer's hard-liner from 2004. I didn't really get any further than Cruzer did back then i suppose, or rather i didn't even get there probably :P. From the benchmarks it looks like Me+Cruzer+Axis implemented almost exactly the same thing.

I really would have wanted to go further from here, but there was no more time, and also my energy was starting to drain. I still have most of the zeropage unused, and the data.bin is untouched, but shifted 4 pixels like the rest of you also did. The last optimization that gained me some $18 frames or so was to unroll the vertex read-loop for one face, and thus also duplicating the line-init (via macro) to operate on various vertex combos. Gained more than i expected.

Ok, time to check out the other entries to see if there is something interesting.. Should be for a lamer like me :P.

2013-06-26 20:16

Cruzer

Registered: Dec 2001
Posts: 1049

Congrats to Bitbreaker and ChristopherJam, truly impressed. Guess it's back to the drawing board. Great to see that the filled vector standard has reached a new level compared to the 1992 style that ruled for many years.

2013-06-26 20:27

HCL

Registered: Feb 2003
Posts: 728

Cruzer's implementation looks really clean and almost unoptimized. I bet you could gain some easy frames here and there..

Axis's ball is crapped in places.. I call for a ~$10 frames penalty ;). Well, it's probably just an easy bug fix, but it looks sloppy.. 3 bytes per vertex was kinda innovative, don't you think?

CJam, WTF? Looped clear + eor-fill, and still you beat us with margin. Ok, some zp-code there, and lots of precalced data, but i'm still like WTF?! Gotta learn the lesson :P.

Bitbreaker's ball was fast, ok, but you have also had two compos to optimize it ;). Besides i still think that span-filling is slower if you realtime-calc the vectors, hmm. that requires a proof i suppose :P ..and you *are* having double buffers, so it would be a piece of cake to do it bug-free then!?

I don't know if this applies to some of your lines, but i draw the lines backwards.. Then i don't need to find an address to put the RTS and then restore it. The line always finishes on the right place with RTS, and i just have to find out where to start. WTF, i did the worst result, i shouldn't come with tips and trix.. i should try to learn instead, it's just hard to change roles :P.
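The entry-point idea in the last paragraph can be modelled with a bit of address arithmetic. This is a rough sketch in Python, not HCL's actual routine: it assumes an unrolled line routine made of equally sized steps with a single RTS after the last one, and the constants `STEP_BYTES`, `MAX_STEPS` and `CODE_BASE` are hypothetical values picked for illustration.

```python
STEP_BYTES = 6       # hypothetical size of one unrolled pixel step
MAX_STEPS = 128      # hypothetical number of unrolled steps
CODE_BASE = 0x4000   # hypothetical start address of the unrolled routine
RTS_ADDR = CODE_BASE + MAX_STEPS * STEP_BYTES  # fixed RTS right after the last step


def entry_address(steps: int) -> int:
    """Address to jump to so that exactly `steps` steps run before the fixed RTS."""
    if not 0 <= steps <= MAX_STEPS:
        raise ValueError("line longer than the unrolled routine")
    return RTS_ADDR - steps * STEP_BYTES


def forward_patch_address(steps: int) -> int:
    """For comparison: where a routine entered at CODE_BASE would need a temporary RTS."""
    if not 0 <= steps <= MAX_STEPS:
        raise ValueError("line longer than the unrolled routine")
    return CODE_BASE + steps * STEP_BYTES
```

With the backward variant only the jump target changes per line. The forward variant has to write an RTS into the code at `forward_patch_address(steps)` and put the original opcode back afterwards.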

2013-06-27 06:36

Bitbreaker

Registered: Oct 2002
Posts: 510

Quoting HCL

Bitbreaker's ball was fast, ok, but you have also had two compos to optimize it ;). Besides i still think that span-filling is slower if you realtime-calc the vectors, hmm. that requires a proof i suppose :P ..and you *are* having double buffers, so it would be a piece of cake to do it bug-free then!?

What do you mean by bug free? Without tearing? In fact the tearing is not too heavy on the real machine. I might vsync, but would lose a few frames by that. Redoing the clearing so that it happens linewise and not columnwise would help to make the syncing happen within a tighter range.
As for realtime calced vectors i'd need ~$33e for just the filling and calculations coming along with that, if i remember correctly.
Also i had to do many things on that filler from scratch or can we take that as a hook for some serious drahma please? :-P

2013-06-27 07:28

PopMilo

Registered: Mar 2004
Posts: 146

Thanks!

Thank you Bitbreaker for making this compo, thank you all who showed how to code this thingy...

My take on this - we need more of these small, focused competitions!

2013-06-27 07:33

ChristopherJam

Registered: Aug 2004
Posts: 1423

An eorfiller with double buffers can be tear free without losing any frames by only changing $d016 in the IRQ, but any plotter that spends most of its time writing to the final bitmap requires triple buffers for the same effect - and if you've unrolled code dedicated to each buffer, that eats memory fast.

@HCL, the looped clear+fill costs me an extra few thousand cycles per frame, but it frees up memory for more unrolled speedcode fragments, which saves me more time than I lose to the loops.
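For readers unfamiliar with the term, the fill half of an eor-filler can be sketched in a few lines. This is a generic Python illustration of the technique, not ChristopherJam's code, and it assumes the edges have already been plotted so that each edge sets exactly one pixel per x position. The function name `eor_fill_column` is made up.

```python
from typing import List


def eor_fill_column(column: List[int]) -> List[int]:
    """Fill one column of bitmap bytes from top to bottom with a running XOR."""
    acc = 0
    filled = []
    for byte in column:
        acc ^= byte         # an edge pixel toggles the fill state of its bit
        filled.append(acc)  # every row below stays set until the next edge pixel
    return filled


# Edge pixels in bit 4 at rows 1 and 4: rows 1 to 3 end up filled.
example = eor_fill_column([0x00, 0x10, 0x00, 0x00, 0x10, 0x00])
```

The loop here corresponds to the looped variant. An unrolled variant repeats the loop body once per row, which removes the loop overhead but costs memory, and that is the trade-off described above.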

As it is I'd hoped to spend an evening tuning how much memory I allocate to each of the clear, fill, and edge routines to optimise cycles, but like you I misread the deadline!

The speedcode uses a mix of inlined code and JSRs for incrementing the column pointer, with inlines on the more commonly used cases towards the ends of the routines, but JSRs to save RAM on the less used cases. Again, the distribution's probably not all that optimal.

I draw lines left to right or right to left depending on whether the slope is up or down; this way I only have positive slopes in my slope table.
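The direction trick in the last paragraph might look roughly like this. It is a sketch in Python under the assumption that "positive slope" means y never decreases along the drawing direction; the helper name `normalise_line` and its return layout are invented for illustration and are not taken from the entry.

```python
from typing import Tuple

Point = Tuple[int, int]


def normalise_line(p0: Point, p1: Point) -> Tuple[Point, int, int, int]:
    """Return (start, x_step, dx, dy) with dx >= 0 and dy >= 0."""
    # Start from the endpoint with the smaller y so y only ever increases.
    start, end = (p0, p1) if p0[1] <= p1[1] else (p1, p0)
    dx = end[0] - start[0]
    dy = end[1] - start[1]         # always >= 0 by construction
    x_step = 1 if dx >= 0 else -1  # draw left to right, or right to left
    return start, x_step, abs(dx), dy
```

Since both `dx` and `dy` come out non-negative, a table indexed by them only needs entries for positive slopes, and the sign lives entirely in `x_step`.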

2013-06-27 18:10

chatGPZ

Registered: Dec 2001
Posts: 11507

darn. i find myself working on the damned packer now, because i think a both faster AND shorter version (than the ones shown) is within reach. damn urge to code useless crap it is =)