| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Compotime: Show me your (vector)balls
After several comments arised that such an amiga-ball can be filled faster, i now want to call out a filler-compo for our coders.
Requirements:
The vector must be rendered in hires, background is white, foreground is dark red.
There's a raster-irq running that splits the screen at $2d and $f2 to set the background and border color to white and black, as seen in the screenshot. Means, there is a charline free in the bottom, that is where the benchmark results are displayed with the system charset. Displaying the result with screencodes is enough for us coders, but hex or decimal values are okay too.
The animation will be precalculated to see the power of your filler only. Therefore a data.bin is provided that contains all animationsteps for all faces with culling etc. already done.
The data structure may be altered to your needs, but not the animation itself, obvious isn't it?
The structure of data.bin is as follows:
byte x1 | $80
byte y1
byte x2
byte y2
byte x3
byte y3
byte x4 (optional, depending on if we have a triangle or quad)
byte y4 (optional, depending on if we have a triangle or quad)
As you can see faces can have 3 or 4 vertices, the first vertice is marked with bit 7 set, to be able to determine if a face consists of 3 or 4 vertices and to have a break out point for a finished frame, which is marked with the value $ff. If there's further questions about the data-format, don't hesitate to contact Bitbreaker
The filling must happen fullframe and fullsize, means, no interlacing or other cheap tricks with reducing resolution.
A counter for benchmarking must be implemented to count the frames until 256 frames have been displayed, it must be made visible in the bottom line.
The lowest value achieved counts (as there might be some jitter), for that, each entry must run in an endless loop.
The whole mem can be used, but every free byte of mem gives extra kudos.
Deadline is June 25th 0:00.
If the deadline is extended, a severe drama is expected, if not, you are out. Also i'll participate with an own entry, make a drama about it! :-)
Entries must be handed in to Bitbreaker and must not be released beforehand. They all will then released after the deadline, for maximum thrill and drama :-)
Each entry must be executeable with run.
SO DO YOU HAVE THE BALLS? |
|
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
<nevermind> just saw the $ff stop marker... :P |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Nope. but feel free to sum them up and add that info, as stated, you are free to change that to your needs :-) |
| |
HCL
Registered: Feb 2003 Posts: 728 |
Haha, funny compo :). So, if i really made some good balls before (as vector bobs), that will not count.. I'll see if i can make a filler again.. no promises though :). |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Bitbreaker: Did a reference renderer in Java that "almost" work... Have you verified the data? Are you sure the first x-position of a quad can't be at x=127 (i.e. a valid $ff)?
Of course I coded it wrong... but I dunno why just yet. :P Anyway, when fixed I'll upload the renderer for everyone so that the code explains the data-format. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
$7f does not occur as there's a 4 pixel safety margin around the object. In case my yet implementation would choke hard i guess :-) |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
The procedure to read in the vertices is: read in vertices until x is negative (bit 7 set). If you read in 3 vertices till then, draw a triangle else a quad. I just thought i save you a few bytes in the initial dataset :-) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Do entries need to render from data.bin directly, or can they be preprocessed?
If preprocessing is allowed, can it be at assembly time, or must it be after the demo has loaded?
What timezone is the deadline in? |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
this compo rather tests line drawing speed than filling. :) I see many obvious optimizations, dont know if I'd give away if write it there, most decent coders know them already. some of these are almost on the animation side... no wonder jackie is testing it in java :)
maybe rules should be extended that frames may not deviate more than X% from some original rendering ? algorithm will make a lossy char anim of this in no time :) |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Quote: this compo rather tests line drawing speed than filling. :) I see many obvious optimizations, dont know if I'd give away if write it there, most decent coders know them already. some of these are almost on the animation side... no wonder jackie is testing it in java :)
maybe rules should be extended that frames may not deviate more than X% from some original rendering ? algorithm will make a lossy char anim of this in no time :)
data.bin can be preprocessed even before linked to the part, as said, it can be adopted to your needs, it's just important that faces appear at the same place (i bit of inaccuracy is okay, as you can also see from my screenshot), as there's when bluntly using data.bin. Deadline is June 25th 2013 0:00 MET.
any blocky animation will be deteced within no time and a serious drama will be generated upon it, be sure about that! :-) If not it will be downvoted and the drama begins! |
| |
HCL
Registered: Feb 2003 Posts: 728 |
Oh, so it's not about doing the fastest filler, it's about getting peoples votes? That changed the field drastically :P. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Oops, I missed that line; my bad. Thanks for the clarifications, on both counts. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Are we permitted to plot into a 16x16 grid of characters, or must it be in standard bitmap mode? |
| |
enthusi
Registered: May 2004 Posts: 677 |
So the code is supposed to handle exactly those vertices in whatever format? Preprocessing should not include data as i.e. horizontal on/offs? |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
you are free to choose any mode, and yes, it fits into a 16x16 charfield for maybe a good reason :-) Actually i don't want to restrict too much and keep space for playing around :-)
And yes, you may create any new data from the data.bin that suits your needs better, but you'll understand that just dividing all numbers by two might be a bad idea in regards to the outcoming result, but feel free to build any new dataset from it, like for e.g. swap x/y, change clockwiseness or whatever or save more bytes if you feel for it as long as the outcome on the screen is the same as if using the original data. Of course there's no need then to include the original data.bin if you found your own, better format. Also you can preprocess the data with whatever you want, it must not be the task of your .prg to do so, but of course can, if you got BALLS :-) |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
To preview the data you may use this piece of java-code which reads data.bin directly:
import java.awt.*;
import java.io.*;
public class Reference extends Frame {
private static final int ZOOM = 4;
private byte data[];
private int dataPointer = 2;
private int next() {
int r = ((int)data[dataPointer++])&0xff;
if (dataPointer >= data.length)
dataPointer=2;
return r;
}
private int peek() {
return ((int)data[dataPointer])&0xff;
}
public Reference() {
super("Bitbreaker's filler compo");
try {
File f = new File("data.bin");
FileInputStream fis = new FileInputStream(f);
data = new byte[(int)f.length()];
fis.read(data);
fis.close();
}
catch (IOException e) {
e.printStackTrace();
System.exit(1);
}
setLayout(new BorderLayout());
final Panel p = new Panel() {
public void paint(Graphics g) {
g.setColor(Color.WHITE);
g.fillRect(0,0,getWidth(),getHeight());
g.setColor(Color.RED);
int x[] = new int[4];
int y[] = new int[4];
while(peek()!=0xff) {
int nbrVertices = 0;
do {
y[nbrVertices] = (next() & 0x7f)*ZOOM;
x[nbrVertices] = (next())*ZOOM;
nbrVertices++;
} while ((peek()&0x80)==0);
g.fillPolygon(x, y, nbrVertices);
}
next();
}
public Dimension getPreferredSize() {
return new Dimension(128*ZOOM, 128*ZOOM);
}
};
add(p, BorderLayout.CENTER);
pack();
setResizable(false);
setVisible(true);
Thread t = new Thread() {
public void run() {
while(true) {
try {
p.repaint();
Thread.sleep(1000/50);
}
catch (InterruptedException e) {
}
}
}
};
t.start();
}
public static void main(String[] args) {
new Reference();
}
}
|
| |
chatGPZ
Registered: Dec 2001 Posts: 11391 |
not very interested in coding hires fillers atm - but i might use this dataset as input for my mod converter and make a sequel of It's All Your Fault with it =P |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
I can also provide oneway encrypted datajunk i created when fiddling around with crunchers like doynax :-) |
| |
Burglar
Registered: Dec 2004 Posts: 1105 |
so this will result in many compo fillers ;)
nice idea, too bad I wouldnt know where to start writing a superfast filler :/ |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
Too short deadline for me I'm afraid, but I'll release a version later that beats you all. :) |
| |
Mixer
Registered: Apr 2008 Posts: 455 |
Make the ball actually round, while you're all at it :) |
| |
Skate
Registered: Jul 2003 Posts: 495 |
Do you mind if i use SuperCPU? :D
I'm so busy these days but I'd really like to attend that compo. No promises but i may join the fun. |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
Decided to give it a shot anyway, and I already got my first unoptimized version up'n'running. First question - why is it bigger than in your original demo? Wasn't this supposed to be a compo about making a faster version of the exact same ball? |
| |
chatGPZ
Registered: Dec 2001 Posts: 11391 |
the original demo isnt an animation player :) |
| |
HCL
Registered: Feb 2003 Posts: 728 |
Hmm.. also got an unoptimized version running.. 7-8 frames at worst, but then the lines and fill/clear code are looped. Never mind.. the setup feels a bit strange though.. I wonder if those $5000 bytes of animation will rule out optimizations that would be possible if it was real-time.. because just as Cruzer said, i thought we were competing against the original, which was in real-time :P. Very strange this.. :) |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Well, i have also done an animation version of this ball, with the same filling algorithm as used in the realtime version. The reason to take animated data is to make things compareable and faster. Also i said it already many times: you may change the data to your needs, as long as it produces the same output. I really don't say this for no reason, i changed the data.bin heavily for my purposes. And hey, you can use all mem, so don't complain about those $5000 bytes, it is just what our merry musicians would usually waste, right? :-P |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Quote: the original demo isnt an animation player :)
I don't think it's so much bigger than in the realtime version, at least when being in the front and at its largest zoom. Also, it still fits into a 16x16 with safety margin, i don't see how this brings trouble :-) |
| |
Trash
Registered: Jan 2002 Posts: 122 |
@ Cruzer & HCL:
In my mind the compo was about the filler and nothing else.
What I'm not really clear about is what benchmark result that should be shown, what unit should be used, fps seems illogical optmized I expect a topcoder to reach 25 fps, average cycles per frame demands som calculations that eats rastertime or am I just stupid? |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Quoting Trash@ Cruzer & HCL:
In my mind the compo was about the filler and nothing else.
What I'm not really clear about is what benchmark result that should be shown, what unit should be used, fps seems illogical optmized I expect a topcoder to reach 25 fps, average cycles per frame demands som calculations that eats rastertime or am I just stupid?
The benchmark works like this:
The raster irq counts up a counter each time it is called. The filler counts up rendered frames. When it has rendered 256 frames, it reads the counter and spits out the number.
Cheapest way to do so:
lda fcnt_l
sta $07e0
lda fcnt_h
sta $07e1
That'd be enough to make me happy.
This way, you can easily calculate fps or such on your own, but no need to waste too many valuable cycles and bytes on that. |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: @ Cruzer & HCL:
In my mind the compo was about the filler and nothing else.
What I'm not really clear about is what benchmark result that should be shown, what unit should be used, fps seems illogical optmized I expect a topcoder to reach 25 fps, average cycles per frame demands som calculations that eats rastertime or am I just stupid?
Yep, about the filler, problem is that you have $5000 worth of animation data which prohibits unrolled filler code to some degree. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Quote: Yep, about the filler, problem is that you have $5000 worth of animation data which prohibits unrolled filler code to some degree.
aren't those limitations there to make our life more exciting? :-) My yet version has still $1000 bytes free, and i have yet no idea on how to waste them. So stop the whining already and optimize! :-P |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: aren't those limitations there to make our life more exciting? :-) My yet version has still $1000 bytes free, and i have yet no idea on how to waste them. So stop the whining already and optimize! :-P
:D |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
I think one could calculate the 3d to save that $5000 bytes, and still may be faster trough the gain of unrolling :) Dot spheres are VERY cheap to calc. then averege the face Z coords to get face visibility. (if no perspective) half of faces is enough as opposite faces has reverse visibility. etc. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
If you do not use perspective it
a) looks ugly
b) you possibly have to draw more faces |
| |
HCL
Registered: Feb 2003 Posts: 728 |
..and it would be cheating doing it in realtime. That's the new future, better get used to it ;). |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
Bitbreaker, I think in case of a spherical symmetric body perspective is not so important, see EoD for proof. Can you recreate the screenshot in post#1 without it so we can see?:) not sure about more faces, do perspective hide a face earlier than without? |
| |
HCL
Registered: Feb 2003 Posts: 728 |
Quoting Oswalddo perspective hide a face earlier than without? Yes, that's about all it does :). |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Not possible, as it bloats the data.bin by another 33% and makes my code go BOOM then :-) But here's a quickly done comparison (though no idea if wings3d uses similar values for perspective compared to mine):
Also, if you think your approach is faster: proof it! :-) |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
No perspective on the ball will be almost unoticable. If u do it realtime noone would be able to tell i can assure u. |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
one cool thing without perspective I notice on this gif is that you can draw many lines with one bigger line instead.
HCL, perspective also adds distortion which helps the mind see the 3dness more, doesnt it? |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: one cool thing without perspective I notice on this gif is that you can draw many lines with one bigger line instead.
HCL, perspective also adds distortion which helps the mind see the 3dness more, doesnt it?
Just rotate it a bit and symmetry is gone... |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
Let's stick to the original precalculations, otherwise the drama will be too intense. But I wouldn't mind getting the full set of coords, including hidden ones. I think it would be easier to pack that way. |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: Let's stick to the original precalculations, otherwise the drama will be too intense. But I wouldn't mind getting the full set of coords, including hidden ones. I think it would be easier to pack that way.
I agree |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
So here's the C-programme that generates the data.bin for you all to play around and in the end miss the deadline for more drama :-P Attention, contains a Makefile!!1! /o\ |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
Thanx, Bitbreaker! It compiled and executed on first try. Amazing! |
| |
enthusi
Registered: May 2004 Posts: 677 |
Why not such a vector sphere? ;-)
(bad looking example) |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
©enthusi: because that would look undistorted and would be too easy to implement since there are no quads in it. :P But yeah... I agree. Also getting the rotated coordinates for such a sphere in real time is easier that Bitbreaker's I think, not 100% sure though, but it feels more simple. :)
But then again... it's not AAAAMIIIIGAAAAAH! |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
jackie, where's the symmetry in this triangle sphere ? recently on poutet I saw in a thread, that one method to make an 'iso' sphere is to normalize a cube's vertices. still seems more work than the doth sphere cheats: ie. average many cube points out of 8 then scale them with LUTS. dot sphere is only 3 adds and a lookup, cube is 2 adds and a lookup, hmm :) this is getting sounding like that this method leads to new sphere dot records:) anyway I may be wrong its been ages I did a dot sphere and too tired to think it really trough. |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: jackie, where's the symmetry in this triangle sphere ? recently on poutet I saw in a thread, that one method to make an 'iso' sphere is to normalize a cube's vertices. still seems more work than the doth sphere cheats: ie. average many cube points out of 8 then scale them with LUTS. dot sphere is only 3 adds and a lookup, cube is 2 adds and a lookup, hmm :) this is getting sounding like that this method leads to new sphere dot records:) anyway I may be wrong its been ages I did a dot sphere and too tired to think it really trough.
Dunno where the symmetry is tbh, and I know how to generate one by triangulating each surface and then normalizing the middle point to the radius of the mathematical sphere. However, there are other "spheres" such as http://en.wikipedia.org/wiki/Pentakis_dodecahedron which might be easier to deduce from the rotated world axises. F.e. each vertex may be a simple integer multiple of the x-, y- and z axis summed (like your cube example, but with more complex numbers). Also that would exploit x, y and z symmetry, just like the cube (i.e. +x,+y,+z is just a flipped version of -x,-y,-z). |
| |
HCL
Registered: Feb 2003 Posts: 728 |
Honestly i don't see how this would lead to faster sphere calculations, but it's still interesting if it does :).
Thats sphere pic looks strange.. No symmetry as far as i can see.
Quote:Pentakis_dodecahedron Oh, that seems to be the one in EoD. |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
Quoting HCLQuote:Pentakis_dodecahedron Oh, that seems to be the one in EoD. Sure does, almost looks like screenshots from EoD. :)
And now, back to the compo... Must the routine be vblank synched and/or double buffered, or is it ok with artifacts from filling while displaying the gfx? The latter is faster of cuz, so it's important with clear rules about this. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
I do double buffering what looks white okay on the real machine, anything less will look rather ugly i'd say :-) vblank would be overkill and we'd also loose granularity when comparing the results. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
A few of the polygons near the edge of the sphere are inside out, likely a bug in the backface removal. May we reject these? |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
@Bitbreaker: Agree, thanks for clearing it up.
@ChristopherJam: How on earth did you detect that? And I vote against removing them, since the compo is about rendering the given animation fastest, not about making the prettiest vectorball. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
@Cruzer It's more that some plot routines either crash or don't set any pixels at all if the ordering is clockwise instead of counterclockwise.
e.g. if you assume the highest point is the start of a chain running down the right hand side to the lowest point, and the end of a chain running up the left hand side, you can use each chain to make a set of masks
left mask
00111
01111
01111
11111
right mask
11100
11110
11111
11000
left AND right:
00100
01110
01111
11000
If the left and right are the wrong way around, left AND right will produce all zeros, as the right mask will become 0 before the left mask becomes 1
So, even a routine that uses all the data might fail to plot CW polys, so we at least need a ruling as to whether we need to to flip CW polys to make them visible. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
I'd say optimize them away, as they can't be too many and will not change the result much speedwise and optically (except when crashing the machine :-) ) As for me they didn't cause any trouble, but i also sorted out some faces that will never been shown, like some of them are also at zero height if i remember right :-)
Now i am just pondering what i shall do with my $1800 free bytes, add music? 8-) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Sweet, thanks. As you say, there were only a few, and they were basically slivers anyway.
Yup, the zero area polygons are already gone :D |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
Quoting BitbreakerNow i am just pondering what i shall do with my $1800 free bytes, add music? 8-) Is it oneframed yet? Otherwise free bytes can always be used for optimizations. :) |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Quote: Quoting BitbreakerNow i am just pondering what i shall do with my $1800 free bytes, add music? 8-) Is it oneframed yet? Otherwise free bytes can always be used for optimizations. :)
It is doublebuffered, fast and fully functional, and it is $1a00 free bytes meanwhile :-) However the biggest free block is somewhat around $800 bytes, the rest is spread all over. So far i can't see any possibility to improve speed by more memory usage. To do so, i'd need a few more pages of zeropage :-) Code unrolling will make things slower, so will do the introduction of lookup tables. I really wasted big amounts of mem for aligned and interleaved stuff though. |
| |
Hein
Registered: Apr 2004 Posts: 954 |
Quote: It is doublebuffered, fast and fully functional, and it is $1a00 free bytes meanwhile :-) However the biggest free block is somewhat around $800 bytes, the rest is spread all over. So far i can't see any possibility to improve speed by more memory usage. To do so, i'd need a few more pages of zeropage :-) Code unrolling will make things slower, so will do the introduction of lookup tables. I really wasted big amounts of mem for aligned and interleaved stuff though.
what cruzer is trying to say is that he's almost having it oneframed. :) |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
aka animated? ;-) |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
Quoting Heinwhat cruzer is trying to say is that he's almost having it oneframed. :) Most definitely, and I don't even know what to use the remaining 17000 cycles each frame for. ;) |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Is it in nufli quality?! |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
If it is, it better only be used for anti-aliasing. The rules clearly state that "background is white, foreground is dark red" ;) |
| |
algorithm
Registered: May 2002 Posts: 705 |
Not looked at it. But how many frames of animation in data.bin? |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
there are 128 frames in there. |
| |
Hein
Registered: Apr 2004 Posts: 954 |
Quote: Quoting Heinwhat cruzer is trying to say is that he's almost having it oneframed. :) Most definitely, and I don't even know what to use the remaining 17000 cycles each frame for. ;)
Fill them up with memory? |
| |
HCL
Registered: Feb 2003 Posts: 728 |
I think we will see a VQ-ed version, one-framed. Now how to set up CSAM to do the job again!? :P |
| |
Perplex
Registered: Feb 2009 Posts: 255 |
The animation won't look very good when played back at 50+ Hz, though. It rotates much too fast for that. I suggest the remaining cycles are to be used for interpolating new frames inbetween the provided keyframes. ;-) |
| |
HCL
Registered: Feb 2003 Posts: 728 |
@Perplex: Agree, very good idea. My idea is no longer to make it as fast as possible, as Cruzer already made it one-framed, with one hell of a margin as well :P. Instead i'm going for the most beautiful speed, do it with NUFLI anti-aliasing perhaps in interlace to get more color depth. I'm going for teh votes, not for the speed :). |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
to achieve better votes, one should place some boobs beneath the ball. |
| |
algorithm
Registered: May 2002 Posts: 705 |
@Perplex. Yes, the additional cycles can use the interleaved method of displaying frame 1 as frame 1, frame 2 as frame 1 and 2 interleaved, frame 3 as frame 3 which in turn would double the time it takes for a whole animation transition (and make it appear smoother)
@hcl. CSAM is not that much suited for outline or line based source (although with some tinkering, charmaps can be extracted and processed further. I do have an unreleased version that has an additional mean-removal before the VQ which works well for preserving edges more efficiently. |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
not many things looks uglyer than playing animations like that, mayba 8x8 plasmas with +4 pixel x/y shift flicker. |
| |
algorithm
Registered: May 2002 Posts: 705 |
At 50fps it would not look that bad. Interpolating frames via the software approach would not work via pre-generated codebooks via VQ methods hence why i suggested the interleave method (although interpolated frames do not necessarily require to be placed in the codebook at all - just the closest fit to the existing codebook data) but would still require the 256 bytes (16x16) of charlookup data |
| |
chatGPZ
Registered: Dec 2001 Posts: 11391 |
Quote:not many things looks uglyer than playing animations like that, mayba 8x8 plasmas with +4 pixel x/y shift flicker.
using lossy compression and playing it back like that? =P |
| |
HCL
Registered: Feb 2003 Posts: 728 |
Ok, posted a first version.. Just to put some pressure on the rest of you :). |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
Thanks! Had actually kinda lost the motivation and started on something else already. :) |
| |
HCL
Registered: Feb 2003 Posts: 728 |
Well, if i had it one-framed already, i would also start on something else ;). ..or stable 2-framed also for that matter.. I didn't do anything magic there at all so if anyone has just something better than 1992-standard, you will probably beat me. But we're supposed to beat Bitbreaker right, so just had to come up with something.
No matter how i think around this, it boils down to something very similar to Cruzer's "hard-line" from YKTR. Feels lame to implement that, since that is probably where Cruzer starts, and optimizes more from there :P.. But since there is an average of ~90 short lines per frame, it is very important to keep a low overhead on the line drawing. I tried a faster line routine with pre-calced div-values but it ended up slower anyway by just fetching those values from the tables (!). |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
I have similar feelings. I would precalc all possible lines, including the 'trick' of setting many pixels by one sta. each line would be as long as the longest of the used angle, put rts as needed. also precalc shifted versions. then through ZP setup you can access the whole 16x16 matrix. You can save half of the zp setup by using the 7th bit in Y. store the animation as jumptables already into the speedcodes. Then unroll the filler. then realize you're out of mem :)
I did not start on this since I think cruzer and others would do the same. And as counterintuitive as it is, maybe its true that Bitbreakers raster fill method is better for this. |
| |
HCL
Registered: Feb 2003 Posts: 728 |
@Oswald: Damn, seems you're a bit ahead of me in the details of the hard-line, but i figured most of it :). Yeah, it's some hell of a job to set up all that, and i'm afraid if i do it i will make other things slower so i don't gain anything in the end(?).. Perhaps i will try anyway, i think there is just enough memory, but then again perhaps i'm refusing to see some part of it all..
That's sort of why i posted this one. It's just plain and simple, and it does the job not too badly. It is a tiny bit faster than Bitbreaker's original, so if i did the realtime calculations perhaps it would end up on par.. so what did i prove then? |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
cool :) so if a plain lineroutine is faster, then unrolling will do even more.
did some estimates. just by looking at the screenshot lets say one line is max 24 pixels.
ldy #$00 ;can be skipped sometimes or replaced by iny/dey
lda (),y
ora mask,x ;save the set up of 8 masks when shifting horizontally
sta (),y
makes 9 bytes. 24 pixel radius needs about 74 angles (half of 360 is enough)
24*9*74= 15984 = 16k. pretty good! it would be smaller as near vertical lines doesnt set all 24 pixels, also trough table juggling its still possible to set more than one pixel by one lda ora sta AND shifting horizontally, but only worths it at near horizontal lines.
storing jump addies instead of coords is not possible tho, as it would blow up the anim data (p1->p2, p2->p3, p3->p4, instead of p1->p2->p3->p4) |
| |
HCL
Registered: Feb 2003 Posts: 728 |
..No it's not a looped line, it's a 1992 standard unrolled bresenham:lda #
ora LineBuf,y
sta LineBuf,y
txa
sbc ydiff
bcs *+7
iny
adc xdiff
bcc *-3
tax
The lines are up to 30 pixels long (dx or dy), so you need some more space than you mentioned..
One optimization i can give away is to skip the "ora" in the code above, which i thought would be possible for quite a few lines. Would be easy to store as a single bit in the animation (the last unused bit per vertex-pair), though my measurements showed it's just ~20% of the lines that it applies to -> no go.
Also have not come up with any better way to store the vertexes, that doesn't eat up more precious memory :P. |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
sorry, I meant unrolling even the slope calcs into ldy #'s :)
why looping on calculating the slope? for such lines use log div, for the rest the non looping bresenham. but you should know that :) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
I was considering cheating horrendously by storing for the top and bottom of each character-aligned 8 pixel wide slice of each polygon a y-value and an index into a table of edge patterns (short lists of y offset/bitmask pairs), but given that there are over 10,000 such slices over the 120 frames, I don't think I'll have the memory for that approach; even 4 bit y deltas plus 12 bit pattern indices don't leave me enough space for the ~3800 pattern definitions required
Back to the drawing board. |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
Quote: I was considering cheating horrendously by storing for the top and bottom of each character-aligned 8 pixel wide slice of each polygon a y-value and an index into a table of edge patterns (short lists of y offset/bitmask pairs), but given that there are over 10,000 such slices over the 120 frames, I don't think I'll have the memory for that approach; even 4 bit y deltas plus 12 bit pattern indices don't leave me enough space for the ~3800 pattern definitions required
Back to the drawing board.
damn i dont understand a word :) |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
Yup, as you correctly guessed I started out with the hardliner concept. And I'm only at about 2.5 frames so far, but hey, it's just about beating BitBreaker. :)
I haven't done any specialized line routines yet, so the pixels are just being drawn one by one. But I have some plans similar to Oswald's (I think) about drawing multiple pixels in one sta, as well as omitting the eor for the lines that don't share any bytes with other lines. Why are you guys using ora btw?
But optimizations like that are of cuz only worth it if they aren't eaten up by increased administration costs, which are already a huge part of it with a lot of small lines, so I have also worked on getting them reduced. I'm still using BitBreaker's original data, but if I reorganized them so the lines wouldn't have to start from scratch each time in guessing stuff like which of the 4 main directions it's pointing, whether it's flat, etc. it would help a lot. |
| |
Sorex Account closed
Registered: Nov 2003 Posts: 43 |
Quoting Bitbreakerany blocky animation will be deteced within no time and a serious drama will be generated upon it, be sure about that! :-)
I'm not good at this stuff at all so I'm not planning to compete.
And I don't know if speedy fillers use tricks like 8 pixel fills to speed up things so is that counted as a blocky animation aswell then even when it actually "draws" it at each frame?
Or is only 1 pixel fill allowed? |
| |
HCL
Registered: Feb 2003 Posts: 728 |
@Oswald: yeah, that's what i did, i even had a precalced div table.. but just finding the values in the table ate up the benefit of the faster line. Though the bresenham is faster on flat lines, plus that the steap lines are quite short.
@Cruzer: of course it is EOR, not ORA. So, do you have any mem left for further development? |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
@HCL: About $2a00 free + various holes in the data/code. And it might be possible to gain some more by packing the coords, but of cuz only if the freed up data can be used for optimizing the whole thing more than the depacking takes. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
@Cruzer, are you already down to 46,500 cycles? (I'm assuming 19656-43*25=18581 cycles available per frame; should really be less once the raster IRQ is factored in)
I think I've only just worked out a way to get that low, and I've less than $1e00 bytes remaining :-/ All my grandiose plans for getting below 2 frames turned out to need at least 70k of ram, unless you count VQ :p |
| |
Kisiel Account closed
Registered: Jul 2003 Posts: 56 |
so maybe is good idea to use memory expansion, like 1541U aka REU ?
VICE have this so it's not a problem. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11391 |
with a reu you can just do an animation, whats the point then? booooring |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
We could also just all agree to run Vice at 400% speed. Why waste your time with nerdy optimizations when there's easier ways to get to the same result? :) |
| |
Cresh
Registered: Jan 2004 Posts: 354 |
|
| |
chatGPZ
Registered: Dec 2001 Posts: 11391 |
Quote:We could also just all agree to run Vice at 400% speed. Why waste your time with nerdy optimizations when there's easier ways to get to the same result? :)
i have it running at 75fps using GL \o/ does anybody care? =P |
| |
Axis/Oxyron Account closed
Registered: Apr 2007 Posts: 91 |
@Groepaz: Damn, you have beaten me. My version runs in 50 fps on Amiga. |
| |
Danzig
Registered: Jun 2002 Posts: 441 |
@Axis & @Groepaz: Now please sit down and port Groepazens GL Version to Storm-C using GL on a plain A1200 mit 020er. My bet: 2 fps :D |
| |
Axis/Oxyron Account closed
Registered: Apr 2007 Posts: 91 |
My bet: Wont start at all.
?Out of chipmem error
Ready. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11391 |
i have ported SDL with software GL before.... maybe we can use that? 2fps is mandatory for amiga afterall =) |
| |
Danzig
Registered: Jun 2002 Posts: 441 |
Quoting Axis/OxyronMy bet: Wont start at all.
?Out of chipmem error
Ready.
Yep, your bet is closer to reality I guess... Still I remember laughing my ass off, when I first saw the GL-samples with Storm-C back then. And that was 68030/50 already!
Quoting Groepazi have ported SDL with software GL before.... maybe we can use that? 2fps is mandatory for amiga afterall =)
Yeah, I already knew that the topic is a typo by bitbreaker. He was supposed to call it: "SLOW me your (vector)balls". Damn you, Bierkeule! |
| |
HCL
Registered: Feb 2003 Posts: 728 |
:D
Well, i wasted a whole day on implementing the hard-liner.. and all i got was ~16 frames or so on 256 animation steps :(. Overhead ate my hardliner.. :P. Ok, now at least i have done it, perhaps i can reduce some shit somewhere.. But no matter what, i'm still generations behind Cruzer.. |
| |
Mixer
Registered: Apr 2008 Posts: 455 |
My version is round faced and runs on PC! I might join in, but I'll cheat! naah, no time.
I've been thinking whether this could be done by rearranging and organizing the animation coordinates by drawing each red face on sprites with somewhat hardcoded sprite filler and then just multiplexing sorted sprites on screen. Perhaps one could even shrink all the continuous vertical edges and expand them with sprite stretcher, however stretching eat precious cycles. Similar approach might work with chars, but I guess the overhead makes it pointless. |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
Quote: :D
Well, i wasted a whole day on implementing the hard-liner.. and all i got was ~16 frames or so on 256 animation steps :(. Overhead ate my hardliner.. :P. Ok, now at least i have done it, perhaps i can reduce some shit somewhere.. But no matter what, i'm still generations behind Cruzer..
dont get it, drawing 256 phases in 16 frames is like 20 rasterlines to draw one phase? :) |
| |
HCL
Registered: Feb 2003 Posts: 728 |
@Oswald: No, i *gained* some 16 frames. Calm down boy ;).
@Mixer: I think the sprite-shit will be hard to get working.. The faces are just a tad bit bigger than one sprite. That also rules out the popular sprite-filler (by updating sprite x-pos each rasterline), since it would require masking the right edge with a white sprite, and there are possibly more than 4 red ares on one rasterline.. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Well, I finally got something working last night, albeit running in around 2.8 frames. Got it down to 2.6 today and sent off a draft to @Bitbreaker.
Now to try to improve it further :D |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
2.5 is suprisingly good, considering that about a 3/4 frame alone is the filling :) |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
So guys, i see you have been struggling hard and there's already two drafts in my inbox, while i was laying at the beach and enjoying my holidays :-) |
| |
Skate
Registered: Jul 2003 Posts: 495 |
I was hoping to find some time to join this compo. but there is just 4 days to go and i couldn't even start yet. :/ |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
Quote: So guys, i see you have been struggling hard and there's already two drafts in my inbox, while i was laying at the beach and enjoying my holidays :-)
you had a release version before the compo even started, so what? |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Quote: you had a release version before the compo even started, so what?
right, but a slow one though that i improved since the compo started. I'd say everyone has some filler-routines ready to adopt to that compo. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Ideas yes, code no. I started from scratch when the compo was announced with no more code than a raster IRQ initialiser and a few lines of Python that dump arrays out to .a65 sources or .prgs |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
Quote: right, but a slow one though that i improved since the compo started. I'd say everyone has some filler-routines ready to adopt to that compo.
I'm eager to see who which method will win, eorfill, or per scanline filling :) |
| |
chatGPZ
Registered: Dec 2001 Posts: 11391 |
and i am still tempted to attempt a plain animation.... weather doesnt contribute a lot to it actually happening though =) |
| |
The Syndrom
Registered: Aug 2005 Posts: 60 |
this would've been my approach aswell - I'm just too lazy to generate all single frames as a first step. |
| |
Mr. SID
Registered: Jan 2003 Posts: 424 |
I have a python script that renders all the frames to png files. If anyone is interested... |
| |
chatGPZ
Registered: Dec 2001 Posts: 11391 |
please, its the main show stopper for me atm aswell, cant be arsed to render the frames =) |
| |
Stone
Registered: Oct 2006 Posts: 172 |
Perplex rendered these: https://www.dropbox.com/sh/wkdj7d7cjo5wob9/YQqvLkyYeN/amigaball.. |
| |
Mr. SID
Registered: Jan 2003 Posts: 424 |
It's completely based on Bitbreaker's C source:
http://galway.c64.org/~sid/ball.py |
| |
BYB
Registered: Jan 2011 Posts: 20 |
thanks for uploading the png's.
Maybe it's possible to save some mem by mirroring verticaly the half of frames. So Hires consists of only back- and foreground colors, the ball should be shaped ofcoz. Sprites allowed? |
| |
Perplex
Registered: Feb 2009 Posts: 255 |
Quote: Perplex rendered these: https://www.dropbox.com/sh/wkdj7d7cjo5wob9/YQqvLkyYeN/amigaball..
Here's the Ruby code I used if anyone's curious: https://gist.github.com/lhz/5748054 |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
Deadline got here faster than expected, so I didn't get to implement any further optimizations. But looking forward to the other entries! |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Can i receive all the animated versions now and the versions of those who shouted the loudest that this is an easy job? :-) It is about to deliver guys! |
| |
HCL
Registered: Feb 2003 Posts: 728 |
WTF!? June.25 00:00, that's a weirdo time for a deadline!! You has it just any date at 23.59, that's how you do it.. Now i just remembered June.25, and that's today, and *boff* the deadline already passed :-O. I post my last balls now, and the drahma about the deadline starts right here!! ;). |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
yay, drama \o/ of course i'll accept your actual entry past the deadline for the sake of that! :-) Sorry for confusing :-) |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
Quoting BitbreakerIf the deadline is extended, a severe drama is expected WTF, if I knew it would be extended I could have done so much more, bla bla bla... No, it's ok :) |
| |
Martin Piper
Registered: Nov 2007 Posts: 726 |
I wonder if anyone used a cart? |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Wait, what? Deadline's gone?
I got caught by the same midnight confusion :(
Can I send the code as it was earlier today? |
| |
Martin Piper
Registered: Nov 2007 Posts: 726 |
Using lossless delta animation each frame was down to about 300 bytes. With the number of frames that meant too much memory. :( |
| |
The Human Code Machine
Registered: Sep 2005 Posts: 112 |
There are only 128 frames which will be repeated twice and this should fit into the memory! |
| |
chatGPZ
Registered: Dec 2001 Posts: 11391 |
Quote:Using lossless delta animation each frame was down to about 300 bytes. With the number of frames that meant too much memory. :(
damn, no need to try myself then :) funny enough, i somehow got distracted and caught myself tweaking some old line drawer of mine =P |
| |
HCL
Registered: Feb 2003 Posts: 728 |
Haha.. quite funny this. Someone sets up a goal, a bunch of guys aims their weapons, and in the end, hit targets of a great variation more or less far from the goal :D. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
it is said that there is an entry from Metalvotze that is worth waiting for. So i'll upload all entries + results as soon as it's done. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Quote: Haha.. quite funny this. Someone sets up a goal, a bunch of guys aims their weapons, and in the end, hit targets of a great variation more or less far from the goal :D.
They are getting old and loose eyesight! :-) |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Finally, there's the results, and thanks to Metalvotze, none of the serious competitors will be last :-)
So here come the results (executables):
place handle frames
god bitbreaker $24f
----------------------------
1. christopherjam $25f
2. axis $29c
3. cruzer $29e
4. hcl $2a4
5. drago $dead
As you see, i decided to be out of compo :-) Congrats to ChristopherJam pushing the eor-filler that hard to nearly reach my result and a big thanks to all that participated! Now it is time to discuss and boast in detail i guess? :-) Now show me your inballievable code! |
| |
Axis/Oxyron Account closed
Registered: Apr 2007 Posts: 91 |
Congrats to Christopher. Great work dude!
After taking a look into the code of all entries I have to say: "We are all bloody uncreative".
The code looks something like 90% identical (eor filler mostly unrolled and slopetable lines) with only small changes in the details like how the stored data is converted to slope spans.
I really hoped that someone comes up with a nice innovation. Like special code for flat lines gathering multiple pixels per store or some tricks to avoid the eor per linepixel.
I guess this has to wait until the next compo. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
So that is how it is done in my case:
All coordinates are shifted 4 pixels to the right, so that only 15 columns have to be treated (thanks to THCM for that hint) also all faces are resorted so that they are drawn from right to left, thus the left edge of all faces never will come into contact with edges from other faces, and ora'ing with already present content in the buffer can be omitted on left side.
The use of slope tables seem to be common practice, so not much to tell here. Funny enough one comes a long with rather small tables here, but the input data is bloated up by that process. To save bytes the format of data.bin was adopted quite a bit.
Stuff has been aligned to convenient places so that most of all the jump pointers can be easily calculated with 2 cycle illegal opcodes. Also a lot of code is squeezed into the zeropage and from $ff9c on, so that both code segments can be accessed via normal branches (address wraparound).
Here's the source have fun digging through it. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Thanks, guys!
And yes, I was hoping to get away from using an eorfiller too, but my span fill attempts either took up way too much ram with tables, or were too slow, despite presorting the polys from left to right so I never needed to mask the right hand sides.
All of my slopes are taken from a single 384 byte table, but the offsets into it are all precomputed and my memory usage is pretty dire. I'm damn impressed that bitbreaker managed to get the best time in only 133 blocks!
Each edge is stored as two bytes for the address to jump into an eorfill routine, the low byte of a base pointer into the eorbuffer, and an offset into the slope table.
The eorfill routines plot up to 32 pixels, each taking its Y value from basepointer+slope[x*4+offset]. They plot two pixels at a time if the slope is low enough.
I too shifted four pixels left and four pixels up, so I only use 15x15 chars.
If I knew I could have gotten away with a single charset for display, that would have saved me a few kb to unroll some of the loops further.
Very impressed with the cleanness of some of the other entries. |
| |
Axis/Oxyron Account closed
Registered: Apr 2007 Posts: 91 |
My implementation is a pretty straight forward eor filler with slopetable lines.
Completely unrolled speedcode for clear and fill that only accesses the bytes that get touched in at least one frame of the animation.
I shifted the coordinates -3 in x. My prototype reported the least amount of touched bytes in that position.
My address generation in the lineroutine is optimized down to 1 inc-zp every 16 pixels, because all the #$80 fiddling is encoded into my slopetables.
The rest is just classic code-optimization of the lineinit.
I just realized, I could have saved a lot with storing the line speedcode with multiple widths. So I dont need to patch and restore an rts into the linecode. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Sadly I only thought of including the $80 fiddling into the slope tables this morning, but I did have multiple speedcode fragments rather than doing RTS patching. All my fragments just JMP back into the main loop in zero page. |
| |
Shadow Account closed
Registered: Apr 2002 Posts: 355 |
Aw damnit, I missed the deadline with my joke entry.
Oh well, here it is anyway:
http://ag1976.com/tmp/amiga_petscii.prg
50 fps baby, oh yeah! :D |
| |
chatGPZ
Registered: Dec 2001 Posts: 11391 |
so, as if anyone cares, ofcourse i tried anyway :)
in: 263168 bytes (128 file(s)) out: 69281 bytes (1 file(s)-128 frame(s)) left: 26.33%
so that'd *almost* fit, using plain delta+RLE on the bitmaps. using screen+charset properly might actually make it fit, but since my little packer doesnt do that automatically i couldnt be arsed to test =P
oh, and it runs so fast that it looks totally crap, making the whole animation attempt somewhat pointless =) was interesting to get some figures for code- and data size of both attempts though, cheers =) |
| |
Shadow Account closed
Registered: Apr 2002 Posts: 355 |
Groepaz: Yeah, I noticed that on my PETSCII animated version as well - the animation is really not well suited for 50fps display, it just turns into a pink blur... Running VICE at 50% speed actually makes it look better! |
| |
BYB
Registered: Jan 2011 Posts: 20 |
Where to download? I would like to see all the other versions too. Actually i only saw the petscii one, really nice work and idea. :) Ah, i found the competition entries up there :) |
| |
Shadow Account closed
Registered: Apr 2002 Posts: 355 |
Bitbreaker posted a link to the executables in the same post with the results (#134) |
| |
HCL
Registered: Feb 2003 Posts: 728 |
Big congratulations to ChristopherJam who is the true winner of this compo, and also the only entry with correct double buffering!! No wonder you had memory problems if you managed to do that.
Omg, i ended up on last position :(. This definitely ends my era as a 1337 coder.. but i still claim that i once was.
..So here comes a few excuses. I wasted ~2 weeks on a huffman-packed animation of the line-buffer, which turned out alot slower than i expected. Well, at least i *tried* something else than an eor-filler, but later went back to implement something like Cruzer's hard-liner from 2004. I didn't really get any further than Cruzer did back then i suppose, or rather i didn't even get there probably :P. From the benchmarks it looks like Me+Cruzer+Axis implemented almost exactly the same thing.
I really would have wanted to go further from here, but there was no more time, and also my energy was starting to drain. I still have most of the zeropage unused, and the data.bin is untouched, but shifted 4 pixels like the rest of you also did. The last optimization that gained me some $18 frames or so was to unroll the vertex read-loop for one face, and thus also duplicating the line-init (via macro) to operate on various vertex combos. Gained more than i expected.
Ok, time to check out the other entries to see if there is something interesting.. Should be for a lamer like me :P. |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
Gongrats to Birbreaker and ChristopherJam, truly impressed. Guess it's back to the drawing board. Great to see that the filled vector standard has reached a new level compared to the 1992 style that rules for many years. |
| |
HCL
Registered: Feb 2003 Posts: 728 |
Cruzer's implementation looks really clean and almost unoptimized. I bet you could gain some easy frames here and there..
Axis's ball is crapped on places.. I call for a ~$10 frames penalty ;). Well, it's probably just an easy bug fix, but it looks sloppy.. 3 bytes per vertex was kinda innovative, don't you think?
CJam, WTF? Looped clear + eor-fill, and still you beat us with margin. Ok, some zp-code there, and lots of precalced data, but i'm still like WTF?! Gotta learn the lesson :P.
Bitbreaker's ball was fast, ok, but you have also had two compos to optimize it ;). Besides i still think that span-filling is slower if you realtime-calc the vectors, hmm. that requires a proof i suppose :P ..and you *are* having double buffers, so it would be a piece of cake to do it bug-free then!?
I don't know if this applies to some of your lines, but i draw the lines backwards.. Then i don't need to find an address to put the RTS and then restore it. The line always finishes on the right place with RTS, and i just have to find out where to start. WTF, i did the worst result, i shouldn't come with tips and trix.. i should try to learn instead, it's just hard to change roles :P. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Quoting HCL
Bitbreaker's ball was fast, ok, but you have also had two compos to optimize it ;). Besides i still think that span-filling is slower if you realtime-calc the vectors, hmm. that requires a proof i suppose :P ..and you *are* having double buffers, so it would be a piece of cake to do it bug-free then!?
What you mean with bug free? without tearing? In fact the tearing is not too heavy on the real machine. I might vsync but loose a few frames by that. Redoing the clearing to make it happen linewise and not columnwise would help, to make the synching happening within a tighter range.
As for realtime calced vectors i'd need ~$33e for just the filling and calculations coming along with that, if i remember correctly.
Also i had to do many things on that filler from scratch or can we take that as a hook for some serious drahma please? :-P |
| |
PopMilo
Registered: Mar 2004 Posts: 146 |
Thanks!
Thank you Bitbreaker for making this compo, thank you all who showed how to code this thingy...
My take on this - we need more of these small, focused competitions! |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
An eorfiller with double buffers can be tear free without losing any frames by only changing $d016 in the IRQ, but any plotter that spends most of its time writing the the final bitmap requires triple buffers for the same effect - and if you've unrolled code dedicated to each buffer, that eats memory fast.
@HCL, the looped clear+fill costs me an extra few thousand cycles per frame, but it frees up memory for more unrolled speedcode fragments, which saves me more time than I lose to the loops.
As it is I'd hoped to spend an evening tuning how much memory I allocate to each of the clear, fill, and edge routines to optimise cycles, but like you I misread the deadline!
The speedcode uses a mix of inlined and JSRs for incrementing the column pointer, with inlines on the more commonly used cases towards the ends of the routines, but JSRs to save RAM on the less used cases. Again, the distribution's probably not all that optimal.
I draw lines left to right or right to left depending whether the slope is up or down; this way I only have positive slopes in my slope table. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11391 |
darn. i find myself working on the damned packer now, because i think a both faster AND shorter version (than the ones shown) is within reach. damn urge to code useless crap it is =) |
| |
The Syndrom
Registered: Aug 2005 Posts: 60 |
apart from the animation-approach I even think Shadows very nice version can be improved, if you split the charsets into 4 or 8 to gain full resolution. You'd still need some kind of preprocessor to sort/match the charsets to the 128 frames. |
| |
Shadow Account closed
Registered: Apr 2002 Posts: 355 |
Actually, by just using a custom-charset instead of the standard C64 one, I think I could get something that looks considerably better (especially given the very fast rotation speed at 50 fps).
Hmm... maybe I will give it a try just for fun! |
| |
algorithm
Registered: May 2002 Posts: 705 |
Having it look good at 50fps would require either more of a smoother transition between each frame (256 frames+) or some type of interpolation.
Notice frame by frame how some frames are near identical or/and with char flipping can be reused. |
| |
Shadow Account closed
Registered: Apr 2002 Posts: 355 |
Quote:Actually, by just using a custom-charset instead of the standard C64 one, I think I could get something that looks considerably better
I hereby retract the statement above. It looked like crap! I would proably need more intelligence in the charset analyzer, the "as-few-bits-diff-as-possible"-method didn't cut it at all. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
dear animators,
here's my petscii attempt
and ugly attempt (charset with 256 individual chars for all frames) |
| |
PopMilo
Registered: Mar 2004 Posts: 146 |
@Bitbreaker: Good enough to totally distract me on workplace!
Looks decent on larger surfaces... Not bad at all. |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
How about using half sized chars (8x4 px)? The animation should still fit in memory with 128 frames and 15x30 cells without the corners. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Quote: How about using half sized chars (8x4 px)? The animation should still fit in memory with 128 frames and 15x30 cells without the corners.
Would be possible, but it might still look crap, and actually i don't bother much pushing those frames through my various converters :-) With one charset each 16 frames it still looks kind of crap :-) |
| |
chatGPZ
Registered: Dec 2001 Posts: 11391 |
i doubt anyone would have noticed the crap even in the one charset version =) it moves so fast, you cant really make it worse by making the charset a bit inaccurate =D (still, i would aim for non-lossy compression.... lossy is crap by definition =P) |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
interesting, now either everyone was wrong with the eor fillers so far, or bitbreaker found a corner case for span fillers. I didnt thought I will see this moving this fast though and that includes hcl's version :) |
| |
Axis/Oxyron Account closed
Registered: Apr 2007 Posts: 91 |
It is ofcourse an extreme combination of corner cases.
Hires leads to the fact that the eor fillers cant use their profit of drawing the lines only in half the resolution.
An object without any shared edges helps the span filler to avoid drawing the edges twice and avoids a lot of masking and oring at the edges.
Also the structure of the object is pretty optimal for span fillers.
So, the same fillers with different data would lead to total different results.
With a low poly object in lores I am sure Cruzer will run circles around all of us. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Quoting Axis/OxyronIt is ofcourse an extreme combination of corner cases.
Hires leads to the fact that the eor fillers cant use their profit of drawing the lines only in half the resolution.
An object without any shared edges helps the span filler to avoid drawing the edges twice and avoids a lot of masking and oring at the edges.
Also the structure of the object is pretty optimal for span fillers.
So, the same fillers with different data would lead to total different results.
With a low poly object in lores I am sure Cruzer will run circles around all of us.
No one wants lowres anymore nowadays :-) Also i am not sure if the span filler performs worse on low poly stuff. Of course edges are shared then (what is expensive in the yet implementation, as it implies some overhead per line and face), but bigger areas that i can fill with 5 cycles per 8 pixel get filled pretty fast therefore. |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
nobody? :) I love early / mid 90's style pc/amiga LUT effects, which is usually not possible in hires.
It would be interesting to see how your span filler performs on a cube compared to eor fillers. as you say it's indeed superior when looking at larger areas (5cycles vs 8 or more). one could ofcourse buffer the edges to avoid calculating them twice, but that has some overhead aswell, so one has probably go through all the pain and code it entirelly to see wether it worths it. span fillers loose out on the edges to eor fillers (the preparations to jump into correct speedcode span & masking the edges), its hard to guess which would win. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
A cube at around same size is done in $14a frames, including different patterns per face, and without using slopetables but calculating the slopes on the fly. If anyone has the need to try i can provide the data used for that testcase. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11391 |
Quote:nobody? :) I love early / mid 90's style pc/amiga LUT effects, which is usually not possible in hires.
just do it i say :) |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
Quote: A cube at around same size is done in $14a frames, including different patterns per face, and without using slopetables but calculating the slopes on the fly. If anyone has the need to try i can provide the data used for that testcase.
$14a for 128 rotation phases? thats ~2,5 frame per phase, a nicely optimized eorfill version should be 2 or a bit more. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
*sigh*
$14a for $100 rendered frames, just as with the vectorball where it is also $100 rendered frames. |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
well, then its pretty much killing the eor filler method, and you even have dither for free :) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Hey, eor filler has dither for free if you fill from left to right instead of top to bottom (eg final part of Effluvium). |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
how do you fill from left to right?
I remember my very first naive filler was trying doing something like that with tables, so if a byte b4 filling looked like:
00110000
then feed it into a table and you get:
00111111
fill left to right ;) |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: well, then its pretty much killing the eor filler method, and you even have dither for free :)
EOR-fill dithering is free in the fill-phase but require "thick" vertical lines when drawing the lines. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
@Oswald, it requires two lda:eor:sta sequences for each row, once to set the right-hand pixels for the bytes, one for the left hand. I guess that's what @JackAsser means by 'thick lines'?
Eg, for an edge that starts 43 MCM pixels into a row, flipping from solid 0x00 to solid 0xff
lda#$03:eor row+10:sta row+10
lda#$fc:eor row+11:sta row+11
(assuming contiguous bytes for each row of the eorbuffer; I can't actually remember whether effluvium had a fullscreen eorbuffer or if I just did one row at a time) |
| |
Oswald
Registered: Apr 2002 Posts: 5095 |
I dont get it but two runs is more expensive, than span filling or the eor filling method. What jackasser means.. I guess its easyer to see how graham did it in natural wonders, it takes a lot to explain :) |
| |
HCL
Registered: Feb 2003 Posts: 728 |
$14a sounds too fast for an eor-filler where ~$a0 is spent on the eor-shit.. (if i got the calculations right :P). But i think that "glenz" might be the thing that could keep the eor-filler alive. Drawing many lines *should* be faster with the eor-filler since nothing else than the line itself needs to be in fast registers, then multicolor also gives a benefit. So perhaps it turns out that both Bitbreaker and Cruzer found their corner cases where their mad implementations are just superior to anything else ;). |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
@HCL: no doubt, at glenz my routines would suck hard, that is where eor has its true benefit :-) |
| |
PopMilo
Registered: Mar 2004 Posts: 146 |
Quote: Would be possible, but it might still look crap, and actually i don't bother much pushing those frames through my various converters :-) With one charset each 16 frames it still looks kind of crap :-)
@Bitbreaker: Any chance on 'lending' any of those converters ? :)
Would like to test char-based animation on another type of graphic... |