| |
pmprog Account closed
Registered: Nov 2005 Posts: 54 |
[Req for help] TigerMoth performance
Hi all,
I'm getting to a point in TigerMoth where I'm thinking of dropping it. If you watch the video below, you can see that once I start hitting a large number of bullets, I get really bad performance.
https://www.youtube.com/watch?v=Lskbol7quDk
It's kind of expected, but this version doesn't even have the raster splits for dealing with the score or player, no collision detection for the TigerMoth or the player, nor any music, so performance is only going to get a lot, lot worse. I really like what I've managed to put together so far, but if it's not going to be playable, then I might as well drop it now.
So I thought I'd pop on here and see if anyone was willing to have a quick look and see if there were any points that look really bad. My code is broken down into lots of subroutines (this might be a problem regarding performance?), each with a header that hopefully explains what its job is, and I've tried to keep all the related pieces together in aptly named files so it's easy to mooch about.
I've been working in a private git on my local server, but I've uploaded a snapshot of the current code to GitHub. The source code is MIT license, and if the project is finished, the full source will be released with the game.
https://github.com/pmprog/TigerMothC64
Also, for reference, the .spriteproject file can be opened in C64 Studio. Export to data, but change the "!byte" to ".byte"
C64 Studio 4.1
I currently use TASM as my compiler, though I was thinking of trying to move across to the ca65 assembler in the cc65 package. But that doesn't really have any bearing on the life of this project.
Oh, and finally, before anyone says, yes, I've got my Sine/Cosine tables all messed up (Sine starts at the positive value, Cosine starts at zero and goes negative before positive), but I'm "okay" with that, I just count my angles anti-clockwise from the "3 o'clock" when dealing with bullets.
Thanks in advance |
|
| |
oziphantom
Registered: Oct 2014 Posts: 490 |
Kind of hard to follow the code from those files ;) Where is the main loop?
So it's a bullet hell? Do you try to update every bullet each frame? If so, spreading the bullet updates over 4~5 frames would help. |
| |
Mixer
Registered: Apr 2008 Posts: 452 |
A faster plot routine seems to be possible with tables. See
http://csdb.dk/release/download.php?id=182655 for instance.
It may be possible to improve trajectory calculation as well. Look at fast line drawing routines.(Bresenham)
You should be able to have 256 plots in 2 to 3 frames easily. |
| |
pmprog Account closed
Registered: Nov 2005 Posts: 54 |
Quote: Kind of hard to follow the code from those files ;) Where is the main loop?
So it's a bullet hell? Do you try to update every bullet each frame? If so, spreading the bullet updates over 4~5 frames would help.
Ah, sorry. The main game loop enters here:
https://github.com/pmprog/TigerMothC64/blob/master/screen_game...
Quick ref:
screen_*.inc files are "segments", so eventually there'll be a "screen_gameover.inc" etc.
game_*.inc files are routines and data for the actual game
tools_*.inc files are where I keep general routines not specific to this game
This checks the health of the TM, then goes through the raster checks, then processes the "frame", followed by checking the player is alive.
Each game loop, it processes 8 bullets (this can be tweaked), and only once all 256 bullets have been processed does it do any player/TM movement.
I did it like this so that everything moved together, as on my first write of the game, when bullets moved in separate frames, it looked a bit odd - especially when you shot a semi-circle all on the same frame, then some bullets moved, and some didn't. But it's something I can try again and see.
Cheers |
| |
pmprog Account closed
Registered: Nov 2005 Posts: 54 |
Quote: A faster plot routine seems to be possible with tables. See
http://csdb.dk/release/download.php?id=182655 for instance.
It may be possible to improve trajectory calculation as well. Look at fast line drawing routines.(Bresenham)
You should be able to have 256 plots in 2 to 3 frames easily.
I did look at this document, and it looked like a lot of repeating data in the tables.
http://codebase64.org/doku.php?id=base:dots_and_plots
I do use tables, but combined with some calculation so I don't have to have such large tables with repeating data.
I will look at using a larger table to reduce some more of the calculations though
Cheers |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
It's pretty ambitious; the Amstrad version looks like it's only doing around 80 enemy bullets (5 waves of 16), and they've got a 4MHz Z80 to play with. (I know, z80 takes more cycles per instruction than 6502, but it's not 4x as many, and their 16 bit registers would probably be handy for plots, too)
256+ plots per frame is doable under demo conditions, but that's without also handling collisions, sprites, other game logic etc.
But yes, those JSRs to sine and cosine in your update routines need to go; inline all that.
Maybe write out a routine that does everything that a bullet needs to do each frame (update position, plot, collision check), inline all the function calls, optimise what you can, then do a cycle count.
Work out how many cycles you want to budget on bullets per frame, divide that by your update time, and that'll tell you how many bullets you can animate. If it's within cooee of the minimum you'd consider tolerable, then continue, otherwise have a rethink. |
| |
pmprog Account closed
Registered: Nov 2005 Posts: 54 |
Quoting ChristopherJam
It's pretty ambitious;
Somebody mentioned that on my YT video, I'm beginning to agree :)
Though I've changed it so it processes 128 bullets per frame, and that seems to make quite a nice improvement.
Quoting ChristopherJam
Maybe write out a routine that does everything that a bullet needs to do each frame (update position, plot, collision check), inline all the function calls, optimise what you can, then do a cycle count.
That might be an idea. If I can calculate my cycle counts, I guess I could potentially drop my raster checks and know exactly how many bullets to process between each other bit of code.
Cheers |
| |
oziphantom
Registered: Oct 2014 Posts: 490 |
Quoting pmprog
I did it like this so that everything moved together, as on my first write of the game, when bullets moved in separate frames, it looked a bit odd - especially when you shot a semi-circle all on the same frame, then some bullets moved, and some didn't. But it's something I can try again and see.
For something like this, I would make "buckets": you update a whole bucket at once, and then stagger the bucket updates. This way a semi-circle could be put into 1 bucket so its bullets all get updated together.
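The bucket idea can be modelled on the host side before committing to any 6502 code. This is only a sketch in Python, under assumptions that are not from the thread: the bucket count of 4, the `Bullet` fields and the function names are all made up for illustration, and a real implementation would more likely keep one bullet list per bucket instead of scanning every bullet.

```python
# Host-side model of staggered "bucket" updates (not C64 code).
NUM_BUCKETS = 4  # assumed value; tune to taste


class Bullet:
    def __init__(self, x, y, dx, dy, bucket):
        self.x, self.y = x, y
        self.dx, self.dy = dx, dy
        self.bucket = bucket  # fixed at spawn time


def spawn_salvo(bullets, x, y, directions, bucket):
    """Spawn one firing pattern; every bullet of it shares a bucket."""
    for dx, dy in directions:
        bullets.append(Bullet(x, y, dx, dy, bucket))


def update_frame(bullets, frame):
    """Move only the bucket whose turn it is; return how many moved."""
    active = frame % NUM_BUCKETS
    moved = 0
    for bullet in bullets:
        if bullet.bucket == active:
            bullet.x += bullet.dx
            bullet.y += bullet.dy
            moved += 1
    return moved
```

Because a salvo shares one bucket, its bullets either all move on a given frame or all stay put, which avoids the "some moved, some didn't" look.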
Also the Amstrad probably ( never dev'd for it ) has a linear bitmap and not the weird as Char system we have on the C64, which helps as well.
With bullets, I imagine they can't move very fast each frame, i.e. they won't be able to move more than, say, 1 char. This means that rather than doing a full set of maths to get the point, you can store them as pointer to char, x, y. Then when x overflows you add 8 to the pointer, and when it underflows you subtract 8; when y underflows you subtract 320, and when it overflows you add 320. That keeps all of a bullet's movement in a delta system rather than converting from X,Y to a bitmap address.
If you make this for say EF then you have ROM to burn ;) so you can make a bunch of code to handle however many directions you want say you have 16 angles or 32 angles you can support.
So to do a diagonal down-right you'd have:
Step 0
lda (pointer),y
and #%01111111
sta (pointer),y
iny
lda (pointer),y
ora #%01000000
sta (pointer),y
rts
Step 1
lda (pointer),y
and #%10111111
sta (pointer),y
iny
lda (pointer),y
ora #%00100000
sta (pointer),y
rts
....
Step 7
lda (pointer),y
and #%11111110
sta (pointer),y
jsr Add321 ; shift down Y and move over 1 char
lda (pointer),y
ora #%10000000
sta (pointer),y
; set step to 0
....
So this way each bullet jumps into the current step of its angle's routine, and each update moves its pointer: every step is a custom routine that updates x or y in a relative manner and shifts the pixel as needed. |
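The delta scheme described in this post can be checked against the full address calculation with a small host-side model. This is a sketch in Python, assuming the standard C64 hires bitmap layout (8 bytes per char cell, 320 bytes per char row); the class and method names are hypothetical, and screen-edge handling is left out.

```python
# Host-side check of the "pointer to char + x,y within char" scheme.

def full_offset(x, y):
    """Byte offset into a C64 hires bitmap, computed from scratch."""
    return (y >> 3) * 320 + (x >> 3) * 8 + (y & 7)


class DeltaBullet:
    def __init__(self, x, y):
        self.cell = (y >> 3) * 320 + (x >> 3) * 8  # pointer to char cell
        self.px = x & 7                            # pixel column in cell
        self.py = y & 7                            # pixel row in cell

    def step(self, dx, dy):
        """Move by at most one pixel per axis (dx, dy in -1, 0, 1)."""
        self.px += dx
        if self.px > 7:        # x overflow: next char to the right
            self.px = 0
            self.cell += 8
        elif self.px < 0:      # x underflow: previous char
            self.px = 7
            self.cell -= 8
        self.py += dy
        if self.py > 7:        # y overflow: next char row
            self.py = 0
            self.cell += 320
        elif self.py < 0:      # y underflow: previous char row
            self.py = 7
            self.cell -= 320

    def offset(self):
        return self.cell + self.py

    def mask(self):
        return 0x80 >> self.px
```

Most steps only touch the small in-cell counters; the pointer itself changes only on a char crossing, which is where the saving over a full X,Y conversion comes from.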
| |
pmprog Account closed
Registered: Nov 2005 Posts: 54 |
Quoting oziphantom
For something like this, I would make "buckets": you update a whole bucket at once, and then stagger the bucket updates. This way a semi-circle could be put into 1 bucket so its bullets all get updated together.
That's a good suggestion, I'll look into that
Quote:With bullets, I imagine they can't move very fast each frame
I only move them a maximum of 1 pixel horizontally and 1 pixel vertically. The player's bullets move 4 pixels vertically, but that's handled differently.
A big thank you to everyone who's posted (and emailed, I hope you get my reply(s), as the mail client was playing up on my end), I've got a few new things to try, and I'll see how I get on; and I'll probably post an update when I've made some progress. |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Also lower the resolution of the plots. Check my plot routine in our demo Classics. It kinda looks like bullet hell to me and might be usable for you. |
| |
ptoing
Registered: Sep 2005 Posts: 271 |
Gotta agree with JackAsser. A bit of slowdown now and then is OK in a bullet hell shmup, but that video you linked looks unplayable.
I think a screen resolution of 160x100 for the plots would be sufficient. |
| |
soci
Registered: Sep 2003 Posts: 480 |
I took a trace at the time when it's slow. Got this:
http://singularcrew.hu/temp/96f7bbe1a270a5667b6f91878f504a18.png
Inlining maths_sine and maths_cosine is not worth it at this point. Go for bitmap_plotpixel and bullets_check first.
Btw. nice structured code. |
| |
pmprog Account closed
Registered: Nov 2005 Posts: 54 |
Quoting ptoing
I think a screen resolution of 160x100 for the plots would be sufficient.
I would like to make the bullets bigger (and thus easier to see), but the way I was looking at it meant that you needed to plot 6 extra pixels per bullet (instead of one "clear" and one "draw", you'd be doing four "clear" and four "draw")
That said, I guess if I switched into multicolour mode, then I could reduce that to drawing two pixels per bullet to get a nice square block; and it would let me draw the player bullets in a different colour to the TigerMoths
Quoting soci
I took a trace at the time when it's slow. Got this:
http://singularcrew.hu/temp/96f7bbe1a270a5667b6f91878f504a18.png
Wow, amazing. What tool did you use to produce that?
Inlining maths_sine and maths_cosine is not worth it at this point. Go for bitmap_plotpixel and bullets_check first.
Quoting soci
Btw. nice structured code.
Thanks for the compliment |
| |
ptoing
Registered: Sep 2005 Posts: 271 |
Quote:That said, I guess if I switched into multicolour mode, then I could reduce that to drawing two pixels per bullet to get a nice square block; and it would let me draw the player bullets in a different colour to the TigerMoths
Yeah, that is what I would suggest. Make the bullets Mcol, and update gfx only every other line, so every 2nd line can use the same data as the odd line before it. The different colour thing is also a nice bonus. |
| |
soci
Registered: Sep 2003 Posts: 480 |
First half was deleted as I was too slow ;)
What I wanted to add beyond that is that bigger pixels won't be faster to draw but a smaller amount is needed to "cover" the same area.
The tool is QCacheGrind with this trace and the github sources mentioned above:
http://singularcrew.hu/temp/callgrind.out.16726 |
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
Quote: I did look at this document, and it looked like a lot of repeating data in the tables.
http://codebase64.org/doku.php?id=base:dots_and_plots
I do use tables, but combined with some calculation so I don't have to have such large tables with repeating data.
I will look at using a larger table to reduce some more of the calculations though
Cheers
That's the way it works: you trade memory for speed :) Also, compared to 64k, a few 256-byte tables to speed up plotting is not much. |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: First half was deleted as I was too slow ;)
What I wanted to add beyond that is that bigger pixels won't be faster to draw but a smaller amount is needed to "cover" the same area.
The tool is QCacheGrind with this trace and the github sources mentioned above:
http://singularcrew.hu/temp/callgrind.out.16726
Bigger pixels are indeed faster to draw. For example, with 2x2 resolution you can fit the x position in a byte instead of a word. Also, you mask and OR the same byte value on both lines, with the two writes one byte apart. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Nice work on the profiling, Soci.
Interesting idea about doing less work within a char, oziphantom.
I was already thinking there may be merit in using a few charsets instead of a bitmap. Combining the two could work quite well; eg if every 32x64 pixel region of the screen was one page of a charset, then for some bullets it could be as long as 70 frames before a page crossing.
If each bullet computed at spawn time (and again at every subsequent page crossing) a lower bound on the number of frames until the next crossing, most of the overflow checks could be skipped altogether. |
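The "frames until the next crossing" figure can be worked out with integer arithmetic. This is a sketch in Python under assumptions that are not from the thread: positions and velocities are taken to be 8.8 fixed point (1/256 pixel units) relative to the region's top-left corner, velocity is constant, and the function names are hypothetical. The 32x64 region size is the one suggested above.

```python
# Host-side model: how many frames a bullet is guaranteed to stay
# inside its 32x64 pixel region (one charset page).
FRAC = 256  # assumed 8.8 fixed point: 256 units per pixel


def frames_on_axis(pos_fx, vel_fx, size_px):
    """Frames the bullet stays inside [0, size_px) on one axis."""
    if vel_fx > 0:
        return (size_px * FRAC - 1 - pos_fx) // vel_fx
    if vel_fx < 0:
        return pos_fx // (-vel_fx)
    return None  # not moving on this axis: never crosses


def safe_frames(x_fx, y_fx, vx_fx, vy_fx, width=32, height=64):
    """Minimum over both axes; None if the bullet is stationary."""
    per_axis = (
        frames_on_axis(x_fx, vx_fx, width),
        frames_on_axis(y_fx, vy_fx, height),
    )
    candidates = [n for n in per_axis if n is not None]
    return min(candidates) if candidates else None
```

A bullet at the region's left edge moving right at one pixel per frame gets 31 safe frames, so its overflow checks could be skipped for that long before the count is recomputed.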
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Assuming an event table is used to handle shunting bullets from unrolled speedcode to edge-detecting code (so they don't need to check their own timers), it's looking like eorplot/update position/eorplot would take around 76 cycles per bullet per frame.
That's just for a single hires or MCM pixel per bullet, too; add 26 cycles per bullet per frame for a second row of pixels. |
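To put those per-bullet figures in context, here is the arithmetic as a short Python sketch. The 76 and 26 cycle costs are the estimates quoted above; the PAL frame length of 312 raster lines of 63 cycles is the standard figure, but the sketch assumes the whole frame is available, which it is not once bad lines, sprite DMA and the rest of the game logic are counted. The function names are hypothetical.

```python
# Host-side budgeting sketch (not C64 code).
PAL_CYCLES_PER_FRAME = 312 * 63   # 19656 cycles, ignoring stolen cycles
CYCLES_SINGLE_ROW = 76            # eorplot / update / eorplot estimate
CYCLES_SECOND_ROW = 26            # extra cost for a second pixel row


def total_cycles(bullets, cycles_per_bullet):
    """Cycles spent on bullets in one update pass."""
    return bullets * cycles_per_bullet


def bullets_that_fit(budget_cycles, cycles_per_bullet):
    """How many bullets can be animated within a cycle budget."""
    return budget_cycles // cycles_per_bullet


single = total_cycles(256, CYCLES_SINGLE_ROW)                      # 19456
double = total_cycles(256, CYCLES_SINGLE_ROW + CYCLES_SECOND_ROW)  # 26112
print(single, double, PAL_CYCLES_PER_FRAME)
```

So 256 single-pixel bullets would use almost an entire PAL frame on their own, and the two-row version needs more than one frame, which is why spreading updates over several frames or reducing the bullet count keeps coming up.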
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
Nice idea for a bullet hell game :) |
| |
Style
Registered: Jun 2004 Posts: 498 |
I'm wondering if, based on the moth movement and the bullet frequency, you couldn't generate a sprite with multiple bullets in it that hold the same pattern down the screen together..... if that makes sense. |
| |
pmprog Account closed
Registered: Nov 2005 Posts: 54 |
Things have gone a little wrong
Quoting Martin Piper
Nice idea for a bullet hell game :)
Thanks, but what really inspired me was Dragon Attack on the Amstrad
Quoting Style
I'm wondering if, based on the moth movement and the bullet frequency, you couldn't generate a sprite with multiple bullets in it that hold the same pattern down the screen together..... if that makes sense.
Possibly, but the sprites are (will be) all already in use multiple times once I've got all the multiplexing in |
| |
pmprog Account closed
Registered: Nov 2005 Posts: 54 |
https://youtu.be/6n2zQIuYe1Y
So many bugs!!
Not quite sure why some lines move slower than others, and if I add the code that removes the bullet before redrawing, it'll only ever draw right down the centre of the screen! |
| |
rexbeng
Registered: Aug 2012 Posts: 37 |
Quoting ChristopherJam
[...] the Amstrad version looks like it's only doing around 80 enemy bullets [...]
80 bullets? You should try a bit harder and get to the further levels :) |
| |
pmprog Account closed
Registered: Nov 2005 Posts: 54 |
I'm kind of getting angles now, but my maths code is still pretty poor. These videos are supposed to show an exploding circle.
https://youtu.be/hp8XVp5yeLc
The code before this looks pretty nice, it made a "star" explosion... still all wrong though.
I really can't wrap my head around why it's doing this, partly because I'm tired. Feel so close to just ditching this project and working on a simpler one... one that preferably doesn't need 8-bit trigonometry... |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
That, my friend, is you not having proper two's complement math. First of all, is your sine table signed at all?! |
| |
pmprog Account closed
Registered: Nov 2005 Posts: 54 |
Yeah, my tables are signed. I've clearly got something wrong though |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
It looks like you've just toggled the high bit from the absolute value to get your negative numbers, rather than flipping all the bits and adding one (or alternately just subtracting from 256).
-5 isn't %10000101 ($85), it's %11111011 ($fb) |
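The difference between the two encodings is easy to see with a few lines of Python. This is an illustrative sketch only; the function names are hypothetical, and `add_bytes` simply models an 8-bit add that wraps around, as a 6502 `adc` with carry clear would leave in the accumulator.

```python
# Host-side illustration of two's complement vs. "toggle the top bit".

def twos_complement_byte(value):
    """Encode a signed value in -128..127 as an unsigned byte."""
    if not -128 <= value <= 127:
        raise ValueError("value does not fit in a signed byte")
    return value & 0xFF


def sign_magnitude_byte(value):
    """The encoding that only sets bit 7 on the absolute value."""
    return abs(value) | (0x80 if value < 0 else 0)


def add_bytes(a, b):
    """8-bit addition with wrap-around."""
    return (a + b) & 0xFF


print(hex(twos_complement_byte(-5)))   # 0xfb
print(hex(sign_magnitude_byte(-5)))    # 0x85
print(add_bytes(10, 0xFB))             # 5, i.e. 10 + (-5)
print(add_bytes(10, 0x85))             # 143, not 5
```

With two's complement table entries a plain add handles both signs; with the toggled-bit encoding every use has to test the sign and branch to a separate subtract.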
| |
pmprog Account closed
Registered: Nov 2005 Posts: 54 |
I think you're right, but at the same time, I wrote a subtract_fraction routine, so it shouldn't be a problem, as it'll be subtracting the fraction amount, rather than adding the negative version.
I think I might start a new project that is just my math code and see if I can fix all the problems there, then bring it back into my game project. Should help speed up debugging |
| |
pmprog Account closed
Registered: Nov 2005 Posts: 54 |
I bizarrely have 3 angles which don't work properly, and there seems to be a couple of gaps, but otherwise, I think it's looking much better
https://youtu.be/dOPoIq3K8c8 |
| |
pmprog Account closed
Registered: Nov 2005 Posts: 54 |
Okay, so the 1st problem was that I had angle 180, where my range is 0-179. Then I also had some issues with sine/cosine of -1.0
Anyway, looks good, but slow
https://youtu.be/qWvrhBZiWlk |
| |
Bago Zonde
Registered: Dec 2010 Posts: 29 |
Well, a real bullet hell :D. In terms of gameplay, do you think you're going to be able to avoid all those bullets? I would try removing half of the bullets and playing with the speed. I'm thinking about the game itself here; from a tech perspective, it would be great to have thousands of bullets moving around.
www.commocore.com |
| |
TWW
Registered: Jul 2009 Posts: 545 |
Two quick thoughts I had;
#1: How is the plotter set up, not using a shitload of jsr and stuff I hope. Also as it's <256 pixels, you should get away with a:
lda Y_Table_Hi,y
sta $fc
lda Y_Table_lo,y
sta $fb
ldy X_Table,x
lda BitMask,x
ora ($fb),y
sta ($fb),y
where X and Y contain the plot coordinates.
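The routine above does not show what goes into the tables, so here is one plausible way to build them, written as a host-side Python sketch. The table contents are an assumption based on the standard hires bitmap layout, the bitmap base of $2000 is assumed, and `plot` is only a model of what the 6502 code would do to memory.

```python
# Host-side table generator and model of the table-driven plot.
BITMAP_BASE = 0x2000  # assumed bitmap location

# Address of the leftmost byte of each pixel row.
Y_TABLE = [BITMAP_BASE + (y >> 3) * 320 + (y & 7) for y in range(200)]
Y_TABLE_LO = [address & 0xFF for address in Y_TABLE]
Y_TABLE_HI = [address >> 8 for address in Y_TABLE]

# Byte offset of the char column, and the bit within that byte.
X_TABLE = [x & 0xF8 for x in range(256)]
BIT_MASK = [0x80 >> (x & 7) for x in range(256)]


def plot(memory, x, y):
    """Set pixel (x, y); returns the byte address that was written."""
    pointer = Y_TABLE_LO[y] | (Y_TABLE_HI[y] << 8)  # what ends up in $fb/$fc
    address = pointer + X_TABLE[x]                  # ($fb),y with Y = X_TABLE[x]
    memory[address] |= BIT_MASK[x]
    return address


table_bytes = len(Y_TABLE_LO) + len(Y_TABLE_HI) + len(X_TABLE) + len(BIT_MASK)
print(table_bytes)  # 912
```

Because x stays below 256, the largest column offset is 248, so it fits in the Y register and no 16-bit add is needed at plot time.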
Second, the bullets seem to follow the path of an expanding circle. Could this be handled by an adapted Bresenham's circle algorithm (only drawing in two lower quadrants and limiting the number of pixels to match what you have on your youtube video)? |
| |
pmprog Account closed
Registered: Nov 2005 Posts: 54 |
I've put a new snapshot of the code up on GitHub. I've not converted it to use 64tass yet, but that will be done, then I can abuse macros to have "subroutines" written in separate files, but drop the need for all the jsr/rts.
https://github.com/pmprog/TigerMothC64
Quoting Bago Zonde
In terms of gameplay, do you think you're going to avoid all those bullets during the gameplay? I would try to remove half of the bullets, and try to play with the speed.
I did think about cutting the number of bullets, but it really makes the play area sparse, meaning it's pretty easy to avoid everything.
I am thinking of making a strip down the left hand side to match the right hand side, so you play in a narrower corridor. This will also mean some bullets drop quicker, whilst still giving a bullet hell feel.
I think part of my problem will be making the game fun. I'm torn between trying to make the game fun, then seeing if I can fix performance, or seeing if I can get the performance before working on the "game". The problem is, if either of these aren't up to scratch, it's practically all wasted.
Quoting TWW
#1: How is the plotter set up, not using a shitload of jsr and stuff I hope. Also as it's <256 pixels, you should get away with a:
lda Y_Table_Hi,y
sta $fc
lda Y_Table_lo,y
sta $fb
ldy X_Table,x
lda BitMask,x
ora ($fb),y
sta ($fb),y
https://github.com/pmprog/TigerMothC64/blob/master/tools_bitmap..
My code is different in the sense that it both sets and clears, however I am completely unaware of the whole $fc and $fb thing. Looks handy, considering I rewrite two portions of code with the address I'm editing.
How does the $fb:$fc thing work? What's special about it?
Quoting TWW
Second, the bullets seem to follow the path of an expanding circle. Could this be handled by an adapted Bresenham's circle algorithm (only drawing in two lower quadrants and limiting the number of pixels to match what you have on your youtube video)?
Not all bullets will be in an expanding circle, that's just one firing pattern. There will be others.
Cheers |
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
"How does the $fb:$fc thing work? What's special about it?"
It works by first setting the destination address to the leftmost byte of the screen at the right 'height'; then you use the Y register to offset the address "horizontally" to the final desired one.
Pro tip: you can save that address in an unrolled loop for clearing.
(ldy # sta .. ldy # sta .. ldy # sta .. <- self-modify the address you write to here, then just call all the sta's to clear)
If the bullet doesn't move vertically, the whole zp pointer setup can be skipped.
Anyhow, max 256-512 dots can be moved like that per frame under demo conditions, so realistically in a game that's like 128 max imho. |
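The "remember where you plotted, then clear only those bytes" tip can be modelled in a few lines. This is a host-side sketch in Python: the Python list stands in for the self-modified operands of the unrolled `sta` instructions, and the function names are hypothetical.

```python
# Host-side model of clearing only the bytes that were plotted.

def plot_and_remember(memory, erase_list, address, mask):
    """OR a pixel into memory and note the byte address for later."""
    memory[address] |= mask
    erase_list.append(address)


def erase_plotted(memory, erase_list):
    """Zero only the remembered bytes; returns the number of writes."""
    writes = len(erase_list)
    for address in erase_list:
        memory[address] = 0   # like the unrolled sta: wipes the whole byte
    erase_list.clear()
    return writes
```

The cost of the clear then scales with the number of bullets instead of the size of the bitmap. Note that the clear zeroes whole bytes, so anything else drawn in the same byte (another bullet, or background) disappears with it.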
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: "How does the $fb:$fc thing work? What's special about it?"
It works by first setting the destination address to the leftmost byte of the screen at the right 'height'; then you use the Y register to offset the address "horizontally" to the final desired one.
Pro tip: you can save that address in an unrolled loop for clearing.
(ldy # sta .. ldy # sta .. ldy # sta .. <- self-modify the address you write to here, then just call all the sta's to clear)
If the bullet doesn't move vertically, the whole zp pointer setup can be skipped.
Anyhow, max 256-512 dots can be moved like that per frame under demo conditions, so realistically in a game that's like 128 max imho.
That's assuming 50fps, which is not at all needed in a game. 25fps would do just fine. |
| |
TWW
Registered: Jul 2009 Posts: 545 |
Description about the plotting routine:
http://codebase64.org/doku.php?id=base:dots_and_plots&s[]=plott
EDIT: And yes, what Oswald says about clearing: compare clearing the entire bitmap screen (32 x 8 x 25 = 6400 bytes) with clearing only the plots you have on the screen. |
| |
pmprog Account closed
Registered: Nov 2005 Posts: 54 |
Thanks for the info on zero page. I've also had an email from BiGFooT with some more stuff.
I apologise for turning this into a kind of developer log. Feel free to tell me to shut up until it's ready. Though I'm finding it quite helpful getting feedback as I'm working through it.
Here's a new video of the current version (the music is not playing from the game BTW, just on my audio player), plus a download of the PRG file, if you want to try it yourself. If you do, please could you provide feedback on whether you think it's responsive enough to make a playable game. Obviously there's still work to be done on optimising, and actual game play, but it's nice to get an idea of its feel.
https://youtu.be/HHrJiZr4l1A
http://www.pmprog.co.uk/download/tigermoth.zip |
| |
TWW
Registered: Jul 2009 Posts: 545 |
As you say, still needs some more spunk.
I'd get rid of that "clearbitmapmemory" routine first of all if I were you and replace it with something like:
plott:
lda Lobyte
sta $fb
sta wiperoutine1+1
lda Hibyte
sta $fc
sta wiperoutine1+2
//plottit
rts
wiperoutine:
lda #$00
wiperoutine1:
sta $2000
wiperoutine2:
sta $2000
//....
rts
And no worries, wish more people would share their projects on something else than that damn facebook so here is good with me. |
| |
pmprog Account closed
Registered: Nov 2005 Posts: 54 |
Bah, it didn't save my reply. So I've got to rewrite it.
bitmap_clear is only called once at the start of a new game, so I'm not worried about that proc for now. Only things directly related to gameplay.
I really need to get the player firing bullets again, and actually implement bullet collisions, which I think is going to be the hardest part.
I was originally going to raster split two sprites multiple times down the right hand side for the score (that you can see in one of my older videos), but I'm considering just writing some 8x1 and 8x8 blit routines and using a mini "character set" to drop that data in, as well as a health bar. That way I can save cycles reconfiguring sprites, and just update the bitmap memory when the score or health changes. |