TWW
Registered: Jul 2009 Posts: 545
The big VICE & SuperCPU Thread
I'm opening this thread so coders can share information and issues regarding the VICE SuperCPU emulator.
I found two issues:
The TCD instruction doesn't work, while PHA/PLD works fine. I tried XBA to make sure there was no funny business with the C register being mixed up, but got the same result.
I wrote a wipe-mem routine that clears memory. Obviously a 16-bit STZ dp is the way to go, relocating the direct page (ZP) for each page you wipe. However, a dumb 16-bit STZ abs,X (2 cycles more per instruction) turns out faster than a loop with roughly 20 cycles of overhead per page. The math doesn't add up, since according to the reference material the DP approach consumes fewer cycles. Could RAM refresh and branching be causing additional wait states (I read somewhere that a RAM refresh takes 8 cycles), and hence the deviation?
blacky
Registered: Sep 2007 Posts: 41
What revision are you using?
TWW
Registered: Jul 2009 Posts: 545
Where do I see the exact revision number (I only see V2.4)?
The files are dated 27/12-2012 and the zip is named WinVICE-2.4-x86+(scpu64).7z.
TWW
Registered: Jul 2009 Posts: 545
Has anybody had any luck getting the IDE64 to work with the SuperCPU VICE? I tried compiling an early 2.1 firmware but can't get the HD to load anything. It works in regular VICE x64, though.
blacky
Registered: Sep 2007 Posts: 41
Please try the latest nightly build at http://vice.pokefinder.org/
TWW
Registered: Jul 2009 Posts: 545
Tried it.
#1: The STZ ZP vs. STZ abs,X timing doesn't seem to have improved (unless there is another cause for it). I can post some example code later to show this.
#2: TCD instead of PHA/PLD (16-bit mode) still messes things up, in my code at least.
#3: The IDE64 part is a little better but still unreliable, and VICE even refuses to start up with an IDEROM and HD attached.
New bug: Preview of FDs (1541/1581) doesn't work in the latest version (-r27250).
chatGPZ
Registered: Dec 2001 Posts: 11386
Hint: if you actually want to help fix it, the bug tracker is where you should post the bugs, not here =P
TWW
Registered: Jul 2009 Posts: 545
I don't trust my own observational skills enough to determine whether it is a bug or not.
Sure, I'd like to help fix things based on what I find, but I'd like a chance to discuss the bugs before reporting them, to verify that we are indeed looking at a bug and not at my crossed eyes (it has happened before, you know; my wife blames Jack Daniels).
PS: The mouse doesn't work either 8-D It really messes things up on both SCPU and x64sc. Same code on my side, and a brutal crash in VICE once it's ALT-Q'ed in.
TWW
Registered: Jul 2009 Posts: 545
Alright.
TCD does work fine. Stupid mistake on my part (see Gpz ^^).
Regarding the mem-filler, the situation is unchanged, so this time I'm attaching some code in case someone knows what this is about (all in native mode, all registers in 16-bit mode):
Straight indexed STZ routine (called with the bitmap address in X):
.for(var i=0;i<8000;i=i+2) {   // unrolled: 4000 16-bit STZs clear 8000 bytes
:STZ $0000+i,x
}
rts
And here is the one using a relocatable direct page (ZP) (called with the bitmap address in A):
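:LDX #$001f // 32 pages need to be cleared
!: :TCD // move the direct page onto the current page
.for(var i=0;i<128;i++) {
:STZ $00+[i*2] // 128 16-bit stores = 256 bytes
}
clc
:ADC #$0100 // advance to the next page
dex
bmi *+5 // after the last page, skip over the JMP
jmp !- // Repeat
:LDA #$0000
:TCD // restore the direct page to $0000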
rts
The result is roughly 4 rasterlines more for the direct page approach. So unless the direct page approach is eating more cycles than expected (the per-loop overhead shouldn't cost more cycles than the direct page addressing saves), there is something amiss in my code or in the emulator.
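For reference, here is the naive count that makes the direct page routine look faster, as a small Python sketch. It assumes 65816 native-mode cycle counts (16-bit STZ abs,X = 6 cycles, 16-bit STZ dp = 4 cycles with a page-aligned direct page), a flat 20:1 ratio between SuperCPU and C64 cycles, and PAL's 63 cycles per rasterline. It counts CPU cycles only and deliberately ignores the C64 bus.

```python
# Naive CPU-bound estimate of the two clear routines above (sketch).
# Assumed: 65816 native mode, 16-bit A/X/Y, direct page aligned to a page (DL = 0).
FAST_PER_SLOW = 20      # assumed SuperCPU cycles per C64 (1 MHz) cycle
CYCLES_PER_LINE = 63    # PAL rasterline length in C64 cycles

STZ_ABS_X_16 = 6        # 5 cycles + 1 for a 16-bit accumulator
STZ_DP_16 = 4           # 3 cycles + 1 for a 16-bit accumulator

# Per-page loop overhead of the DP routine: TCD, CLC, ADC #imm16, DEX, BMI (not taken), JMP
DP_LOOP_OVERHEAD = 2 + 2 + 3 + 2 + 2 + 3


def straight_cycles():
    """Unrolled STZ abs,X: 4000 stores of 2 bytes each."""
    return 4000 * STZ_ABS_X_16


def direct_page_cycles(pages=32):
    """128 STZ dp per page plus loop overhead (the slightly cheaper final exit is ignored)."""
    return pages * (128 * STZ_DP_16 + DP_LOOP_OVERHEAD)


def to_rasterlines(fast_cycles):
    """Convert SuperCPU cycles to PAL rasterlines under the assumed 20:1 ratio."""
    return fast_cycles / FAST_PER_SLOW / CYCLES_PER_LINE


if __name__ == "__main__":
    for name, cycles in (("STZ abs,X", straight_cycles()), ("STZ dp", direct_page_cycles())):
        print(f"{name}: {cycles} fast cycles, ~{to_rasterlines(cycles):.1f} rasterlines")
```

On these numbers the DP version should win by roughly 5.7 rasterlines, which is the opposite of what was measured.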
MagerValp
Registered: Dec 2001 Posts: 1078
You can't optimize SuperCPU code like that. To write to the graphics bank you need to have memory mirroring turned on, which limits you to one write per 1 MHz cycle.
The first takes exactly 8000 (1 MHz) cycles to execute, since that's how many bytes you clear. The second clears 8192 bytes, so it'll take a smidge over 3 rasterlines extra.
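The same arithmetic as a simplified Python model: with mirroring on, each byte written to C64 RAM occupies one 1 MHz slot, so a routine can't finish faster than the number of bytes it writes. The model ignores write-buffer details and badlines, and assumes a 20:1 clock ratio and 63 cycles per PAL line; the CPU cycle figures are rough native-mode counts.

```python
# Bus-limited estimate (sketch): one C64 cycle per byte written, or the CPU time,
# whichever is larger. Assumed: 20 SuperCPU cycles per C64 cycle, PAL lines.
FAST_PER_SLOW = 20
CYCLES_PER_LINE = 63


def bus_limited_cycles(bytes_written, cpu_fast_cycles):
    """C64 cycles taken: the larger of the CPU time and one write slot per byte."""
    cpu_slow = cpu_fast_cycles / FAST_PER_SLOW
    return max(cpu_slow, bytes_written)


# Unrolled STZ abs,X: 4000 stores x 6 cycles, 8000 bytes written.
straight = bus_limited_cycles(8000, 4000 * 6)
# STZ dp: 32 full pages (8192 bytes), 128 x 4 cycles + ~14 cycles loop overhead per page.
dp = bus_limited_cycles(32 * 256, 32 * (128 * 4 + 14))
extra_lines = (dp - straight) / CYCLES_PER_LINE

print(f"straight: {straight} cycles, DP: {dp} cycles, DP extra: {extra_lines:.2f} lines")
```

Both routines are far below the write limit in CPU time, so the byte count alone decides: 192 extra bytes come out at about 3.05 rasterlines.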
Just keep the code simple instead:
ldy #8000 ; byte count (16-bit index registers)
: stz $0000,x ; 8-bit accumulator: one byte per store
inx
dey
bne :-
Now you have 8 "free" cycles left in the loop, where it's just stalling and waiting for the next available write cycle.
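Where the 8 "free" cycles come from, as a quick Python check. It assumes an 8-bit accumulator, 16-bit index registers, native mode, and a flat 20:1 ratio between SuperCPU and C64 cycles, so each iteration (one byte written) has a budget of 20 fast cycles.

```python
# Free cycles per iteration of the simple clear loop (sketch).
# Assumed: native mode, 8-bit A, 16-bit X/Y, 20 SuperCPU cycles per C64 cycle.
FAST_PER_SLOW = 20

LOOP_CYCLES = {
    "stz $0000,x": 5,  # absolute,X store with an 8-bit accumulator
    "inx": 2,
    "dey": 2,
    "bne": 3,          # branch taken (native mode: no page-cross penalty)
}


def free_cycles(loop_cycles, bytes_per_iteration=1):
    """Fast cycles per iteration spent stalling for the next 1 MHz write slot."""
    budget = bytes_per_iteration * FAST_PER_SLOW
    return budget - sum(loop_cycles.values())


print(free_cycles(LOOP_CYCLES))  # 8
```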
TWW
Registered: Jul 2009 Posts: 545
Of course... max one write per cycle to the 64K base RAM... That explains everything! Thanks.
However, in 16-bit mode it only takes 4000 stores, so you have to increment the X register twice and set Y to #4000. Tested, and it works like a charm.
Edit: Too bad, though. It would have been cool if it worked ;-)
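The 16-bit variant of the loop, checked the same way in Python. Assumed here: native mode, 16-bit A/X/Y, and a flat 20:1 SuperCPU-to-C64 ratio. Each STZ now writes two bytes, so an iteration has two write slots (40 fast cycles) to fill.

```python
# 16-bit clear loop: 4000 iterations, 2 bytes per STZ, INX twice (sketch).
# Assumed: native mode, 16-bit A/X/Y, 20 SuperCPU cycles per C64 cycle.
FAST_PER_SLOW = 20


def loop_stats(iterations, bytes_per_store, loop_fast_cycles):
    """Return (total bytes written, idle fast cycles per iteration)."""
    total_bytes = iterations * bytes_per_store
    # One 1 MHz write slot per byte; the CPU stalls for whatever the loop doesn't use.
    free_per_iteration = bytes_per_store * FAST_PER_SLOW - loop_fast_cycles
    return total_bytes, free_per_iteration


# stz abs,x (6) + inx (2) + inx (2) + dey (2) + bne taken (3)
bytes_16, free_16 = loop_stats(4000, 2, 6 + 2 + 2 + 2 + 3)
print(bytes_16, free_16)  # 8000 25
```

Either way the clear costs 8000 write slots; the 16-bit version just leaves more idle cycles per iteration.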
Oswald
Registered: Apr 2002 Posts: 5094
You can always calculate some other stuff while the write to C64 RAM happens :) Now that there's SCPU emulation, I wish I had a lot of free time to create a demo for it, or at least experiment with some code.
For example a texture mapper: I guess the bottleneck would be the GFX mode rather than the speed. At 20 MHz with 16-bit registers, in 160x200 and with a nicely optimized routine, AGA-quality texture-mapped objects should be possible. But what GFX mode could display 16 colors nicely? ;) Or maybe 4 colors would work with heavy ordered dithering.
MagerValp
Registered: Dec 2001 Posts: 1078
Texture mapped 3D objects are boring on the PC, boring on the Amiga, and they will be just as boring on the SuperCPU :)
But you have the ability to write to the VIC every cycle, so I bet you can do some wicked crazy stuff there. Vicious SID style audio effects are virtually "free" too; you'd only lose 60 out of the 1260 or so cycles you have each line...
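Where "1260 or so" comes from, assuming a PAL line of 63 C64 cycles and a flat 20:1 SuperCPU-to-C64 clock ratio. Under that ratio the 60 lost cycles equal three C64 cycles.

```python
# Fast cycles per rasterline and the share lost (sketch).
# Assumed: PAL (63 C64 cycles per line), 20 SuperCPU cycles per C64 cycle.
CYCLES_PER_LINE = 63
FAST_PER_SLOW = 20

fast_per_line = CYCLES_PER_LINE * FAST_PER_SLOW
lost = 60  # figure quoted in the post
print(fast_per_line, f"{lost / fast_per_line:.1%}")  # 1260 4.8%
```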
chatGPZ
Registered: Dec 2001 Posts: 11386
Quote: and they will be just as boring on the SuperCPU :)
You can rotate them 20 times faster -> part is over much earlier -> less boredom =)
Oswald
Registered: Apr 2002 Posts: 5094
Quote: Texture mapped 3D objects are boring on the PC, boring on the Amiga, and they will be just as boring on the SuperCPU :)
But you have the ability to write to the VIC every cycle, so I bet you can do some wicked crazy stuff there. Vicious SID style audio effects are virtually "free" too; you'd only lose 60 out of the 1260 or so cycles you have each line...
Maybe an all-border, flat-shaded, non-convex poly at 25 or 50 fps would please both of us ;)
Trash
Registered: Jan 2002 Posts: 122
Quote: Texture mapped 3D objects are boring on the PC, boring on the Amiga, and they will be just as boring on the SuperCPU :)
But you have the ability to write to the VIC every cycle, so I bet you can do some wicked crazy stuff there. Vicious SID style audio effects are virtually "free" too; you'd only lose 60 out of the 1260 or so cycles you have each line...
You could do every-second-line FLI with new $D800 colors on each FLI line, with a sprite underlay and splits on the sprite colors. That would give you 4x2 pixels with any color of your choice, right?
Oswald
Registered: Apr 2002 Posts: 5094
Quote: You could do every-second-line FLI with new $D800 colors on each FLI line, with a sprite underlay and splits on the sprite colors. That would give you 4x2 pixels with any color of your choice, right?
Now we're talking :) Though there would probably be no time to split the sprite colors, because the VIC needs the free cycles for the badline and the sprites.
Trash
Registered: Jan 2002 Posts: 122
True, but three colors + background color + a fourth fixed sprite color per 4x2 would be sufficient for most needs...
EDIT:
Now that I think about it, three colors + background per 4x2 would be doable with no more than a REU...
Count Zero
Registered: Jan 2003 Posts: 1932
Indeed, a lot of trickery is possible on a SuperCPU: double buffering and switching optimization modes, writing to two registers at once, or putting the zeropage (direct page) at $d000 to speed up register writes. I am really curious what demo stuff people will come up with now, and who finds the best use for too many cycles _left_ :)
Ninja
Registered: Jan 2002 Posts: 411
Wow, I wrote them more than a dozen years ago, but you might still find my articles about SCPU timings interesting:
http://the-dreams.de/articles/scpu-superram.txt
http://the-dreams.de/articles/scpu-badlines.txt
chatGPZ
Registered: Dec 2001 Posts: 11386
I like how the existence of SuperCPU demos depends only on the availability of an emulator... reminds me of what happened when the REU was finally emulated correctly =)
enthusi
Registered: May 2004 Posts: 677
I shall write a demo that utilizes VICE's warp mode and RAM injection.
AmiDog
Registered: Mar 2003 Posts: 97
The SuperCPU64 has 128KB of SRAM and the SuperCPU128 (which also works on the C64) has 256KB. Bank $01 is described as "PseudoROM, RAM" and should basically contain a copy of the C64 ROMs for performance reasons.
I remember trying to use bank $01 for some timing-sensitive code back in 2005/2006 or so, but it failed for some reason. Does anyone know if bank $01 is write-protected somehow, or if the SuperCPU treats bank $01 in some special way that makes it hard or impossible to use for custom code? Perhaps parts of bank $01 can be used?
Since I wasn't using any ROM routines, I kind of assumed I should be able to use bank $01 for my own code, because the SuperCPU really only needs to remap bank $00 ROM accesses to bank $01 and shouldn't need to mess with direct bank $01 accesses at all.