CSDb User Forums


2018-08-25 00:42
alterus

Registered: Feb 2016
Posts: 10
NTSC C128 running PAL C64 Demos?

I was wondering, would it be possible to port PAL C64 demos to the NTSC C128? In 40-column mode, the C128 can provide 2 MHz in the borders. I'm thinking this might be enough to get many PAL demos working in NTSC land. I'm far from an expert in this area, though. What do you guys think?
 
... 3 posts hidden ...
 
2018-08-25 20:36
AlexC

Registered: Jan 2008
Posts: 299
Great discussion. Great ideas. The MMU with an REU, plus twice the memory out of the box compared to the C64, can lead to some very interesting results. Maybe this will finally lead to exploiting C128 features by the scene ;)
2018-08-25 21:57
Compyx

Registered: Jan 2005
Posts: 631
I wouldn't count on it.
2018-08-26 08:06
oziphantom

Registered: Oct 2014
Posts: 490
Well, 128s don't have an REU built in, so it's not "standard" issue.

You also have BURST load, and you can probably say a 1571 is required, which gets you another 2 MHz CPU that you can burst-transfer data with, for faster loading and streaming.

However I plan to make 128 games from now on, as the Crackers won't touch them ;)
2018-08-26 08:57
Krill

Registered: Apr 2002
Posts: 2980
Quoting AlexC
Maybe this will finally lead to exploiting C128 features by the scene ;)
Quoting Compyx
I wouldn't count on it.
You make it sound as if there is not a single demo using the C128's capabilities. But actually, a C128 demo has just recently won a mixed-platform demo compo: Come to Deadline :) Not to mention a few other C128 productions before it.

Quoting oziphantom
You also have BURST load, and you can probably say a 1571 is required, which gets you another 2 MHz CPU that you can burst-transfer data with, for faster loading and streaming.
As hardware bitshifting is limited to 1/4 the PHI2 clock rate, a byte takes at least 32 cycles to transfer. Which is the same limit as with using software bitbanging, so "BURST" isn't faster with the same clock speed.

However, it might be possible to interleave some other code into the "BURST" transfer code, similar to interleaving raster code and calculation, thus saving the cycles elsewhere.
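The 32-cycle figure above follows from 8 bits at 4 cycles per bit. A minimal sketch of that arithmetic, assuming nominal 1 MHz and 2 MHz clocks and ignoring all protocol overhead (handshaking, latch writes, resyncing); the function names are made up for illustration:

```python
BITS_PER_BYTE = 8
CYCLES_PER_BIT = 4  # hardware shifting runs at no more than 1/4 of PHI2 (from the post above)


def cycles_per_byte() -> int:
    """Minimum PHI2 cycles the shift register needs to move one byte."""
    return BITS_PER_BYTE * CYCLES_PER_BIT


def peak_bytes_per_second(phi2_hz: float) -> float:
    """Upper bound on raw throughput for a given PHI2 clock, with no overhead at all."""
    return phi2_hz / cycles_per_byte()


if __name__ == "__main__":
    # Nominal clock values are an assumption; real PAL/NTSC machines differ slightly.
    for label, hz in (("1 MHz", 1_000_000), ("2 MHz", 2_000_000)):
        print(label, cycles_per_byte(), "cycles/byte,", peak_bytes_per_second(hz), "bytes/s peak")
```

These are ceilings only; any real loader would land below them.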
2018-08-26 09:50
oziphantom

Registered: Oct 2014
Posts: 490
Well, true... but said 32 clocks don't take any 128 CPU or Drive CPU, so on the drive you can GCR decode (given the drive has 64 clocks to play with, you could probably even deflate on the drive) and get the next byte ready while the hardware transfers the current one. So you can send a byte in 32 clocks, and then send another one immediately. The 128 then has a 32-clock window to grab it in (sadly, a 1 MHz NMI handler would take longer than this, but at 2 MHz you could use the NMI, I think; I haven't actually worked out all the clock variants). So you can always transfer. Also, you can burst transfer and bit bang at the same time. Or you can use the fact that the burst protocol is async, so you can stream in data whenever you are ready and have the drive wait, giving you zero timing constraints with faster transfer and minimal 128 CPU time impact. I really want to exploit this in a game; I think it will make a streaming world possible. Or, say, in SAMs, when you "switch outfits" you could easily stream in the next sprites over the frames without having to pause.

I made 128 versions of some games, and compared the time it takes to install your loader on the C64 version, with on-the-fly Exomizer decompression, against the 128 autoboot with stock burst transfer and on-the-fly Exomizer (it might also help that the 128 loader can handle an Exomizer stream with literals). They worked out about the same. Yes, your loader beats the stock slow/bad BURST implementation, but the time it doesn't have to spend loading and installing the loader makes up for Commodore's lame programming.
2018-08-26 15:08
Krill

Registered: Apr 2002
Posts: 2980
Quote:
said 32 clocks don't take any 128 CPU or Drive CPU
Not quite, at least writing the next byte into the latch and reading a byte on the other side plus storing it require a few cycles as well.

Quote:
so on the drive you can GCR decode (given the drive has 64 clocks to play with, you could probably even deflate on the drive) and get the next byte ready while the hardware transfers the current one. So you can send a byte in 32 clocks, and then send another one immediately.
Sending data during fetch+decode might be possible, especially by using the optimised GCR read+decode+checksumming approach which i haven't ported from 1541 to 1571 yet. It could do that already, thanks to 2 MHz mode and some suitable tables in ROM, but alas, without spare cycles to speak of. Sending data while reading from disk would not work with an IRQ loader, but at least the screen could remain enabled while loading.

Decompression on the drive side would likely have more drawbacks than gain. There's no RAM to store a dictionary or previously-depacked data, and the serial transfer will remain the bottleneck even with BURST.

Quote:
The 128 then has a 32-clock window to grab it in (sadly, a 1 MHz NMI handler would take longer than this, but at 2 MHz you could use the NMI, I think; I haven't actually worked out all the clock variants).
NMI is probably prohibitive either way. But polling would work and leave spare cycles for some interleaved code doing other stuff.

Quote:
Also you can Burst transfer and bit bang at the same time.
That's an interesting idea! :)

Quote:
but the time it doesn't have to spend loading and installing the loader
Hmm, i've mostly regarded the install procedure as a fire&forget action, so never really benchmarked that. I've had some plans to let the drive fetch its custom loader code directly from disk, rather than the roundabout way of loading a file into computer memory and then uploading the custom drive code to the drive. But i scrapped those plans for the sake of easier usage.
2018-08-26 16:24
oziphantom

Registered: Oct 2014
Posts: 490
Quoting Krill
Quote:
said 32 clocks don't take any 128 CPU or Drive CPU
Not quite, at least writing the next byte into the latch and reading a byte on the other side plus storing it require a few cycles as well.
Well, yes, but the 32 clocks that the serial shift takes don't need any CPU. You need to do a STA to a fixed address on the drive and an LDA from a fixed address on the 128 side, but the 32 clocks in between are still CPU-free for other things, like setting up the next byte, versus bit banging, where all 32 clocks are eaten on both the 128 and the drive.

Quoting Krill
Quote:
so on the drive you can GCR decode (given the drive has 64 clocks to play with, you could probably even deflate on the drive) and get the next byte ready while the hardware transfers the current one. So you can send a byte in 32 clocks, and then send another one immediately.
Sending data during fetch+decode might be possible, especially by using the optimised GCR read+decode+checksumming approach which i haven't ported from 1541 to 1571 yet. It could do that already, thanks to 2 MHz mode and some suitable tables in ROM, but alas, without spare cycles to speak of. Sending data while reading from disk would not work with an IRQ loader, but at least the screen could remain enabled while loading.

Well, does the GCR read + decode + checksumming take 64 or more clocks (given that we are running at 2 MHz, not 1 MHz)? If so, or about that much, then instead of STA RAM,x / INX, can you not just STA to the CIA register?

Quote:
Decompression on the drive side would likely have more drawbacks than gain. There's no RAM to store a dictionary or previously-depacked data, and the serial transfer will remain the bottleneck even with BURST.
True, but if you just want to stream data, it gets you more disk space (which the 1571 already has more of) and lets you drop data into the 128's RAM with less CPU load on its side. I mean, even if it was just RLE of level data or something, it still gets you more disk space with zero effort on the 128 side, so your game gets more CPU time.
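To make the RLE idea concrete, here is a minimal sketch of a byte-oriented run-length scheme of the kind a drive could expand before sending. The (count, value) pair format and the function names are assumptions chosen for illustration, not a format anyone in this thread has described:

```python
def rle_encode(data: bytes) -> bytes:
    """Pack data as (run_length, value) pairs, with runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # Extend the run while the next byte matches and the count still fits in a byte.
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(packed: bytes) -> bytes:
    """Expand (run_length, value) pairs; needs no dictionary or history buffer."""
    out = bytearray()
    for k in range(0, len(packed) - 1, 2):
        out += bytes([packed[k + 1]]) * packed[k]
    return bytes(out)


if __name__ == "__main__":
    level_row = bytes([0] * 20 + [7] * 3 + [0] * 17)  # made-up tile data
    packed = rle_encode(level_row)
    print(len(level_row), "->", len(packed), "bytes")
    assert rle_decode(packed) == level_row
```

The decoder only ever looks at the current pair, which is why a scheme like this sidesteps the lack of RAM for previously depacked data mentioned earlier in the thread.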

Quoting Krill
Quote:
The 128 then has a 32-clock window to grab it in (sadly, a 1 MHz NMI handler would take longer than this, but at 2 MHz you could use the NMI, I think; I haven't actually worked out all the clock variants).
NMI is probably prohibitive either way. But polling would work and leave spare cycles for some interleaved code doing other stuff.

Quote:
Also you can Burst transfer and bit bang at the same time.
That's an interesting idea! :)

Quote:
but the time it doesn't have to spend loading and installing the loader
Hmm, i've mostly regarded the install procedure as a fire&forget action, so never really benchmarked that. I've had some plans to let the drive fetch its custom loader code directly from disk, rather than the roundabout way of loading a file into computer memory and then uploading the custom drive code to the drive. But i scrapped those plans for the sake of easier usage.

Well I'm loading a one shot game, such as Dropzone and Kung Fu Master, so I only need to load once.
2018-08-26 19:09
Krill

Registered: Apr 2002
Posts: 2980
Quoting oziphantom
versus bit banging, where all 32 clocks are eaten on both the 128 and the drive.
The 32 cycles include storing on the computer side (drive side analogous):
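lda $dd00       ; 4 cycles, read first bit pair
lsr             ; 2
lsr             ; 2
eor $dd00       ; 4 cycles, merge second bit pair
lsr             ; 2
lsr             ; 2
eor $dd00       ; 4 cycles, merge third bit pair
lsr             ; 2
lsr             ; 2
eor $dd00       ; 4 cycles, merge fourth bit pair
sta somewhere   ; 4 cycles, store the byte (32 cycles in total)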
Of course, resyncing once every 4 or 8 bytes adds a few extra cycles.
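The listing above can be modelled in a few lines to show how four port reads become one byte and why the total is 32 cycles. This is only a sketch: it assumes the two serial input lines show up in bits 7 and 6 of $dd00 and that bits 5 to 0 hold still during the transfer. The `LOW_BITS` value and the sender-side `encode` compensation are hypothetical; the thread does not say how the constant bits are cancelled in practice.

```python
CYCLES = (4, 2, 2, 4, 2, 2, 4, 2, 2, 4, 4)  # lda, lsr, lsr, eor, ..., eor, sta
LOW_BITS = 0b000011  # hypothetical constant value of $dd00 bits 5-0 while receiving


def read_port(pair: int, low_bits: int = LOW_BITS) -> int:
    """Value seen at $dd00 when the two input lines carry the 2-bit value `pair`."""
    return ((pair & 0b11) << 6) | (low_bits & 0x3F)


def receive(pairs, low_bits: int = LOW_BITS) -> int:
    """Mirror the lda / lsr lsr / eor sequence for four successive bit pairs."""
    a = read_port(pairs[0], low_bits)       # lda $dd00
    for pair in pairs[1:]:
        a >>= 2                             # lsr, lsr
        a ^= read_port(pair, low_bits)      # eor $dd00
    return a                                # sta somewhere


def constant_mask(low_bits: int = LOW_BITS) -> int:
    """What the constant low bits alone contribute to the received byte."""
    return receive((0, 0, 0, 0), low_bits)


def encode(byte: int, low_bits: int = LOW_BITS):
    """Assumed sender side: pre-XOR the mask, then send pairs lowest bits first."""
    raw = byte ^ constant_mask(low_bits)
    return tuple((raw >> shift) & 0b11 for shift in (0, 2, 4, 6))


if __name__ == "__main__":
    print("cycles per byte:", sum(CYCLES))
    assert all(receive(encode(b)) == b for b in range(256))
```

Because the sequence consists only of shifts and XORs, the constant bits contribute a fixed mask that can be cancelled on either side of the link.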

Quoting oziphantom
Well, does the GCR read + decode + checksumming take 64 or more clocks (given that we are running at 2 MHz, not 1 MHz)? If so, or about that much, then instead of STA RAM,x / INX, can you not just STA to the CIA register?
Thinking about it, at 1 MHz, it's 127 / 4 = 31.75 cycles minimum on average, although distributed somewhat unevenly. But yeah, might work for the old approach as well, at 2 MHz.

Quoting oziphantom
I mean, even if it was just RLE of level data or something, it still gets you more disk space with zero effort on the 128 side, so your game gets more CPU time.
That's true, something simple like that would work.

Quoting oziphantom
Well I'm loading a one shot game, such as Dropzone and Kung Fu Master, so I only need to load once.
For that, it might actually pay off to use NMI at 2 MHz to load in compressed data, and decompress in the mainline thread while the bytes roll in. Obviously with some highly-optimised NMI+depacker scheme, minimising the context-switch overhead.
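The split suggested here, an interrupt handler that only stores incoming bytes while the mainline depacker consumes them, is essentially a single-producer, single-consumer ring buffer. A rough model follows, assuming a 256-byte buffer so the indices wrap like 8-bit registers; the class and method names are made up, and real 6502 code would look quite different:

```python
class RingBuffer:
    """One producer (the "NMI handler") and one consumer (the "mainline depacker")."""

    SIZE = 256  # assumed size, so head and tail behave like 8-bit index registers

    def __init__(self) -> None:
        self.buf = bytearray(self.SIZE)
        self.head = 0  # next write position, only changed by the producer
        self.tail = 0  # next read position, only changed by the consumer

    def push(self, value: int) -> bool:
        """Producer side: store one received byte; False means the buffer is full."""
        nxt = (self.head + 1) % self.SIZE
        if nxt == self.tail:
            return False
        self.buf[self.head] = value
        self.head = nxt
        return True

    def pop(self):
        """Consumer side: fetch the oldest byte, or None if nothing has arrived yet."""
        if self.tail == self.head:
            return None
        value = self.buf[self.tail]
        self.tail = (self.tail + 1) % self.SIZE
        return value


if __name__ == "__main__":
    rb = RingBuffer()
    for b in b"packed":
        rb.push(b)
    print(bytes(iter(rb.pop, None)))
```

Each index is written by only one side, which keeps the handler short and is the kind of property that helps minimise the context-switch overhead mentioned above.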
2018-08-28 09:04
oziphantom

Registered: Oct 2014
Posts: 490
If you can do on-the-fly GCR decoding without the fancy tricks at 2 MHz (the 1571 internals book claims you can), that would free up a lot more RAM, right?
Or, if you have the speed and can dump data directly down the SSR, could one treat the data as Huffman-coded and hence deflate Huffman code straight off the disk?
If only you could read from both heads at the same time ;)

In my loading case there is a bitmap and music playing; the 128 version can keep them going right until the end. The 64 version needs to load in all the data, remove the bitmap and music, and then deflate. The disk is here: http://cloud.cbm8bit.com/oziphantom/autoboot_dropzone.d64 Use BOOT or let autoboot load the 128 version; LOAD "*",8,1 loads the C64 version.
2018-08-29 11:10
Krill

Registered: Apr 2002
Posts: 2980
Quoting oziphantom
If you can do on-the-fly GCR decoding without the fancy tricks at 2 MHz (the 1571 internals book claims you can), that would free up a lot more RAM, right?
I'm currently using the non-fancy version, which probably uses less RAM than the new approach used on 1541. There would be a few opportunities left to save some RAM, like re-using the block-fetch code for the block headers, and also throwing out the entire dir-parsing and dir-buffer stuff, which isn't needed for a single-load production. I haven't spent much time optimising the 1571 drive code.

Quoting oziphantom
Or, if you have the speed and can dump data directly down the SSR, could one treat the data as Huffman-coded and hence deflate Huffman code straight off the disk?
I think that would only be possible with having the Huffman decoder on the computer side, as the transfer is tightly coupled to reading from disk, and post-Huffman transfer would have a variable bit-rate, so to speak. When doing the decoding on the computer side, well, could use some other encoding as well, one that hits the sweet spot for raw loading/transfer speed vs. pack ratio vs. decoding complexity, ultimately minimising loading time.

Quoting oziphantom
In my loading case there is a bitmap and music playing; the 128 version can keep them going right until the end. The 64 version needs to load in all the data, remove the bitmap and music, and then deflate. The disk is here: http://cloud.cbm8bit.com/oziphantom/autoboot_dropzone.d64 Use BOOT or let autoboot load the 128 version; LOAD "*",8,1 loads the C64 version.
This seems to be using an old loader version. The new one should be quite a bit faster, on C64/1541, anyways. :)
All times are CET.