| |
oziphantom
Registered: Oct 2014 Posts: 490 |
modify Exomizer compressor to blacklist some memory locations
Does anybody know a way to modify the Exomizer compression algorithm to blacklist FF0X memory locations so they are never read from, i.e. don't allow them to be used as part of a sequence?
I guess a post-process would also work, if there is a simple way to convert a sequence to literal bytes.. |
|
| |
Krill
Registered: Apr 2002 Posts: 2980 |
In order to avoid an XY problem here, what do you actually want to achieve? :) |
| |
oziphantom
Registered: Oct 2014 Posts: 490 |
Decompressing above FF00 on a 128.
I have the system modified so it does a faster decompression that can copy the data to the other bank, which lets me avoid any overlap issues and just makes life easier on the whole. But FF00-5 are the MMU registers, and if you write to them it swaps the MMU config - this is not a problem, as I have a method to write under the MMU. However, there is still a problem when the SRC ptr tries to read from the MMU registers, as the MMU also steals the reads, so it won't get the actual value it wants. If I can get the compressed data to only write to FF0X (not 100% sure if it's only 0-5 that has the issue, or 0-8) then it will work fine. If not, I will need to cut my input files into two, X-FEFF and then FF08-FFFF, and patch the bytes in the middle - a bit of a hassle, and making the compressed file only write to and never read from that range would be a lot simpler in the long run, as having the VIC bank at C000-FFFF is a really common thing 64 games do.. |
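For readers not deep into the 128 side of this, a minimal sketch of the read problem (the MMU's CR/LCR mirrors are documented at $ff00-$ff04; whether anything up to $ff08 also misbehaves is the open question above):

    lda $ff00   ; returns the current MMU configuration register, NOT the RAM underneath
    lda $feff   ; plain RAM as expected (with a RAM configuration banked in)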
| |
Krill
Registered: Apr 2002 Posts: 2980 |
If you have modified the decompressor for faster operation on a C-128, can you use stack relocation for reading from any RAM location? It's 4 cycles per byte with built-in auto-increment... :) |
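A rough sketch of that idea (hypothetical labels, not Krill's actual code): repoint page 1 via the MMU's page-1 pointer, pre-decrement S because pla increments before reading, then pull bytes.

    sei             ; keep IRQs off the relocated stack
    lda src_hi      ; high byte of the read address (hypothetical variable)
    sta $d509       ; MMU page-1 pointer: the page that "page 1" now maps to
                    ; ($d50a would select the other RAM bank, if needed)
    ldx src_lo      ; low byte of the read address
    dex
    txs             ; S = src_lo - 1
    pla             ; 4 cycles: read one byte, S auto-increments
                    ; when S wraps to 0, $d509 has to be bumped by hand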
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
why not just cut off the files at feff :P |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
nucrunch as-is supports decrunching a set of segments with a single call. The ratio's not as good as exo, mind.
Each segment's compressed separately at the moment (ie, doesn't refer back to substrings from earlier segments). Upsides and downsides for your use case.
Imma add that blacklist thing to the desiderata for my next cruncher (may be some time until release :P ) |
| |
oziphantom
Registered: Oct 2014 Posts: 490 |
converting rdecrunch to be 128-optimal/bank-agnostic looks like it would be quite a challenge ;) Exomizer's "Y always decreases" approach makes it easy ;) |
| |
oziphantom
Registered: Oct 2014 Posts: 490 |
Quote: If you have modified the decompressor for faster operation on a C-128, can you use stack relocation for reading from any RAM location? It's 4 cycles per byte with built-in auto-increment... :)
no,
a.) you will still need a counter to know when to inc the page byte, as the MMU doesn't extend the stack to a 16-bit pointer for you, so you would need to keep a ZP location or register to detect the wrap-to-0 case.
b.) because I use the stack relocation to write the bytes. This lets me write to any bank, so I can read from bank 0, then write to bank 1 with it. It also always hits RAM, such that it will write under FF0X and under IO while IO is banked in, so I don't even have to modify the banking.
And this turns a sta (XX),y into a pha <- 3 clocks ;) and then 2MHz mode, so 1.5..
The catch is that the "read sequence" part reads from uncompressed data, not compressed data, so I have to make that read more expensive: it needs a bank change, so I have to enable shared mem so my code doing the read doesn't magic itself away (one could just put the code into both places to avoid this, however), switch banks & disable IO (if I keep it on normally) - good ole PCR registers to the rescue - lda (xx),y, then switch the bank back, PCR again, and disable shared memory. So the load eats 26 more clocks. I figure that since all the code has to be inlined (no stack operations are allowed in the decompressor) and writing will happen for every byte, this works out a net win. Then 2MHz..
Since I don't have to worry about any overlap, I can now compress 2-ff00 (would be nice if it was 2-ffff; I can technically do 0-, but those 2 bytes are kind of useless), and I don't have to move the data down at the start for "overlap safety", so I also pick up a massive boost from not having to shuffle any data around.
I also move the ZP into $1XX; this allows me to put the variables and the 156-byte buffer into 'ZP' while leaving the actual ZP untouched, for even more speed.
Now if I could just get it to not want to read from FF0X it would be perfect.. |
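Roughly what the per-byte sequence-copy step described above looks like, as a sketch only (hypothetical labels and preconfig assignments; the shared-memory/$d506 handling that accounts for most of the 26 extra clocks is left out):

    sta $ff02       ; LCR: preconfig with the destination RAM bank, IO off
    lda (seq_zp),y  ; read a back-reference byte from already-decompressed data
    sta $ff01       ; LCR: back to the normal working configuration
    pha             ; 3 clocks: write via the relocated stack page - always hits RAM,
                    ; even under $ff0x and under IO
                    ; (plus the usual Y/pointer bookkeeping, and a new page in $d509
                    ;  whenever the write position crosses a page boundary)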
| |
Krill
Registered: Apr 2002 Posts: 2980 |
My loader has an option to allow loading under the IO space at $d000..$dfff.
It is implemented to favour speed over size: there are two getblock routines - one regular and one with all the slow bank switching in place. Depending on where the incoming block will go to, either the fast-RAM or the slow-IO incarnation is picked.
Could such a setup work for you? Technically, it is indeed possible to both read from and write to memory under MMU space at $ff0X, isn't it? |
| |
oziphantom
Registered: Oct 2014 Posts: 490 |
you can switch the stack to src-1, pla, swap the stack back to dest, pha, to read/write under the FF0X block, but it's a lot of overhead for 8 bytes ;) You can't use the ZP relocation trick, as you can't read 00/01 with it.
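For reference, a sketch of that double stack swap for a single byte (hypothetical labels; IRQs have to be held off while S is abused), which is why it is so much overhead for a handful of bytes:

    lda src_hi      ; point page 1 at the source page
    sta $d509
    ldx src_lo
    dex
    txs
    pla             ; read the byte from under $ff0X into A
    ldy dst_hi      ; point page 1 at the destination page (keep the byte in A)
    sty $d509
    ldx dst_lo
    txs
    pha             ; write the byte under $ff0X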
for a loader-based solution, I write the exomizer file backwards to the disk, then the get-byte just does
toggle clock for next byte
read byte from SSR
Bank switching on the 128 is not that bad, as you can set the Pre-Config registers, so preconfig 1 has Bank 0 RAM + IO and preconfig 2 has Bank 0 RAM
then
STA $ff02
STA DEST
STA $ff01
only adds 8 clocks. Really handy for the pesky IRQs as well: now an IRQ can just switch to its own PCR config and then switch back to the "main thread's" one - no more lda $1 : pha : lda #value : sta $1 .... pla : sta $1 :D, and no living on the edge with inc/dec tricks
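A sketch of the setup that makes those one-write switches possible (the configuration values are placeholders to be filled in from the MMU CR bit layout; they are not taken from the thread):

    lda #CONFIG_MAIN    ; placeholder: the main code's configuration
    sta $d501           ; PCR A - selected later by any write to $ff01
    lda #CONFIG_IRQ     ; placeholder: whatever the IRQ handler needs banked in
    sta $d503           ; PCR C - selected later by any write to $ff03

irq
    pha
    sta $ff03           ; one write: switch to the IRQ's own configuration
                        ; (the value in A is irrelevant for LCR writes)
    ; ... acknowledge and service the interrupt ...
    sta $ff01           ; one write: back to the main thread's configuration
    pla
    rti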
How does your loader handle banks? It would be nice to have a loader that handles Tass 3-byte PRGs so I can load into both banks - making one large file that loads from disk and unpacks into both banks would be great.
However, in this case I'm looking for RAM -> RAM based solutions. It's not strictly needed for this particular case - I could just stream-unpack from disk - but for future ideas I want to get it working. Under IO is not an issue; it's reading from FF0X that is the issue. |
| |
Krill
Registered: Apr 2002 Posts: 2980 |
I was not suggesting any loader/IO-based solution, i was proposing to use the same approach as with loading under IO to solve your problem.
That is, use your normal sequence copy loop everywhere except for the problematic $ff0X range, where you use the overhead-ridden bank switching solution.
The overall performance impact should be minimal, but it eats a bit of memory to have two alternative routines for different memory ranges.
Quoting oziphantom: How does your loader handle banks? It would be nice to have a loader that handles Tass 3-byte PRGs so I can load into both banks - making one large file that loads from disk and unpacks into both banks would be great.
I have added native C-128 support just recently, mainly to implement burst support (and then found it's in fact not faster than the standard 2bit+ATN approach).
There is no support for generic loading across banks or 3-byte load addresses so far. The loader does not perform any bank-switching itself in the non-IO/non-KERNAL-fallback default variant, so it uses whatever bank it resides in by default, with or without common areas to load to the other bank. I have thought about adding something for full 128K support, but decided to ignore the problem for now, as i've only come up with cumbersome bank-switching thunk solutions so far, much like the OS does. |
| |
oziphantom
Registered: Oct 2014 Posts: 490 |
Exomizer by default has 16-bit offsets; I guess I could force it into 256-byte offsets, then once I get below fe00 jump to another copy.. or change the get-byte routine in ZP to point to different code...
At the moment it works in X128, but fails on hardware and Z64K so I'm trying to work out what the magic combo is... |
| |
Krill
Registered: Apr 2002 Posts: 2980 |
Limiting the offsets worsens the pack ratio, i'd try to avoid that.
You can add code to check if the sequence copy read range overlaps $ff00..$ff04 and then select the appropriate routine. Obviously the selection code should be highly optimised.
But if your trick to read RAM in that range only works in VICE and there is no way to achieve that on the real thing, this is all moot, of course. |
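One way such a selection could look, as a sketch (hypothetical labels; it conservatively treats any copy whose read range reaches page $ff as special and ignores the wrap-at-$ffff edge case):

    clc                 ; (src + len).hi = high byte just past the read range
    lda src_lo
    adc len_lo
    lda src_hi
    adc len_hi
    cmp #$ff
    bcs copy_careful    ; read range reaches page $ff: slow routine with $ff0X handling
    jmp copy_fast       ; otherwise the plain lda (zp),y copy loop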
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
just use different code for the whole ffxx page then just one cmp for src hi byte. |
| |
oziphantom
Registered: Oct 2014 Posts: 490 |
well, fe and ff: with y = ff and the pointer at fe01, lda (pointer),y still hits ff00. And you need to preserve C, V I think, and I can't use the stack... have to double check |
| |
tlr
Registered: Sep 2003 Posts: 1790 |
I think the original idea of modifying the compressor not to reference certain areas would be quite doable. I've considered that for subsizer but haven't really seen an actual use case for it.
If it is only reading you are concerned about, a modification to your favourite compressor's match algorithm should do it. E.g. you could have an extra bit (or byte) per data byte that says whether it may be included in a match or not. This will result in the whole match database not containing any possible references to that data, so the later steps always generate output without it.
If writing is a problem, you would need to add some way of doing skips in the output. |
| |
Krill
Registered: Apr 2002 Posts: 2980 |
Quoting oziphantom: well, fe and ff: with y = ff and the pointer at fe01, lda (pointer),y still hits ff00. And you need to preserve C, V I think, and I can't use the stack... have to double check
I think it's most sensible to check the read range in flat memory space to decide on plain copy or $ff0X copy, then apply all those banking/stack relocation shenanigans to actually copy bytes around. :)
The decision should be made per sequence copy, not per source byte. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Quoting Krill: The decision should be made per sequence copy, not per source byte.
Sure, but it's still going to be slower than just blacklisting those particular source bytes. That's an extra check for every token - particularly harsh given that exo also uses copies for recently used single bytes. |
| |
Krill
Registered: Apr 2002 Posts: 2980 |
Quoting ChristopherJam: Sure, but it's still going to be slower than just blacklisting those particular source bytes.
Yes, disallowing certain memory ranges on the compressor side is the preferred option, IF it is available. :)
Quoting ChristopherJam: That's an extra check for every token - particularly harsh given that exo also uses copies for recently used single bytes.
An extra check for every sequence-copy token. However, it should be highly optimisable. The check only needs to be performed once the problematic range has actually been written to, and as its high byte is $ff, the back-reference check in flat memory space should allow for early exit. There may be more opportunities for optimisation. |
| |
oziphantom
Registered: Oct 2014 Posts: 490 |
not write, read
so if you are writing to $4000 and it wants to copy a 128-byte sequence that starts at $fe88, then the first 8 reads (as it reads from the top down) need to use the special read-under-FF0X code. So in order to know if it needs the normal or the special routine, you need to do
(Start + X + Len).hi >= ff, where Start and Len are 16 bits, 'then do special' - that is probably the best case. It might be faster overall to just do Start.hi + Len.hi > fd and take the hit, rather than take the hit of doing 16 bits for the rest.
ChristopherJam is right: Exomizer loves to do a sequence of 1 byte.
I have written to Magnus Lind, and he sees that it might be useful for other systems as well, and if it's not too much work he is happy to make "blacklist intervals" a feature of Exomizer. |
| |
Krill
Registered: Apr 2002 Posts: 2980 |
Quoting oziphantom: not write, read
A sequence cannot be read from before it has been written initially (and writing is not a problem, if i have understood you right). The write pointer is strictly ascending or descending depending on depack direction, and thus the range check is superfluous before the problematic range has been written to. This is why i wrote "The check only needs to be performed once the problematic range has actually been written to". |
| |
oziphantom
Registered: Oct 2014 Posts: 490 |
ok, I see what you are saying: do the forward decompress, not the backwards decompress; this then means I only have to do the slow method for 255 bytes tops |
| |
Krill
Registered: Apr 2002 Posts: 2980 |
No, what i said should apply to either depack direction.
I'm not quite sure which direction would give more optimisation opportunities for the $ff0X range check at the moment, but both would probably have to do with the difference of write pointer vs back-reference read pointer crossing the 64K bank boundary or not.
But if you intend to depack while loading, forward decompression is the way to go. |
| |
oziphantom
Registered: Oct 2014 Posts: 490 |
wait, that won't work - to get PHA one must go backwards.
Since it's FF0X, going backwards (assuming you start above it; if you don't, just use a version that skips the check altogether) gives you at most 248 bytes that are guaranteed not to need the check.
If you go forward, then you only have 248 bytes where one must check for FF0X - however, you can't use PHA to write.. |
| |
oziphantom
Registered: Oct 2014 Posts: 490 |
Quote: But if you intend to depack while loading, forward decompression is the way to go.
Why is forward better for loading? (apart from it saving you flipping the file) |
| |
Krill
Registered: Apr 2002 Posts: 2980 |
Okay, then backward decompression is a given, so any potential performance differences to forward decompression are moot.
Forward decompression is usually suited better for decompression while loading, mainly because loading itself is usually performed in the forward direction. You can then decompress in-place* in the same direction. That should work for backward decompression as well, given that loading is done in the same direction, too.
* Read buffer (loaded compressed file) is a subset of the write buffer (decompressed file), both end at the same address using forward direction. For Exomizer, there are a few (3-ish) compressed bytes beyond the uncompressed data. |
| |
oziphantom
Registered: Oct 2014 Posts: 490 |
Its in, and it works :D |