Radiant
Registered: Sep 2004 Posts: 639
Plaintext compression
As some of you know, I'm playing around a bit with creating a game...
Me being the person I am, I want to include a not insubstantial amount of text in the game. A lot of text eats a lot of memory, so I'm pondering different ways of compressing it. Since the text will most likely consist only of A-Z, 0-9 and some miscellaneous delimiting characters, I figure I could easily get by with just 64 different character codes, including some special codes for switching case, changing colour, etc. This would of course allow me to fit each character code into 6 bits, which isn't immediately advantageous (at least to me), considering it would only cut memory usage by a quarter and add serious overhead to the print routine.
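For illustration, here is a minimal sketch in C of what packing and unpacking 6-bit codes could look like; the LSB-first bit order and the helper names are assumptions made up for the example, not something specified in the post:

#include <stdint.h>
#include <stddef.h>

/* Pack an array of 6-bit symbol codes (values 0-63) into a byte buffer,
   LSB first.  The output buffer must be zero-initialized beforehand.
   Returns the number of bytes written. */
size_t pack6(const uint8_t *codes, size_t n, uint8_t *out)
{
    size_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        size_t   byte  = bitpos >> 3;
        unsigned shift = bitpos & 7;
        out[byte] |= (uint8_t)(codes[i] << shift);
        if (shift > 2)                        /* code straddles a byte boundary */
            out[byte + 1] |= (uint8_t)(codes[i] >> (8 - shift));
        bitpos += 6;
    }
    return (bitpos + 7) >> 3;
}

/* Fetch the i-th 6-bit code back out of the packed buffer. */
uint8_t unpack6(const uint8_t *in, size_t i)
{
    size_t   bitpos = i * 6;
    size_t   byte   = bitpos >> 3;
    unsigned shift  = bitpos & 7;
    unsigned v = in[byte] >> shift;
    if (shift > 2)                            /* upper bits live in the next byte */
        v |= (unsigned)in[byte + 1] << (8 - shift);
    return (uint8_t)(v & 0x3F);
}

On the 6510 the same thing boils down to a handful of shifts and ORs, and the byte-straddling cases are exactly where the print-routine overhead comes from.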
Another thing I could do would be to analyze my writing and create a table containing the most frequent words I use, and just replace all occurrences of those words with table indexes. This could save me quite a bit of RAM, and has the advantage of very little overhead... However, it feels a bit unreliable.
RLE wouldn't be effective.
So, what are your favourite methods for squeezing as much text as possible into, say, 4 KiB? :-) I'm really at a loss here, as you can see.
JackAsser
Registered: Jun 2002 Posts: 2014
The standard method of compressing plain text is to use Huffman trees. Basically, a Huffman tree creates a bit pattern for each symbol, where the most frequently used symbol gets the shortest pattern. This typically yields a 2x compression ratio; used together with your 6 bits of input, I'd say you'll end up with an 8x compression ratio. As you say yourself, the print routine will suffer substantial overhead, although I think a Huffman decoder can be written quite efficiently. Search the web for algorithm details.
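To give an idea of how small such a decoder can be, here is a rough sketch of a tree-walking Huffman decoder in C; the node layout, bit order and type names are assumptions for the example, and a real table would be generated offline from the text's symbol frequencies:

#include <stdint.h>
#include <stddef.h>

/* Huffman tree stored as parallel arrays: for node i, child[i][bit] is the
   index of the next node, or -1 if node i is a leaf; sym[i] holds the
   decoded symbol for leaf nodes. */
typedef struct {
    const int16_t (*child)[2];
    const uint8_t *sym;
    int            root;
} huff_tree;

/* Decode one symbol: walk from the root, consuming one bit per step.
   *bitpos is a running bit index into the packed bitstream. */
uint8_t huff_decode(const huff_tree *t, const uint8_t *bits, size_t *bitpos)
{
    int node = t->root;
    while (t->child[node][0] != -1) {                  /* stop at a leaf */
        unsigned bit = (bits[*bitpos >> 3] >> (*bitpos & 7)) & 1;
        node = t->child[node][bit];
        (*bitpos)++;
    }
    return t->sym[node];
}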
Oswald
Registered: Apr 2002 Posts: 5094
I think Huffman with or without the 6-bit encoding would give the same result, as Huffman in general already reduces the number of bits needed to encode the bytes to the possible minimum.
Anyway, IMHO Huffman, with all the bit shifting and binary tree lookups, is not really for the 6510. I advise LZW.
algorithm
Registered: May 2002 Posts: 705
Huffman does not give a good compression ratio for text, and neither does RLE. You could use LZ-based compression instead, which should give a compression ratio of around 3:1 to 5:1. You can always use a compression utility such as Exomizer or Pucrunch to compress the text and decompress specific sections when required.
Radiant
Registered: Sep 2004 Posts: 639
Yep, but I'd prefer to be able to decompress directly in the print routine, character for character, which rules out Exomizer and the likes.
LZW has the same problem. Of course, I could just store all the text in compressed chunks and then decompress each one to a temporary buffer as needed, but I wonder if there'd be that much gain from doing that, considering the size of the buffer etc... Perhaps it'll be necessary, we'll see.
Any more suggestions?
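For what it's worth, the compressed-chunks-plus-temporary-buffer idea above could look roughly like this in C; decompress(), chunk_offset and packed_text are hypothetical stand-ins for whatever decruncher and packing tool are actually used, and the buffer size is made up for the sketch:

#include <stdint.h>

#define CHUNK_BUF_SIZE 256               /* assumed size of one decoded chunk */

/* Hypothetical decruncher back end (LZSS, a streamed decruncher, etc.) --
   not from this thread, just a stand-in for the sketch. */
extern void decompress(const uint8_t *packed, uint8_t *out);

/* Offsets of each compressed chunk inside the packed text blob, emitted by
   whatever tool packs the text.  Both tables are placeholders here. */
extern const uint16_t chunk_offset[];
extern const uint8_t  packed_text[];

static uint8_t chunk_buf[CHUNK_BUF_SIZE];
static int     cached_chunk = -1;

/* Fetch a character by (chunk, index); a chunk is only decompressed into
   the temporary buffer when a different chunk is requested. */
uint8_t text_char(int chunk, uint8_t index)
{
    if (chunk != cached_chunk) {
        decompress(packed_text + chunk_offset[chunk], chunk_buf);
        cached_chunk = chunk;
    }
    return chunk_buf[index];
}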
algorithm
Registered: May 2002 Posts: 705
Maybe some simple dictionary-based compression.
Each word would be analysed and replaced with a marker, and the word itself placed in the dictionary; further occurrences of the same word would point to the same entry, and new words would be appended to the dictionary.
This would obviously require additional space for the dictionary as well as for the markers, but if you are planning on having a substantial amount of text, this may be the method to use. And it's extremely fast: just read the marker, which points into a lookup table holding the word.
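A minimal sketch in C of this kind of marker expansion; the dictionary contents and the 0x80 marker convention are made up for the example:

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Toy word dictionary -- in practice it would be built by scanning the
   game's text for its most frequent words. */
static const char *dict[] = { "the ", "you ", "door ", "north " };

/* Decode a token stream: bytes below 0x80 are literal characters, bytes
   0x80 and up are indices into the word table. */
void print_tokens(const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (src[i] >= 0x80)
            fputs(dict[src[i] - 0x80], stdout);   /* expand dictionary word */
        else
            putchar(src[i]);                      /* plain character */
    }
}

int main(void)
{
    /* Expands to "the door to the north is open." */
    const uint8_t msg[] = { 0x80, 0x82, 't', 'o', ' ', 0x80, 0x83,
                            'i', 's', ' ', 'o', 'p', 'e', 'n', '.', '\n' };
    print_tokens(msg, sizeof msg);
    return 0;
}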
Compyx
Registered: Jan 2005 Posts: 631
Maybe you should take a look at bzip2: http://en.wikipedia.org/wiki/Bzip2
It's a fairly simple algorithm, it compresses text better than zip or gzip, and it's patent-free, unlike LZW. (See http://en.wikipedia.org/wiki/LZW#Patent_issues)
Scout
Registered: Dec 2002 Posts: 1570
Quote: Yep, but I'd prefer to be able to decompress directly in the print routine, character for character, which rules out Exomizer and the likes.
LZW has the same problem. Of course, I could just store all the text in compressed chunks and then decompress each one to a temporary buffer as needed, but I wonder if there'd be that much gain from doing that, considering the size of the buffer etc... Perhaps it'll be necessary, we'll see.
Any more suggestions?
That's not completely true.
You can still use Exomizer for this.
Check the streamed-decompression sources that are included in the package.
R.
---
-= Silicon Ltd. =-
http://forum.siliconlimited.com
Commodore 64 Scenemusic Podcast
http://8bitmayhem.blogspot.com/
MagerValp
Registered: Dec 2001 Posts: 1078
LZSS is dead simple, and works well for text. No algorithm will work well for short strings though, so you should probably look into coding the strings as blocks.
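To show how simple the decoder side is, here is a sketch of an LZSS decoder in C; the exact bitstream layout (flag bytes, 12-bit offsets, 4-bit lengths giving matches of 3-18 bytes) is an assumption for the example, not a fixed standard:

#include <stdint.h>
#include <stddef.h>

/* Minimal LZSS decoder, assuming a well-formed stream: a flag byte precedes
   each group of 8 items; a set bit means a literal byte follows, a clear
   bit means a 2-byte back-reference (12-bit offset, 4-bit length). */
size_t lzss_decode(const uint8_t *src, size_t srclen, uint8_t *dst)
{
    size_t si = 0, di = 0;
    while (si < srclen) {
        uint8_t flags = src[si++];
        for (int bit = 0; bit < 8 && si < srclen; bit++) {
            if (flags & (1u << bit)) {
                dst[di++] = src[si++];                     /* literal byte */
            } else {
                unsigned ofs = src[si] | ((src[si + 1] & 0xF0u) << 4);
                unsigned len = (src[si + 1] & 0x0Fu) + 3;
                si += 2;
                while (len--) {                            /* copy from history */
                    dst[di] = dst[di - ofs];
                    di++;
                }
            }
        }
    }
    return di;                                             /* decoded size */
}

Decoding is just copies and shifts; all the expensive work (finding the matches) happens at compression time, which is one reason LZSS suits a slow target.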
Radiant
Registered: Sep 2004 Posts: 639
Yeah, Burrows-Wheeler transform + RLE seems to be quite OK for text. A bit much overhead, though, and the LZ-based algorithms have that problem too.
Scout: But that's just streamed decompression, right? I'd probably need some kind of random access, which is probably hard (impossible?) to achieve with any non-trivial compression... I'll check it out.
Anyway, thanks for the suggestions, I'm a lot wiser now.
JackAsser
Registered: Jun 2002 Posts: 2014
By overhead, do you mean computation time, memory space for the depacker, laziness about implementing it, or a combo of all three? =D Just curious..