[CSDb] - User Forums - Plaintext compression

You are not logged in - nap

CSDb User Forums

Forums > C64 Coding > Plaintext compression

2005-11-28 01:33

Radiant

Registered: Sep 2004
Posts: 639

Plaintext compression

As some of you know, I'm playing around a bit with creating a game...

Me being the person I am, I wanna include a not insubstantial amount of text in the game. A lot of text eats a lot of memory. Therefore I'm pondering different ways of compressing the text. Since the text will most likely only consist of A-Z, 0-9 and some miscellaneous delimiting characters, I figure I could easily do with just 64 different character codes, including some special codes for switching case, changing colour, etc. This would of course allow me to fit each character code into 6 bits, which is not immediately advantageable (at least to me), considering it would only result in a 1/4 memory usage reduction and add serious overhead to the print routine.

Another thing I could do would be to analyze my writing and create a table containing the most frequent words I use, and just replace all references to those with table indexes. This could save me quite a bit of RAM, and has the advantage of very little overhead... However, it feels a bit unreliable.

RLE wouldn't be effective.

So, which are your favourite methods for squeezing as much text as possible into, say, 4 KiB? :-) I'm really at a loss here, as you can see.

2005-11-28 06:35

JackAsser

Registered: Jun 2002
Posts: 1990

The standard method of compressing plain text is to use huffman-trees. Basically a huffman tree creates a bitpattern for each symbol, where the most used symbol will get the smallest bitpattern. This typically yeilds a 2x-compression ratio, used together with your 6bits of input I'd say you'll end up with a 8x-compression ratio. Although as you say yourself, the print-routine will suffer substantial overhead, although I think a huffman decoder can be written quite efficiently. Search the web for algorithm details.

2005-11-28 09:02

Oswald

Registered: Apr 2002
Posts: 5025

I think huffman with or without the 6bit encoding would give the same result, as huffman in general reduces the number of bits needed to encode bytes to the possible minimum.

anyway imho huffman with all the bit shifting and binary tree lookups is not really for the 6510, I advise lzw.

2005-11-28 09:57

algorithm

Registered: May 2002
Posts: 702

Huffman does not give a good compression ratio for text. Neither does RLE. You can either use LZ based compression (should give compression ratio of around 3:1 - 5:1. (You can always use a compression utility such as Exomizer or Pucrunch to compress and decompress specific sections when required.

2005-11-28 10:33

Radiant

Registered: Sep 2004
Posts: 639

Yep, but I'd prefer to be able to decompress directly in the print routine, character for character which rules out Exomizer and the likes.

LZW has the same problem. Of course, I could just store all the text in compressed chunks, and then decompress it to a temporary buffer as needed, but I wonder if there'd be that much gain from doing that, considering the size of the buffer etc... Perhaps it'll be necessary, we'll see.

Any more suggestions?

2005-11-28 11:01

algorithm

Registered: May 2002
Posts: 702

Maybe some simple dictionary based compression.

Each word would be analysed and replaced with a marker, this word would then be placed in the dictionary, the same occurance of the word would then point to the same area and new words would be added onto the dictionary.

This would obviously require additional space (for the dictionary) as well as the markers. If you are planning on having maybe a substantial amount of text, then this method may be the one to use (and its extremely fast.. just read the marker, it then points to lookuptable with the word

2005-11-28 12:28

Compyx

Registered: Jan 2005
Posts: 631

Maybe you should take a look at bzip2: http://en.wikipedia.org/wiki/Bzip2

It's a fairly simple algorithm and compresses text better than zip or gzip, and it's pattent-free unlike LZW. (See http://en.wikipedia.org/wiki/LZW#Patent_issues)

2005-11-28 13:19

Scout

Registered: Dec 2002
Posts: 1568

Quote: Yep, but I'd prefer to be able to decompress directly in the print routine, character for character which rules out Exomizer and the likes.

LZW has the same problem. Of course, I could just store all the text in compressed chunks, and then decompress it to a temporary buffer as needed, but I wonder if there'd be that much gain from doing that, considering the size of the buffer etc... Perhaps it'll be necessary, we'll see.

Any more suggestions?

That's not completely true.

You can still use Exomizer for this.
Check the sources for streamed decompression which are included in the package.

R.

---
-= Silicon Ltd. =-
http://forum.siliconlimited.com

Commodore 64 Scenemusic Podcast
http://8bitmayhem.blogspot.com/

2005-11-28 13:44

MagerValp

Registered: Dec 2001
Posts: 1059

LZSS is dead simple, and works well for text. No algorithm will work well for short strings though, so you should probably look into coding the strings as blocks.

2005-11-28 14:48

Radiant

Registered: Sep 2004
Posts: 639

Yeah, Burrows-Wheeler transform + RLE seems to be quite ok for text. A bit much overhead, though, as do LZ based algorithms provide.

Scout: But that's just streamed decompression, right? I'd probably need some kind of random access, though that is probably hard (impossible?) to obtain with any non-trivial compression... I'll check it out.

Anyway, thanks for the suggestions, I am a lot wiser now.

2005-11-28 15:07

JackAsser

Registered: Jun 2002
Posts: 1990

With overhead do you mean computation time, memory space for depacker or lazyness for implementing, or a combo of all three? =D Just curious..

... 30 posts hidden. Click here to view all posts....

Previous - 1 | 2 | 3 | 4 - Next

Refresh

Subscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search for in
All times are CET.

Search CSDb

Advanced

Users Online

Jammer
nucleus/TempesT
Scrap/Genesis Project
blala
CreaMD/React
Frostbyte/Artline De..
Elder0010/G★P
Soya/Fairlight
DKT/Samar
Loloke
lA-sTYLe/Quantum
taper/ΤRIΛD
Airwolf/F4CG
serato/Finnish Gold
rambo/Therapy/ Resou..
Pad/G★P
Danko/Fairlight
niallquinn
Skyhawk/Triad
Faayd/Quantum
Dave/SIDNIFY
rexbeng
tonysavon
A3/AFL
Acidchild/Padua
katon/Lepsi De
Guests online: 152

Top Demos

1 Next Level  (9.8)
2 Mojo  (9.7)
3 Coma Light 13  (9.7)
4 Edge of Disgrace  (9.6)
5 Comaland 100%  (9.6)
6 No Bounds  (9.6)
7 Uncensored  (9.6)
8 Wonderland XIV  (9.6)
9 Memento Mori  (9.6)
10 Bromance  (9.5)

Top onefile Demos

1 It's More Fun to Com..  (9.7)
2 Party Elk 2  (9.7)
3 Cubic Dream  (9.6)
4 Copper Booze  (9.5)
5 TRSAC, Gabber & Pebe..  (9.5)
6 Rainbow Connection  (9.5)
7 Dawnfall V1.1  (9.5)
8 Quadrants  (9.5)
9 Daah, Those Acid Pil..  (9.5)
10 Birth of a Flower  (9.5)

Top Groups

1 Nostalgia  (9.3)
2 Oxyron  (9.3)
3 Booze Design  (9.3)
4 Censor Design  (9.3)
5 Crest  (9.3)

Top Diskmag Editors

1 Jazzcat  (9.4)
2 Magic  (9.4)
3 hedning  (9.2)
4 Elwix  (9.1)
5 A Life in Hell  (9.1)

Page generated in: 0.044 sec.