| |
Radiant
Registered: Sep 2004 Posts: 639 |
Plaintext compression
As some of you know, I'm playing around a bit with creating a game...
Me being the person I am, I wanna include a not insubstantial amount of text in the game. A lot of text eats a lot of memory, so I'm pondering different ways of compressing it. Since the text will most likely only consist of A-Z, 0-9 and some miscellaneous delimiting characters, I figure I could easily do with just 64 different character codes, including some special codes for switching case, changing colour, etc. This would of course allow me to fit each character code into 6 bits, which is not immediately advantageous (at least to me), considering it would only cut memory usage by a quarter and add serious overhead to the print routine.
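For illustration, a rough C sketch of that 6-bit packing (the function names and the 32-bit accumulator are just for the example; on the C64 the same shifting would live in the print routine):

    #include <stdint.h>
    #include <stddef.h>

    /* Pack an array of 6-bit codes (values 0-63) into a byte buffer.
       Four codes fit into three bytes. Returns the number of bytes written. */
    size_t pack6(const uint8_t *codes, size_t n, uint8_t *out)
    {
        size_t bits = 0, bytes = 0;
        uint32_t acc = 0;
        for (size_t i = 0; i < n; i++) {
            acc = (acc << 6) | (codes[i] & 0x3F);
            bits += 6;
            while (bits >= 8) {            /* emit every complete byte */
                bits -= 8;
                out[bytes++] = (uint8_t)(acc >> bits);
            }
        }
        if (bits > 0)                      /* flush leftover bits, zero-padded */
            out[bytes++] = (uint8_t)(acc << (8 - bits));
        return bytes;
    }

    /* Unpack n 6-bit codes from the packed buffer. */
    void unpack6(const uint8_t *in, size_t n, uint8_t *codes)
    {
        size_t bits = 0, bytes = 0;
        uint32_t acc = 0;
        for (size_t i = 0; i < n; i++) {
            while (bits < 6) {             /* refill the accumulator */
                acc = (acc << 8) | in[bytes++];
                bits += 8;
            }
            bits -= 6;
            codes[i] = (acc >> bits) & 0x3F;
        }
    }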
Another thing I could do would be to analyze my writing and create a table containing the most frequent words I use, and just replace all references to those with table indexes. This could save me quite a bit of RAM, and has the advantage of very little overhead... However, it feels a bit unreliable.
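A rough sketch of that word-table idea, assuming codes 0x80 and up are free to act as table indexes (the word list here is made up):

    #include <stdio.h>

    /* Illustrative dictionary of frequent words; index i is stored as byte 0x80+i. */
    static const char *words[] = { "the ", "you ", "and ", "with ", "door " };
    #define NWORDS (sizeof words / sizeof words[0])

    /* Bytes below 0x80 are literal characters, 0x80+i expands to words[i]. */
    void print_text(const unsigned char *src, size_t len)
    {
        for (size_t i = 0; i < len; i++) {
            if (src[i] >= 0x80 && (size_t)(src[i] - 0x80) < NWORDS)
                fputs(words[src[i] - 0x80], stdout);
            else
                putchar(src[i]);
        }
    }

    int main(void)
    {
        /* "open the door with the key", encoded by hand */
        const unsigned char msg[] = { 'o','p','e','n',' ', 0x80, 0x84, 0x83, 0x80, 'k','e','y' };
        print_text(msg, sizeof msg);   /* prints "open the door with the key" */
        putchar('\n');
        return 0;
    }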
RLE wouldn't be effective, since plain text rarely contains long runs of the same character.
So, which are your favourite methods for squeezing as much text as possible into, say, 4 KiB? :-) I'm really at a loss here, as you can see. |
|
... 30 posts hidden ... |
| |
Radiant
Registered: Sep 2004 Posts: 639 |
Mainly memory wastage and computational time, actually. :-) Though the laziness factor is not to be underestimated... |
| |
Scout
Registered: Dec 2002 Posts: 1570 |
Quote:
Scout: But that's just streamed decompression, right? I'd probably need some kind of random access, though that is probably hard (impossible?) to obtain with any non-trivial compression... I'll check it out.
Well, you can compress your texts in segments.
In your program you set the decompress-pointers to the segment (in memory) of your choice and stream-decompress it.
The only thing you have to do is make a table with all the pointers to your compressed text segments, and then you can random-access them (roughly as in the sketch below).
R.
---
-= Silicon Ltd. =-
http://forum.siliconlimited.com
Commodore 64 Scenemusic Podcast
http://8bitmayhem.blogspot.com/ |
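A minimal sketch of that segment-table scheme; decrunch() here is just a stand-in copy so the example is self-contained, but it would be whatever streaming depacker you pick (Exomizer, Pucrunch, ...):

    #include <stdint.h>
    #include <string.h>

    /* Stand-in for a real depacker; here it just copies the bytes. */
    static void decrunch(const uint8_t *src, uint8_t *dst, size_t len)
    {
        memcpy(dst, src, len);
    }

    /* "Compressed" text segments and the pointer table built alongside them. */
    static const uint8_t seg0[] = "YOU ARE IN A MAZE.";
    static const uint8_t seg1[] = "THE DOOR IS LOCKED.";
    static const uint8_t seg2[] = "A GRUE EATS YOU.";

    static const uint8_t *segment_table[] = { seg0, seg1, seg2 };
    static const size_t   segment_len[]   = { sizeof seg0, sizeof seg1, sizeof seg2 };

    static uint8_t text_buffer[256];       /* plaintext for one segment */

    /* Random access: pick a segment number, decompress only that segment. */
    const uint8_t *show_text(unsigned n)
    {
        decrunch(segment_table[n], text_buffer, segment_len[n]);
        return text_buffer;                /* hand this to the normal print routine */
    }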
| |
Radiant
Registered: Sep 2004 Posts: 639 |
Scout: Yeah, I figured that out after posting my initial reply. :-) I blame lack of sleep...
I'm probably gonna go with whatever method saves me the most space, including the depacking algorithm. If I don't have to code the depacker myself, that's a bonus I'll consider as well. :-) |
| |
Ben Account closed
Registered: Feb 2003 Posts: 163 |
Hmzz.. I did want to put some of my knowledge on display.. but.. I'm quite sure you do not need random access, so why not dynamically update the range of texts with one of those good loaders around.. Rid yourself of all kinds of self-imposed constraints.. All you need is a couple of look-up tables.. |
| |
Monte Carlos
Registered: Jun 2004 Posts: 359 |
Perhaps use a dictionary and then compress the dictionary.
|
| |
algorithm
Registered: May 2002 Posts: 705 |
Personally I would go for the LZ method. It is very easy to implement: you point the decompressor at the required section of compressed text, give it the destination memory location, and run it. |
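To illustrate, a toy LZ-style decompressor; the token format is invented for this example and doesn't match any particular cruncher:

    #include <stdint.h>
    #include <stddef.h>

    /* Toy LZ decompressor. Token format (made up for this example):
         0x00           end of stream
         0x01-0x7F  n   copy the next n literal bytes
         0x80-0xFF  n   back-reference: copy (n & 0x7F) bytes from
                        dst - offset, where offset is the next byte (1-255)
       Returns the number of bytes written to dst. */
    size_t lz_decompress(const uint8_t *src, uint8_t *dst)
    {
        uint8_t *out = dst;
        for (;;) {
            uint8_t token = *src++;
            if (token == 0x00)
                break;
            if (token < 0x80) {                    /* literal run */
                for (uint8_t i = 0; i < token; i++)
                    *out++ = *src++;
            } else {                               /* back-reference into output */
                uint8_t len = token & 0x7F;
                uint8_t off = *src++;
                for (uint8_t i = 0; i < len; i++, out++)
                    *out = *(out - off);           /* byte-by-byte, so overlaps work */
            }
        }
        return (size_t)(out - dst);
    }

For example, the stream 03 61 62 63 86 03 00 ("abc" as literals, then a back-reference of length 6 at offset 3) expands to "abcabcabc".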
| |
TNT Account closed
Registered: Oct 2004 Posts: 189 |
The problem with LZ is that you need plaintext history, so you need quite a big buffer for it to be effective. LZW avoids that, but its data tree easily takes the same amount of memory.
I would use a combination of dictionary and Huffman, which allows random access without extensive memory usage.
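A sketch of how that combination could look: each dictionary word gets its own bit offset in a pointer table, so a single word can be decoded on its own. The tiny four-symbol tree below is only an example; a real one would be generated offline from the letter frequencies of the game text:

    #include <stdint.h>

    /* One Huffman tree node: either an internal node with two children,
       or a leaf carrying a decoded character.
       Example codes: 'E'=0, 'T'=10, 'A'=110, ' '=111. */
    struct hnode { uint8_t is_leaf; uint8_t sym; uint8_t child[2]; };

    static const struct hnode htree[] = {
        { 0, 0,   {1, 2} },    /* 0: root     */
        { 1, 'E', {0, 0} },    /* 1: leaf 'E' */
        { 0, 0,   {3, 4} },    /* 2: internal */
        { 1, 'T', {0, 0} },    /* 3: leaf 'T' */
        { 0, 0,   {5, 6} },    /* 4: internal */
        { 1, 'A', {0, 0} },    /* 5: leaf 'A' */
        { 1, ' ', {0, 0} },    /* 6: leaf ' ' */
    };

    /* Decode n characters from a bitstream (MSB first). Keeping one start
       pointer or bit offset per dictionary word is what makes the
       dictionary + Huffman combination randomly accessible. */
    void huff_decode(const uint8_t *bits, unsigned n, char *out)
    {
        unsigned bitpos = 0;
        while (n--) {
            uint8_t node = 0;                              /* start at the root */
            while (!htree[node].is_leaf) {
                uint8_t bit = (bits[bitpos >> 3] >> (7 - (bitpos & 7))) & 1;
                node = htree[node].child[bit];
                bitpos++;
            }
            *out++ = (char)htree[node].sym;
        }
    }

With this tree, the two bytes 6B 80 decode (as four symbols) to "EAT ".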
|
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Huffman is not randomly accessible unless you put resync markers in every now and then. You can never be sure that you've synchronized correctly onto the bitstream; each random combination of bits will produce non-verifiable results on the other end. I should add that this is as I remember it from school, at least, to reduce the risk of public flaming etc. =) |
| |
Graham Account closed
Registered: Dec 2002 Posts: 990 |
huffman is not a very efficient compression algo, it is just a "clean" algo so the profs at the universities love it :)
hack-algos like the LZ-clones perform much, much better than huffman, especially on text. LZW also gives good compression results, but needs way too much memory, so it cannot be used on the C64. |
| |
Steppe
Registered: Jan 2002 Posts: 1510 |
Outsourcing the text to sequential files and loading them with an IRQ loader when needed is not an option, I guess? |