| |
lft
Registered: Jul 2007 Posts: 369 |
Bitstream decoding
I wrote a little article about fast bitstream decoding over at codebase. Enjoy! |
|
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
just rushed trough it, but found some nifty nice routine at the end, and a well written article, well done :) |
| |
Mr. SID
Registered: Jan 2003 Posts: 424 |
Nice and elegant. Very good read! |
| |
tlr
Registered: Sep 2003 Posts: 1790 |
Some very nice tricks in there. Thanks for sharing! |
| |
TNT Account closed
Registered: Oct 2004 Posts: 189 |
One more trick mentioned here: you get free OR operation by loading A with a value with multible bits set :) |
| |
Flavioweb
Registered: Nov 2011 Posts: 463 |
Nice! Thanks a lot! |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
The routine is nice, but gets unhandy when the number of bits to be fetched is determinded while fetching, consider for e.g. the elias gamma encoding as being used to encode the lengtths in doynax/byteboozer?
One would need to shift in from the right side ones for building up a counter and then shift them out again to the left when fetching the bits to be fetched for a number. So one would need at least the ability to change the direction of the shifter.
I counted some cycles and it is just as fast as a jsr to get_byte in case, but of course it reduces code. But there's overhead after each fetch with the table lookup. Also when using x as lowbyte and index, things can be made faster:
ldy buffer,x
inx
bne no_new_page
Of course x then can't be used anywhere else. |
| |
lft
Registered: Jul 2007 Posts: 369 |
Quoting Bitbreakerconsider for e.g. the elias gamma encoding as being used to encode the lengtths in doynax/byteboozer?
Yes, for some applications, there is no need for generic decision trees. Then it might be faster to use other methods. My approach can in principle be used to encode Elias Gamma or Huffman (with 128 symbols, I think), because both of these can be regarded as decision trees. But certainly, for the Elias Gamma case, it should be faster to use right-shifting instead of having a table entry for each shift position. And for Huffman, there is no need to support fields wider than one bit, so that can probably also be optimised.
Quote:Also when using x as lowbyte and index, things can be made faster:
ldy buffer,x
inx
bne no_new_page
Of course x then can't be used anywhere else.
This is an excellent improvement. Why didn't I think of that? =) |
| |
lft
Registered: Jul 2007 Posts: 369 |
Quoting lftThis is an excellent improvement. Why didn't I think of that? =)
Oh, now I remember. I did think of it. In the final decoder, both A and Y need to be preserved. Therefore, if we use indexed addressing to save four cycles, we must also save and restore one of those registers, at a cost of six cycles. |
| |
Frantic
Registered: Mar 2003 Posts: 1648 |
Dear Sir lft,
Thank you for your most excellent addition to the codebase.
My very best wishes,
Frantic |
| |
Glasnost Account closed
Registered: Aug 2011 Posts: 26 |
Nice article :)
Another good example of code optimisation.
replace the ..
clc
jmp decodeloop
with
clc
bcc decodeloop
and save a byte. |
... 3 posts hidden. Click here to view all posts.... |
Previous - 1 | 2 - Next |