| |
Raistlin
Registered: Mar 2007 Posts: 554 |
Fast and Short Generalised Memset
I wondered... has anyone ever tried writing a generalised "memset" for demo use..?
A function/macro that you can call with either:-
StartAddress, EndAddress, byte
- fills [StartAddress, EndAddress)
or
StartAddress, Length, byte
- fills [StartAddress, StartAddress+Length)
I guess as a macro it could then have the optimization of being able to do 256-byte aligned fills faster (eg. filling $0400-07ff rather than $0400-07e7).
I'm sure someone must've done one of these..?
I also wonder how the performance of such a generalised solution, if made as a function rather than a macro, would compare to a partially unrolled version..? eg.:-
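lax #$00            // A = X = 0 (illegal opcode; with #$00 the result is always 0)
Fill0400Loop:
sta $0400,x         // one store per 256-byte page of screen RAM
sta $0500,x
sta $0600,x
sta $0700,x         // also overwrites $07e8-$07ff
inx                 // X wraps to 0 after 256 iterations
bne Fill0400Loop |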
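For comparison with the partially unrolled loop above, here is a minimal sketch of what the function-style variant might look like. It assumes KickAssembler syntax; the names `Memset`, `dst`, `lenLo` and `lenHi` and the zero-page locations are hypothetical, not taken from anyone's actual code.

```asm
// Hypothetical zero-page layout, pick any free locations.
.const dst   = $fb      // 2 bytes: start address (high byte is modified)
.const lenLo = $fd      // length low byte  = bytes after the last whole page
.const lenHi = $fe      // length high byte = number of whole pages

// In: dst = StartAddress, lenHi:lenLo = Length, A = fill byte
// Fills [StartAddress, StartAddress+Length). Clobbers X, Y and dst+1.
Memset:
        ldy #0
        ldx lenHi
        beq Tail            // no whole pages to fill
PageLoop:
        sta (dst),y
        iny
        bne PageLoop        // 256 stores per page
        inc dst+1           // advance pointer to the next page
        dex
        bne PageLoop
Tail:
        ldx lenLo
        beq Done            // length was a multiple of 256
TailLoop:
        sta (dst),y         // Y is 0 on entry to the tail
        iny
        dex
        bne TailLoop
Done:
        rts
```

Going by the standard cycle counts, the inner loop here costs roughly 11 cycles per byte (6 for `sta (dst),y`, 2 for `iny`, 3 for the taken branch), while the four-store loop costs 25 cycles per 4 bytes, so about 6.25 per byte. The function form buys generality at close to half the speed.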
|
| |
Krill
Registered: Apr 2002 Posts: 2839 |
Quoting Raistlin: I wondered... has anyone ever tried writing a generalised "memset" for demo use..?
Of course.
But you need a generalised memset in demos mostly only for initialisation, and then performance isn't much of an issue and you'd go for size.
When performance really is an issue, the old democoding rule applies. Use as much memory as possible, only optimise for size when needed.
So i guess this kind of code golf match would need a fourth parameter controlling how much you want to unroll the code. At the extreme end you'd have just LDA #0 with lots of STA mem16 and an rts. |
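A rough sketch of how such an unroll parameter could look as a macro, assuming KickAssembler syntax. The macro name `FillStrided` and its parameters are made up for illustration; it also assumes `start` is at or above $0100 so the stores assemble as absolute,x.

```asm
// Hypothetical macro, KickAssembler syntax assumed.
// Fills [start, start + stride * unroll) with value.
// Assumes 1 <= stride <= 256 and start >= $0100.
.macro FillStrided(start, stride, unroll, value) {
        lda #value
        ldx #<stride            // 256 becomes 0, which still gives 256 iterations
loop:
        dex
        .for (var i = 0; i < unroll; i++) {
            sta start + i * stride, x   // one store per stride-sized chunk
        }
        bne loop                // flags still come from DEX, STA leaves them alone
}

// unroll = 4, stride = $100: the familiar four-store screen clear
FillStrided($0400, $100, 4, $20)
```

With `stride = 1` the loop body runs once and the macro degenerates into the plain list of STAs described above, plus a few bytes of loop overhead. Larger strides trade speed for code size.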
| |
TWW
Registered: Jul 2009 Posts: 541 |
Yes.
The issue is as Krill states, memory.
Earlier I solved this with a "speedmode variable", set at the beginning of each project, which selected either an unrolled or a looped routine. The generalized routine then optimized for fewer than 8 bytes (unrolled), between 8 and 256 bytes (single loop) and over 256 bytes (loop for all pages and finally loop for remainder). Handled by parameters and macro (well actually a pseudo 8-D).
Syntax looked like this:
:MEMSET destination_address ; number_of_bytes ; value ; safemode ; speedmode
:MEMSET $fb ; #120 ; #%01101110 ; 0 - Fills from $fb to $fb + 120 with #%01101110 and turns off safemode
:MEMSET $2000 ; #$7800 - Fills from $2000 to $9800 with #0
:MEMSET 4096 ; #$2000 ; #$80 ; ; 1 - Fills from $1000 to $3000 with #$80 with safemode according to global value and speedmode enabled
I'd post the code but it's embarrassingly long and was written a long time ago; it could probably be rewritten much more smoothly today with the increased functionality in kickass. PM me if you want it.
Now I just use the generalised routine and unroll (partially or fully) according to the need to squeeze cycles vs. memory footprint. |
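The size-based selection described in this post could be sketched roughly as below. This is not TWW's code: the macro name `MemsetBySize` is hypothetical, KickAssembler syntax is assumed, and the safemode and speedmode parameters are left out because the thread does not spell out what they do. For the multi-page case it swaps in one STA per page inside a shared loop rather than an outer loop over pages, which costs more memory for large fills. It assumes `start` is at or above $0100 and `length` is below $10000.

```asm
// Hypothetical macro, KickAssembler syntax assumed; not TWW's actual code.
// Assumes start >= $0100 and length < $10000, both known at assembly time.
.macro MemsetBySize(start, length, value) {
        .var pages = floor(length / $100)   // whole 256-byte pages
        .var rest  = length - pages * $100  // bytes left after the last whole page
        lda #value
        .if (length < 8) {
            // Fewer than 8 bytes: fully unrolled
            .for (var i = 0; i < length; i++) {
                sta start + i
            }
        } else {
            .if (pages > 0) {
                // Whole pages: one STA per page in a 256-iteration loop
                ldx #0
            pageLoop:
                .for (var p = 0; p < pages; p++) {
                    sta start + p * $100, x
                }
                inx
                bne pageLoop
            }
            .if (rest > 0) {
                // Remainder (or the whole fill when length < 256): single loop
                ldx #rest
            tailLoop:
                dex
                sta start + pages * $100, x
                bne tailLoop
            }
        }
}

// 1000 bytes of screen RAM: 3 whole pages plus a 232-byte remainder
MemsetBySize($0400, 1000, $20)
```

Because the branch is picked at assembly time, a fill of 8 to 256 bytes assembles to a single short loop, and only the larger fills pay for the extra stores.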
| |
Raistlin
Registered: Mar 2007 Posts: 554 |
Ahh, interesting. It sounds like you've done quite a bit of work in the area already. |
| |
JackAsser
Registered: Jun 2002 Posts: 1989 |
https://xkcd.com/1205/ |
| |
Raistlin
Registered: Mar 2007 Posts: 554 |
Pah, JackAsser, time has no meaning in THE MATRIX. |
| |
JackAsser
Registered: Jun 2002 Posts: 1989 |
Quote: Pah, JackAsser, time has no meaning in THE MATRIX.
😂😂😂 |
| |
Copyfault
Registered: Dec 2001 Posts: 466 |
What Krill said/wrote - literally!
Maybe I did not get the real thinking behind it, but wouldn't mem init be done by the decruncher anyway? Or is the focus on a plain init without decrunching as part of the very first init? Just wondering... |
| |
Krill
Registered: Apr 2002 Posts: 2839 |
Quoting Copyfault: but wouldn't mem init be done by the decruncher anyway?
This is something i'd not rely on.
In a demo, you ideally want to load the next part while the current one is running. That means the next part's code should take as little mem as possible in its decrunched state.
Once the current part transitions over to the next, you can shuffle code around and initialise memory.
Having a part's pre-init code tight, with few and small zeroed gaps, makes for a better pack ratio, too. Also it's a good idea to have a part contained entirely in one file, for loader (and cruncher) reasons. =) |