CSDb User Forums


Forums > C64 Coding > Fast and Short Generalised Memset
2020-12-09 09:57
Raistlin

Registered: Mar 2007
Posts: 771
Fast and Short Generalised Memset

I wondered... has anyone ever tried writing a generalised "memset" for demo use..?

A function/macro that you can call with either:-

StartAddress, EndAddress, byte
- fills [StartAddress, EndAddress)

or

StartAddress, Length, byte
- fills [StartAddress, StartAddress+Length)

I guess as a macro it could then have the optimization of being able to do 256-byte aligned fills faster (eg. filling $0400-07ff rather than $0400-07e7).

I'm sure someone must've done one of these..?


I also wonder how the performance of such a generalised solution, if made as a function rather than a macro, would compare to a partially unrolled version..? eg.:-

lax #$00
Fill0400Loop:
sta $0400,x
sta $0500,x
sta $0600,x
sta $0700,x
inx
bne Fill0400Loop
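
For comparison, here is a minimal sketch of what the function version could look like, in KickAssembler syntax. The routine name, the zero-page locations and the calling convention are all assumptions made for illustration, not something posted in the thread.

```asm
// Hypothetical zero-page locations; any two free pairs would do.
.label dst = $fb        // $fb/$fc: start address (lo/hi)
.label len = $fd        // $fd/$fe: byte count (lo/hi)

// In:  A = fill byte, dst = StartAddress, len = Length
// Out: fills [dst, dst+len). Clobbers X, Y and dst+1.
Memset:
        ldx len+1       // number of whole 256-byte pages
        beq Remainder
        ldy #$00
PageLoop:
        sta (dst),y     // 6 cycles
        iny             // 2 cycles
        bne PageLoop    // 3 cycles when taken
        inc dst+1       // advance to the next page
        dex
        bne PageLoop
Remainder:
        ldy len         // 0..255 bytes left over
        beq Done
TailLoop:
        dey
        sta (dst),y     // STA leaves the flags from DEY intact
        bne TailLoop
Done:
        rts
```

Ignoring badlines, interrupts and page-crossing branch penalties, the inner loop here costs 11 cycles per byte, while the partially unrolled loop above costs 4*5+2+3 = 25 cycles per 4 bytes, or 6.25 cycles per byte. So the generic function would be a bit less than twice as slow on large fills, in exchange for taking its addresses at runtime.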
2020-12-09 10:14
Krill

Registered: Apr 2002
Posts: 3098
Quoting Raistlin
I wondered... has anyone ever tried writing a generalised "memset" for demo use..?
Of course.

But you need a generalised memset in demos mostly only for initialisation, and then performance isn't much of an issue and you'd go for size.

When performance really is an issue, the old democoding rule applies. Use as much memory as possible, only optimise for size when needed.
So i guess this kind of code golf match would need a fourth parameter controlling how much you want to unroll the code. At the extreme end you'd have just LDA #0 with lots of STA mem16 and an rts.
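
A minimal sketch of that extreme end, assuming KickAssembler; the macro name and its parameters are made up for illustration.

```asm
// Hypothetical macro: emits one LDA and then one STA per byte filled.
.macro FillUnrolled(start, length, value) {
        lda #value
        .for (var i = 0; i < length; i++) {
            sta start + i       // 3 bytes of code, 4 cycles (absolute)
        }
}

// Usage: clear the visible screen with spaces.
ClearScreen:
        FillUnrolled($0400, 1000, $20)
        rts
```

For a 1000-byte screen that is roughly 3 KB of code and about 4000 cycles, which is the memory-for-speed trade in its purest form.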
2020-12-09 13:01
TWW

Registered: Jul 2009
Posts: 557
Yes.

The issue is as Krill states, memory.

Earlier I solved this with a "speedmode" variable, set at the beginning of each project, which selected either an unrolled or a looped routine. The generalized routine then optimized for three cases: fewer than 8 bytes (unrolled), between 8 and 256 bytes (single loop), and over 256 bytes (loop for all pages and finally loop for the remainder). Handled by parameters and macro (well actually a pseudo 8-D).

Syntax looked like this:
    :MEMSET destination_address ; number_of_bytes ; value ; safemode ; speedmode


    :MEMSET $fb ; #120 ; #%01101110 ; 0  - Fills from $fb to $fb + 120 with #%01101110 and turns off safemode
    :MEMSET $2000 ; #$7800               - Fills from $2000 to $9800 with #0
    :MEMSET 4096 ; #$2000 ; #$80 ; ; 1   - Fills from $1000 to $3000 with #$80 with safemode according to global value and speedmode enabled


I'd post the code but it's embarrassingly long and was written a long time ago; it could probably be rewritten much more smoothly today with the increased functionality of kickass. PM me if you want it.

Now I just use the generalised routine and unroll (partially or fully) according to the need to squeeze cycles vs. memory footprint.
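
The size-based selection described above could be sketched roughly like this in current KickAssembler. This is not TWW's routine: the macro name, parameters and structure are assumptions, safemode and speedmode are left out, and the page loop spends one STA per page instead of looping over the pages.

```asm
// Illustrative only. Assumes count >= 1 and, for the looped cases,
// dst >= $0100 so that STA dst,x assembles as absolute,X and does
// not wrap around inside zero page.
.macro MemsetSketch(dst, count, value) {
        lda #value
        .if (count < 8) {
            // Fewer than 8 bytes: fully unrolled.
            .for (var i = 0; i < count; i++) {
                sta dst + i
            }
        } else {
            .var pages = count >> 8     // whole 256-byte pages
            .var rest  = count & $ff    // bytes left after the pages
            .if (pages > 0) {
                ldx #$00
            pageLoop:
                .for (var p = 0; p < pages; p++) {
                    sta dst + p * $100, x
                }
                inx
                bne pageLoop
            }
            .if (rest > 0) {
                ldx #rest
            restLoop:
                dex
                sta dst + pages * $100, x
                bne restLoop
            }
        }
}

// Usage: 1000 bytes = 3 pages ($0400-$06ff) + 232 bytes ($0700-$07e7).
        MemsetSketch($0400, 1000, $20)
```

Since all arguments are known at assembly time, the case selection costs nothing at runtime; only the chosen variant ends up in the binary.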
2020-12-09 13:30
Raistlin

Registered: Mar 2007
Posts: 771
Ahh, interesting. It sounds like you've done quite a bit of work in the area already.
2020-12-09 14:37
JackAsser

Registered: Jun 2002
Posts: 2038
https://xkcd.com/1205/
2020-12-09 14:39
Raistlin

Registered: Mar 2007
Posts: 771
Pah, JackAsser, time has no meaning in THE MATRIX.
2020-12-09 14:48
JackAsser

Registered: Jun 2002
Posts: 2038
Quote: Pah, JackAsser, time has no meaning in THE MATRIX.

😂😂😂
2020-12-09 19:18
Copyfault

Registered: Dec 2001
Posts: 487
What Krill said/wrote - literally!

Maybe I did not get the real thinking behind it, but wouldn't mem init be done by the decruncher anyway? Or is the focus on a plain init without decrunching as part of the very first init? Just wondering...
2020-12-10 00:01
Krill

Registered: Apr 2002
Posts: 3098
Quoting Copyfault
but wouldn't mem init be done by the decruncher anyway?
This is something i'd not rely on.

In a demo, you ideally want to load the next part while the current one is running. That means the next part's code should take as little mem as possible in its decrunched state.

Once the current part transitions over to the next, you can shuffle code around and initialise memory.

Having a part's pre-init code tight, with few and small zeroed gaps, makes for a better pack ratio, too. Also it's a good idea to have a part contained entirely in one file, for loader (and cruncher) reasons. =)
All times are CET.