[CSDb] - User Forums - ACME macro for delaying X cycles

2017-10-24 13:02

Frantic

Registered: Mar 2003
Posts: 1648

ACME macro for delaying X cycles

Anybody got a macro handy for the ACME assembler for delaying X number of cycles? It is OK if it kills the A or X register.

[... 21 posts hidden ...]

2017-10-25 22:25

chatGPZ

Registered: Dec 2001
Posts: 11386

doynax: another nice source for subtle bugs =)

2017-10-26 04:54

Oswald

Registered: Apr 2002
Posts: 5094

"doesn't clobber any registers or flags. Still requires two bytes of stack."

almost perfect, now one without jsr and 2-63 cycles please for the ultimate macro :)

2017-10-27 20:24

Han

Registered: Apr 2017
Posts: 8

Funny to see this question now when I was writing my own macro last week :)
Maybe this is useful for somebody (KickAssembler):

.macro waitx(Cycles)
{
	// Parameters of fast loop (outside a page boundary)
	.var LC=5 // Cycles per loop iteration (DEX, BNE)
	.var LoopCount=max(1, floor((Cycles-1)/LC)) // Loop counter
	.if((LoopCount>1) && (Cycles - (LoopCount*LC+1)==1)) { .eval LoopCount-- } // Handle only 1 remaining cycle
	.var ExtraCycles=max(0, Cycles - (LoopCount*LC+1)) // Cycles outside the loop
	.var ExtraBytes=max(0, ceil(ExtraCycles/2)) // Bytes required outside the loop

	// Parameters of slow loop (branch over page boundary)
	.var P_LC=6
	.var P_LoopCount=max(1, floor(Cycles/P_LC))
	.if((P_LoopCount>1) && (Cycles - (P_LoopCount*P_LC)==1)) { .eval P_LoopCount-- }
	.var P_ExtraCycles=max(0, Cycles - (P_LoopCount*P_LC))
	.var P_ExtraBytes=max(0, ceil(P_ExtraCycles/2))

	.var Relocate=false
	
	.var IsPageCrossed=(((<*)>=$fb) && ((<*)<=$fd))
	.if(IsPageCrossed)
	{ // Check if fast loop could be relocated to be slow and would also be smaller
		.var adr=*+ExtraBytes
		.if((ExtraBytes<P_ExtraBytes) && (((<adr)<$fb) || ((<adr)>$fd)))
		{
			.eval Relocate=true
		}
		else
		{
			.eval LoopCount=P_LoopCount
			.eval ExtraCycles=P_ExtraCycles
			.eval ExtraBytes=P_ExtraBytes
		}
	}
	else
	{ // Check if slow loop could be relocated to be fast and would also be smaller
		.var adr=*+P_ExtraBytes
		.if((P_ExtraBytes<ExtraBytes) && (((<adr)>=$fb) && ((<adr)<=$fd)))
		{
			.eval LoopCount=P_LoopCount
			.eval ExtraCycles=P_ExtraCycles
			.eval ExtraBytes=P_ExtraBytes
			.eval Relocate=true
		}
	}
	
	.if(ceil(Cycles/2) <= (5+ExtraBytes))
	{ // Loopless wait is smaller than using a loop
		wait(Cycles)
	}
	else
	{ // All that hassle for this small (relocated) loop :)
		.if(Relocate) { wait(ExtraCycles) }
		ldx #LoopCount
		dex
		bne *-1
		.if(!Relocate) { wait(ExtraCycles) }
	}
}

.macro wait(Cycles)
{
	.if(Cycles>0)
	{
		.if(Cycles<2) .error "Can't delay 1 cycle"
		.if((Cycles & 1)==0) { nop } else { bit $00 } // Delay 2 or 3 cycles
		.for(var i=1; i<floor(Cycles/2); i++) { nop } // Remaining even amount
	}
}

What this does is build an optimal loop+NOP+BIT combination that takes page boundaries into account.
Depending on the number of delay cycles and on the location of the loop, the number of extra cycles required varies. So this macro checks whether the extra bytes can be used to relocate the loop off or onto a page boundary so that the resulting number of bytes is minimal. (Of course it uses a loopless delay if that's even smaller.)

Example: your code starts at $08fd and you want to wait 24 cycles:

$08fd LDX #$04
$08ff DEX
$0900 BNE $08FF // Page crossing

If instead you wanted to wait 28 cycles at this location you could append 2 NOPs. But it's smaller to prepend just one NOP, thus relocating the loop off of the page boundary and adding one iteration:

$08fd NOP
$08fe LDX #$05
$0900 DEX
$0901 BNE $0900 // No page crossing

The wait() macro is just a simple loopless delay that's used inside waitx(). Using pha/pla the code size could be reduced even more so maybe I'll include that later.
Please note that I did test this, but it's still a work in progress.
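
For anyone who wants to check the numbers in the two examples above, here is a rough Python model of the timing (not part of the macro, just a sketch). It assumes the standard 6502 timings: LDX #imm, DEX and NOP take 2 cycles, and BNE takes 2 when not taken, 3 when taken, and 4 when taken across a page. The function names are made up for illustration.

```python
def loop_cycles(count, crosses_page):
    """Cycles for: LDX #count / DEX / BNE back to the DEX."""
    taken = 4 if crosses_page else 3          # a taken BNE costs one more across a page
    # LDX (2) + count * DEX (2) + (count - 1) taken branches + final untaken BNE (2)
    return 2 + count * 2 + (count - 1) * taken + 2

def branch_crosses_page(start):
    """True if the BNE of a loop assembled at `start` crosses a page."""
    dex_addr = start + 2                      # LDX #imm occupies 2 bytes
    after_bne = start + 5                     # DEX is 1 byte, BNE is 2 bytes
    return (dex_addr >> 8) != (after_bne >> 8)

def wait_bytes(cycles):
    """Bytes used by the loopless wait(): NOPs, plus one BIT $00 if odd."""
    if cycles == 0:
        return 0
    if cycles < 2:
        raise ValueError("can't delay 1 cycle")
    return (cycles + 1) // 2

# 24 cycles at $08fd: the loop sits on the page boundary, 4 slow iterations
print(branch_crosses_page(0x08FD), loop_cycles(4, True))    # True 24
# 28 cycles: one NOP in front moves the loop to $08fe, 5 fast iterations
print(branch_crosses_page(0x08FE), 2 + loop_cycles(5, False))  # False 28
# size in bytes: slow loop + 2 NOPs versus 1 NOP + fast loop
print(5 + wait_bytes(4), wait_bytes(2) + 5)                  # 7 6
```

The closed forms 5n+1 (fast) and 6n (slow) used in the macro fall out of `loop_cycles`, and the page test matches the macro's check for a low byte in the range $fb to $fd.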

2017-11-06 09:53

Cruzer

Registered: Dec 2001
Posts: 1048

Just got a crazy idea for delaying 13 cycles in 1 byte:

pause:
	rti

delay13Cycles:
	brk

Requires that the IRQ/BRK vector is set to the pause label, and no IRQs occur at the same time, which I guess is unlikely anyway when cycle-exact timing is going on. However, after a little test it seems like the PC skips a byte after returning with rti, so in reality it takes two bytes:

pause:
	rti

delay13Cycles:
	brk
	.by 0

2017-11-06 10:07

Krill

Registered: Apr 2002
Posts: 2980

Yes, BRK is a two-byte instruction. The operand byte is supposed to be an argument for the software interrupt you're triggering, pretty much similar to TRAP #<X> or INT <X> on other platforms.

It was intended for OS calls, i think, but i fail to come up with an example that actually uses the argument byte.
The 1581 ROM code only has a dummy parameter:

.8:959d  08          PHP
.8:959e  58          CLI
.8:959f  95 02       STA $02,X
.8:95a1  00          BRK
.8:95a2  EA          NOP

2017-11-07 23:01

Cruzer

Registered: Dec 2001
Posts: 1048

Interesting, did not know that. Wonder why BRK isn't usually interpreted as having an argument by assemblers/disassemblers.

2017-11-08 05:27

Oswald

Registered: Apr 2002
Posts: 5094

so byte after brk is loaded into A ? or just thrown away ? isnt it just some kind of side effect from jsr ?

2017-11-08 07:36

ChristopherJam

Registered: Aug 2004
Posts: 1409

Quoting Cruzer

However, after a little test it seems like the PC skips a byte after returning with rti, so in reality it takes two bytes

I guess you could make it a single byte 19 cycle delay by incrementing the return address in the interrupt handler, assuming you know the stack depth at the time of execution, and also avoid page boundary crossings in the 'caller'
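
The 13 and 19 figures can be reproduced from the documented 6502 timings (BRK 7 cycles, RTI 6 cycles). The sketch below assumes the return-address adjustment is a single absolute read-modify-write on the stack page (6 cycles), which the post does not spell out; `brk_delay` is a made-up helper name. Note that BRK stacks the address of the BRK plus two and RTI returns to exactly that address, so the adjustment that makes BRK act as one byte would be a decrement of the stacked low byte; INC and DEC cost the same in absolute mode, so the total is unchanged.

```python
# Documented NMOS 6502 cycle counts for the instructions involved
CYCLES = {"BRK": 7, "RTI": 6, "INC abs": 6, "DEC abs": 6}

def brk_delay(handler):
    """Total cycles of BRK plus the handler's instructions (ending in RTI)."""
    return CYCLES["BRK"] + sum(CYCLES[op] for op in handler)

print(brk_delay(["RTI"]))              # 13, two bytes at the call site
print(brk_delay(["DEC abs", "RTI"]))   # 19, one byte at the call site
```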

2017-11-08 09:12

Krill

Registered: Apr 2002
Posts: 2980

Quoting Cruzer

Wonder why BRK isn't usually interpreted as having an argument by assemblers/disassemblers.

Usually, yes. Some assemblers allow an optional argument. Default is without, as usually BRK is used to end a program, discarding any code or data after it.

Quoting Oswald

so byte after brk is loaded into A ? or just thrown away ? isnt it just some kind of side effect from jsr ?

The byte needs to be retrieved manually, reading it from stack after finding its position via TSX.
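
As a rough illustration of that lookup, here is a small Python model of what BRK leaves on the stack and how a handler would find the byte. It assumes the handler inspects the stack right on entry, before pushing anything itself (with the C64 KERNAL's IRQ entry, which pushes A, X and Y first, the offsets would be three higher). The function names are hypothetical; the test data is the 1581 ROM snippet quoted earlier in the thread.

```python
def brk(mem, pc, sp, status):
    """Push what BRK pushes: return address (pc + 2), then status with B set."""
    ret = (pc + 2) & 0xFFFF
    for value in (ret >> 8, ret & 0xFF, status | 0x30):   # PCH, PCL, P
        mem[0x0100 + sp] = value
        sp = (sp - 1) & 0xFF
    return sp

def brk_argument(mem, sp):
    """Handler side: TSX, read the return address, fetch the byte before it."""
    x = sp                                # TSX
    lo = mem[0x0102 + x]                  # LDA $0102,X
    hi = mem[0x0103 + x]                  # LDA $0103,X
    ret = (hi << 8) | lo
    return mem[(ret - 1) & 0xFFFF]        # the byte following the BRK opcode

mem = bytearray(0x10000)
mem[0x95A1] = 0x00                        # BRK
mem[0x95A2] = 0xEA                        # the "dummy parameter" (NOP)
sp = brk(mem, 0x95A1, 0xFF, 0x00)
print(hex(brk_argument(mem, sp)))         # 0xea
```

The handler has to reconstruct the return address before it can read the byte at all, so the address is always available alongside the argument.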
It may be possible that this is just a side-effect of saving gates or re-using some other logic (but probably not JSR with its two argument bytes).

But there was one real-world application which at least mildly suggests it was a conscious decision. The 6502 was designed as a micro-controller for industrial machines, not a general-purpose CPU for home computers. Back then, PROMs were used for custom or low-volume machines, which would be turned on and immediately manipulate physical objects in the real world. The PROMs came with all bits set, and were programmed by blowing fuses to flip bits to 0, but those bits could never be reset to 1.
Now, the BRK opcode is $00, and it could be used to patch code in PROMs. Upon encountering BRK (which was some other instruction formerly), the interrupt handler could then look up the argument byte (in addition or alternatively to the return address on stack) and decide which patch routine for that location (located in a patch area on the PROM) to execute, then resume operation.

Has anybody interviewed Mr Peddle about this? :)

2017-11-08 14:44

lft

Registered: Jul 2007
Posts: 369

But in that case, the byte following BRK would be some random byte from the original code. If multiple patches were used, there would be no guarantee that the extra bytes would be different from each other.

Meanwhile, the *address* of the extra byte is available on the stack, and you would have to retrieve it anyway in order to read the extra byte. Hence, it is easier to just use the address (which is unique) to distinguish between different patches.

All times are CET.