CSDb User Forums


2024-01-11 08:49
Mr SQL

Registered: Feb 2023
Posts: 117
Fastest time printing binary in BASIC or Assembly

Interesting video on 8-Bit Show And Tell:
https://www.youtube.com/watch?v=P8t6otqoz_E

What's the fastest time you can print binary to the screen in BASIC or Assembly?

I got it down to 26 seconds in BASIC without a pre-calc routine.
 
 
2024-01-11 21:14
chatGPZ

Registered: Dec 2001
Posts: 11149
That's probably correct - so just a faster scroll routine would speed this up significantly :)
2024-01-11 22:15
ws

Registered: Apr 2012
Posts: 230
nice challenge, tho
2024-01-12 00:08
JackAsser

Registered: Jun 2002
Posts: 1995
Quote: This probably does not count, but something like this seems obvious if speed is really the priority. This one assumes that you have a custom charset that has a "0" char at position 0 and a "1" char at position 1 in the charset, and it will always print the result at a specific fixed location on the screen.

;a register contains the byte to display as binary
ldx #1
sax $0407
lsr a
sax $0406
lsr a
sax $0405
lsr a
sax $0404
lsr a
sax $0403
lsr a
sax $0402
lsr a
sax $0401
lsr a
sax $0400

=8*2+8*4=48 cycles, so you could do it thousands of times in a single second.

SAX is one of the illegal opcodes, in case someone happens to be unfamiliar with that:
http://unusedino.de/ec64/technical/aay/c64/bsax.htm


I find this a much more interesting challenge! Can this be improved?!

Rules: it should be printed on a normal text screen, any charset is ok. One digit per char.
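
For anyone who wants to check the quoted SAX routine without firing up an emulator, here is a minimal Python sketch of it. It is only a model, not something from the thread: `sax_binary` and the `screen` list (standing in for $0400-$0407) are made-up names, and the cycle numbers are the documented 2 cycles for `ldx #`/`lsr a` and 4 cycles for `sax` absolute.

```python
def sax_binary(value):
    """Model the quoted routine: X = 1, then SAX (store A AND X) and LSR A."""
    screen = [0] * 8              # hypothetical stand-in for $0400-$0407
    a, x = value & 0xFF, 1
    cycles = 2                    # ldx #1
    for pos in range(7, -1, -1):  # $0407 first, $0400 last
        screen[pos] = a & x       # sax $0400+pos stores A AND X (4 cycles)
        cycles += 4
        if pos:                   # lsr a sits between the stores (7 in total)
            a >>= 1
            cycles += 2
    return screen, cycles


digits, cycles = sax_binary(0b10110010)
print(digits, cycles)
```

The stored values are the char codes 0 and 1, which is why the routine needs a charset with "0" and "1" glyphs at those positions.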
2024-01-12 06:46
TWW

Registered: Jul 2009
Posts: 541
Quote: I find this a much more interesting challenge! Can this be improved?!

Rules: it should be printed on a normal text screen, any charset is ok. One digit per char.


14 bytes, 16 cycles excl. call/ret:
    sta $2000
    lda #$3b
    sta $d011
    lda #$18
    sta $d018
    rts


How about bitmap with a 1x1 pixel charset :D

Ahyeah normal text screen...

But seriously, are we still talking about peeking the CIA data port (in which case 5 bits would be enough) or a general hexToBin()?

EDIT:

2 variants without resorting to charset trickery;
    // Variant #1:

    ldx #$18        // 2
    stx $0400       // 4
    stx $0401       // 4
    stx $0402       // 4
    stx $0403       // 4
    stx $0404       // 4
    stx $0405       // 4
    stx $0406       // 4
    stx $0407       // 4
    lsr             // 2  (next bit -> carry)
    rol $0400       // 6  ($18 << 1 plus carry = '0' or '1')
    lsr             // 2
    rol $0401       // 6
    lsr             // 2
    rol $0402       // 6
    lsr             // 2
    rol $0403       // 6
    lsr             // 2
    rol $0404       // 6
    lsr             // 2
    rol $0405       // 6
    lsr             // 2
    rol $0406       // 6
    lsr             // 2
    rol $0407       // 6
                    // 8 x 4 + 8 x 6 + 9 x 2 = 98 cycles / 58 bytes
    // Probably has some illegal OPC voodoo potential.


    // Variant #2
    ldx #'0'                        // 2
    ldy #'1'                        // 2 -> 4 cycles 'overhead'
    lsr                             // 2
    bcc !Next0+                     // 2 / 3
    sty $0400                       // 4 -> 8/9 cycles dep. the branch
    lsr
    bcc !Next1+
!Prev0:
    sty $0401
    lsr
    bcc !Next2+
!Prev1:
    sty $0402
    lsr
    bcc !Next3+
!Prev2:
    sty $0403
    lsr
    bcc !Next4+
!Prev3:
    sty $0404
    lsr
    bcc !Next5+
!Prev4:
    sty $0405
    lsr
    bcc !Next6+
!Prev5:
    sty $0406
    lsr
    bcc !Next7+
!Prev6:
    sty $0407
    rts
!Next0:
    stx $0400
    lsr
    bcs !Prev0-
!Next1:
    stx $0401
    lsr
    bcs !Prev1-
!Next2:
    stx $0402
    lsr
    bcs !Prev2-
!Next3:
    stx $0403
    lsr
    bcs !Prev3-
!Next4:
    stx $0404
    lsr
    bcs !Prev4-
!Next5:
    stx $0405
    lsr
    bcs !Prev5-
!Next6:
    stx $0406
    lsr
    bcs !Prev6-
!Next7:
    stx $0407
    rts                     // ~8 x 8 (+8) + 4 = ~68/76 cycles / ~100 bytes

Edit 2: 2 variants with voodoo and shameless charset trickery:

    // Variant #3 with voodoo
    ldx #$18        // 2
    stx $0400       // 4
    stx $0401       // 4
    stx $0402       // 4
    stx $0403       // 4
    stx $0404       // 4
    stx $0405       // 4
    stx $0406       // 4
    stx $0407       // 4
    slo $0400       // 4
    slo $0401       // 4
    slo $0402       // 4
    slo $0403       // 4
    slo $0404       // 4
    slo $0405       // 4
    slo $0406       // 4
    slo $0407       // 4
                    // 16 x 4 + 2 = 66 cycles / 50 bytes

    // Variant #4 with voodoo & charset (space = "0", ! = "1")
    slo $0400       // 4
    slo $0401       // 4
    slo $0402       // 4
    slo $0403       // 4
    slo $0404       // 4
    slo $0405       // 4
    slo $0406       // 4
    slo $0407       // 4
                    // 8 x 4 = 32 cycles / 24 bytes
2024-01-12 16:59
Frantic

Registered: Mar 2003
Posts: 1630
slo $ffff is 6 cycles, not 4.

http://unusedino.de/ec64/technical/aay/c64/bslo.htm
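
To see what that correction does to the totals, here is a small Python recount of Variants #3 and #4. It is only arithmetic over the instruction mix in the listings, assuming the documented timings of 2 cycles for `ldx #`, 4 for `stx` absolute and 6 for `slo` absolute; the `CYCLES` table and `total` helper are names made up for this sketch.

```python
# Documented cycle counts for the addressing modes used in the listings.
CYCLES = {"ldx_imm": 2, "stx_abs": 4, "slo_abs": 6}


def total(counts):
    """Sum cycles for a dict of {instruction: how many times it appears}."""
    return sum(CYCLES[op] * n for op, n in counts.items())


variant3 = total({"ldx_imm": 1, "stx_abs": 8, "slo_abs": 8})
variant4 = total({"slo_abs": 8})
print(variant3, variant4)  # 82 48
```

With 6-cycle stores, Variant #4 would come out level with the 48 cycles quoted for the SAX routine rather than ahead of it. The byte counts are unaffected.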
2024-01-12 20:49
Mr SQL

Registered: Feb 2023
Posts: 117
Quoting spider-j
To be fair I also used the KERNAL output routine for printing the chars.

Small solution with loops / 78 Bytes PRG:
https://trans.jansalleine.com/c64/num2binary.prg

                    !cpu 6510
; ==============================================================================
save_num    = 0x02
CHAROUT     = 0xF1CA
; ==============================================================================
                    *= 0x0801
                    ; basic TI$ timer wrapper program:
                    ; --------------------------------
                    ; 0 TI$="000000":SYS2092
                    ; 1 PRINT"TIME:"TI/60
                    ; --------------------------------
                    ; RESULT: 8.91666667
                    !byte 0x18, 0x08, 0x00, 0x00
                    !byte 0x54, 0x49, 0x24, 0xB2
                    !byte 0x22, 0x30, 0x30, 0x30
                    !byte 0x30, 0x30, 0x30, 0x22
                    !byte 0x3A, 0x9E, 0x32, 0x30
                    !byte 0x39, 0x32, 0x00, 0x2A
                    !byte 0x08, 0x01, 0x00, 0x99
                    !byte 0x22, 0x54, 0x49, 0x4D
                    !byte 0x45, 0x3A, 0x22, 0x54
                    !byte 0x49, 0xAD, 0x36, 0x30
                    !byte 0x00, 0x00, 0x00
; ==============================================================================
                    *= 0x082C
                    ldy #0
--                  sty save_num
                    ldx #0
-                   clc
                    rol save_num
                    bcc +
                    lda #'1'
                    !byte 0x2C
+                   lda #'0'
                    jsr CHAROUT
                    inx
                    cpx #8
                    bne -
                    lda #0x0D
                    jsr CHAROUT
                    iny
                    bne --
                    rts


Slightly faster solution with unrolled loops / 28974 Bytes PRG:
https://trans.jansalleine.com/c64/num2binary_unrolled.prg
                    !cpu 6510
; ==============================================================================
save_num    = 0x02
CHAROUT     = 0xF1CA
; ==============================================================================
                    *= 0x0801
                    ; basic TI$ timer wrapper program:
                    ; --------------------------------
                    ; 0 TI$="000000":SYS2092
                    ; 1 PRINT"TIME:"TI/60
                    ; --------------------------------
                    ; RESULT: 8.9
                    !byte 0x18, 0x08, 0x00, 0x00
                    !byte 0x54, 0x49, 0x24, 0xB2
                    !byte 0x22, 0x30, 0x30, 0x30
                    !byte 0x30, 0x30, 0x30, 0x22
                    !byte 0x3A, 0x9E, 0x32, 0x30
                    !byte 0x39, 0x32, 0x00, 0x2A
                    !byte 0x08, 0x01, 0x00, 0x99
                    !byte 0x22, 0x54, 0x49, 0x4D
                    !byte 0x45, 0x3A, 0x22, 0x54
                    !byte 0x49, 0xAD, 0x36, 0x30
                    !byte 0x00, 0x00, 0x00
; ==============================================================================
                    *= 0x082C
                    !for i, 0, 255 {
                         lda #i
                         sta save_num
                         !for j, 0, 7 {
                              clc
                              rol save_num
                              bcc +
                              lda #'1'
                              !byte 0x2C
+                             lda #'0'
                              jsr CHAROUT
                         }
                         lda #0x0D
                         jsr CHAROUT
                    }
                    rts


EDIT: corrected ror -> rol.


Both good solutions! Keep in mind we also have to print the decimal number with a space and then the binary string followed by a carriage return for each number from 0-255. This will add to the time.

I put my BASIC solution in the comments on the video and noticed, as observed in this thread, that the KERNAL CHAROUT routine is consuming most of the time. Printing the numbers 0 and 1 instead of the strings "0" and "1" increased the time of my BASIC solution by 12 seconds; the KERNAL routine handles BASIC strings faster than BASIC numbers.

My idea for an optimized assembly solution would be to use 8 page aligned 256 byte tables of 0 and 1 characters, one for each bit.

This would allow a shared index for 8 lda's of 4 cycles each without having to branch after each load, just pushing the values loaded to CHAROUT.
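
A rough Python sketch of that table idea, just to make the layout concrete. It is a model under assumptions, not the assembly itself: `build_bit_tables` and `binary_string` are hypothetical names, and 0x30/0x31 are taken as the codes for "0" and "1". On the 6502 side the point of page alignment is that `lda table,x` never crosses a page, so it stays at 4 cycles.

```python
def build_bit_tables():
    """Eight 256-byte tables; tables[b][v] is the character for bit b of v."""
    return [bytes(0x30 | ((v >> b) & 1) for v in range(256)) for b in range(8)]


def binary_string(value, tables):
    """Eight lookups with one shared index, most significant bit first."""
    return "".join(chr(tables[b][value]) for b in range(7, -1, -1))


tables = build_bit_tables()
print(binary_string(5, tables))    # 00000101
print(binary_string(200, tables))  # 11001000
```

The cost is 2 KB of tables in exchange for a branch-free inner sequence.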
2024-01-12 23:52
spider-j

Registered: Oct 2004
Posts: 449
Quoting Mr SQL
My idea for an optimized assembly solution would be to use 8 page aligned 256 byte tables of 0 and 1 characters, one for each bit.

This would allow a shared index for 8 lda's of 4 cycles each without having to branch after each load, just pushing the values loaded to CHAROUT.

So I did it like you suggested, plus the decimal number and a clear screen to get the "exact" output like in the video (I don't know what CHR$(5) does).

One could also unroll the main loop, but this won't do much because of the imprecise TI$ measuring.
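
To put a number on that imprecision, here is a small Python sketch. It assumes only that TI counts jiffies of 1/60 second, which is why the wrapper prints TI/60; `to_jiffies` is a made-up helper name. It converts the two results reported earlier for the looped and unrolled versions back into jiffies.

```python
JIFFIES_PER_SECOND = 60  # TI ticks 60 times per second


def to_jiffies(seconds):
    """Convert a TI/60 reading back to the whole jiffy count behind it."""
    return round(seconds * JIFFIES_PER_SECOND)


looped = to_jiffies(8.91666667)   # result reported for the loop version
unrolled = to_jiffies(8.9)        # result reported for the unrolled version
print(looped, unrolled, looped - unrolled)  # 535 534 1
```

So the two earlier results differ by a single jiffy, which is the smallest step this kind of measurement can show.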

https://trans.jansalleine.com/c64/num2binary_table.prg
3073 Bytes
                    !cpu 6510
; ==============================================================================
CRSRX       = 0xD3
CHAROUT     = 0xF1CA
CLRSCR      = 0xE544
; ==============================================================================
                    *= 0x0801
                    ; basic TI$ timer wrapper program:
                    ; --------------------------------
                    ; 0 TI$="000000":SYS2092
                    ; 1 PRINT"TIME:"TI/60
                    ; --------------------------------
                    ; RESULT: ~9
                    !byte 0x18, 0x08, 0x00, 0x00
                    !byte 0x54, 0x49, 0x24, 0xB2
                    !byte 0x22, 0x30, 0x30, 0x30
                    !byte 0x30, 0x30, 0x30, 0x22
                    !byte 0x3A, 0x9E, 0x32, 0x30
                    !byte 0x39, 0x32, 0x00, 0x2A
                    !byte 0x08, 0x01, 0x00, 0x99
                    !byte 0x22, 0x54, 0x49, 0x4D
                    !byte 0x45, 0x3A, 0x22, 0x54
                    !byte 0x49, 0xAD, 0x36, 0x30
                    !byte 0x00, 0x00, 0x00
; ==============================================================================
                    *= 0x082C
                    jsr CLRSCR
                    ldx #0
-                   lda #' '
                    jsr CHAROUT
                    lda dec2,x
                    jsr CHAROUT
                    lda dec1,x
                    jsr CHAROUT
                    lda dec0,x
                    jsr CHAROUT
                    inc CRSRX
                    inc CRSRX
                    inc CRSRX
                    inc CRSRX
                    inc CRSRX
                    inc CRSRX
                    lda bit7,x
                    jsr CHAROUT
                    lda bit6,x
                    jsr CHAROUT
                    lda bit5,x
                    jsr CHAROUT
                    lda bit4,x
                    jsr CHAROUT
                    lda bit3,x
                    jsr CHAROUT
                    lda bit2,x
                    jsr CHAROUT
                    lda bit1,x
                    jsr CHAROUT
                    lda bit0,x
                    jsr CHAROUT
                    lda #0x0D
                    jsr CHAROUT
                    inx
                    bne -
                    rts
; ==============================================================================
                    !align 255, 0, 0
bit7:               !for i, 0, 255 {
                         !byte ((i AND %10000000) >> 7) OR 0x30
                    }
bit6:               !for i, 0, 255 {
                         !byte ((i AND %01000000) >> 6) OR 0x30
                    }
bit5:               !for i, 0, 255 {
                         !byte ((i AND %00100000) >> 5) OR 0x30
                    }
bit4:               !for i, 0, 255 {
                         !byte ((i AND %00010000) >> 4) OR 0x30
                    }
bit3:               !for i, 0, 255 {
                         !byte ((i AND %00001000) >> 3) OR 0x30
                    }
bit2:               !for i, 0, 255 {
                         !byte ((i AND %00000100) >> 2) OR 0x30
                    }
bit1:               !for i, 0, 255 {
                         !byte ((i AND %00000010) >> 1) OR 0x30
                    }
bit0:               !for i, 0, 255 {
                         !byte ((i AND %00000001) >> 0) OR 0x30
                    }
; ==============================================================================
dec2:               !for i, 0, 255 {
                         !if i < 100 {
                              !byte 0x20
                         } else if i < 200 {
                              !byte 0x31
                         } else {
                              !byte 0x32
                         }
                    }
dec1:               !for i, 0, 255 {
                         !if i < 10 {
                              !byte 0x20
                         } else {
                              !byte ((i / 10) - ((i / 100) * 10)) OR 0x30
                         }
                    }
dec0:               !for i, 0, 255 {
                         !byte (i % 10) OR 0x30
                    }
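
For readers who don't use ACME, the three decimal tables above can be modelled in Python roughly like this. It is a sketch and not the assembler source: `build_decimal_tables` is a hypothetical name, 0x20 is the space used for leading blanks and 0x30 is the "0" digit.

```python
def build_decimal_tables():
    """Model dec2/dec1/dec0: hundreds, tens and ones with leading spaces."""
    dec2, dec1, dec0 = bytearray(), bytearray(), bytearray()
    for i in range(256):
        dec2.append(0x20 if i < 100 else 0x30 | (i // 100))
        dec1.append(0x20 if i < 10 else 0x30 | ((i // 10) % 10))
        dec0.append(0x30 | (i % 10))
    return dec2, dec1, dec0


dec2, dec1, dec0 = build_decimal_tables()
for n in (7, 42, 105, 255):
    print(repr(chr(dec2[n]) + chr(dec1[n]) + chr(dec0[n])))
```

Indexing all three tables with the same value gives the number right-aligned in three columns, so no division is needed at run time.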


As already stated in this thread: KERNAL char out / print is the most expensive operation in this whole scenario anyway.

But without changing the "rules" like others suggested, there's no way around that other than implementing your own faster routines, which goes a little beyond what such small "exercises" usually aim to accomplish.

That guy making those isn't a scener and has a very "oldskool" approach to everything – working with your C64 as "intended" by the user manual :-) It's sometimes still kind of fun, but I usually also fast forward a lot when watching videos from him.
2024-01-13 05:12
Mr SQL

Registered: Feb 2023
Posts: 117
Very cool, I tested this Assembly version at 8.75 seconds!

This is probably as fast as we can make it run in asm if we follow Robin's exercise strictly.

I like the way you had the Assembler create the tables! Which Assembler are you using?

I was motivated to try that in BASIC and got the BASIC version down to 12 seconds, including the pre-calc that loads the array. It uses a clustered index on a single table, since BASIC is a high-level language that cannot handle narrow tables as efficiently.

The first part of the prg builds the data statements which is already done, goto 1000 loads the array and iterates:

https://relationalframework.com/basic12seconds.prg
2024-01-13 14:10
Street Tuff

Registered: Feb 2002
Posts: 88
>> I like the way you had the Assembler create the tables! Which Assembler are you using?

That's the ACME assembler. https://sourceforge.net/projects/acme-crossass/
2024-01-13 17:02
ChristopherJam

Registered: Aug 2004
Posts: 1382
This one takes 13.18 seconds (as compared to just doing print:return in the printing routine, which takes 8.91 - the binary conversion+print is hence around 4.27). So yeah, ws is correct. It's dominated by screen scroll time.

Those times are if you run after loading from reset - if you start at the bottom of the screen everything is slower, because you start scrolling immediately.

0 gosub9:ti$="000000":goto2
1 printd$(i/16)d$(iand15):return
2 fori=0to255:gosub1:next
3 print"time:"ti/60:end
7 data"0000","0001","0010","0011","0100","0101","0110","0111"
8 data"1000","1001","1010","1011","1100","1101","1110","1111"
9 dimd$(16):fori=0to15:readd$(i):next:return
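
The lookup in line 1 of that listing can be modelled in Python as below. This is a sketch, not a port of the whole program: `NIBBLES` and `to_binary` are made-up names, and it relies on BASIC truncating the array subscript i/16 to an integer. The last lines redo the timing subtraction from the post.

```python
# Sixteen 4-character strings, same contents as the DATA lines.
NIBBLES = [format(n, "04b") for n in range(16)]


def to_binary(i):
    """High nibble string followed by low nibble string, like d$(i/16)d$(iand15)."""
    return NIBBLES[i // 16] + NIBBLES[i & 15]


print(to_binary(165))  # 10100101

# Timing from the post: full run minus a print-only run.
conversion_cost = round(13.18 - 8.91, 2)
print(conversion_cost)  # 4.27
```

Two string lookups per number replace eight per-bit decisions, which is why the conversion is the smaller share of the run time.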