Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
 Welcome to our latest new user maak ! (Registered 2024-04-18) You are not logged in - nap
CSDb User Forums


Forums > C64 Coding > Profiling tool?
2019-02-20 08:44
ChristopherJam

Registered: Aug 2004
Posts: 1370
Profiling tool?

Has anyone come across or written a tool that lets you execute a chunk of 6502 that then reports back a disassembly where each instruction is annotated with the total number of cycles it consumed?

Something like unp64 but with added instrumentation, basically.

Closest a quick google found for me was a tool for generating such reports from logs produced by a hardware bus monitor..
2019-02-20 09:02
soci

Registered: Sep 2003
Posts: 473
This was discussed already:
https://csdb.dk/forums/?roomid=11&topicid=68967&firstpost=36

And mentioned later here:
https://csdb.dk/forums/?roomid=11&topicid=125026&firstpost=12

Otherwise who would want to do that for demo coding.
2019-02-20 09:33
JackAsser

Registered: Jun 2002
Posts: 1987
I did to check min/max/avg raster time for sid tunes. Output is time spent on each instruction and a disassembly, to find hot spots (typically vibrato or pulse sweeps)
2019-02-20 10:09
ChristopherJam

Registered: Aug 2004
Posts: 1370
Ah excellent, thanks soci.

I somehow missed the trace command in the VICE monitor. I assume that's what you used for "Then I run the binary in VICE and start a trace at the place of interest for as long as necessary"?

seems to be hella slow for 700,000 cycles worth of execution, mind :(
2019-02-20 13:55
Krill

Registered: Apr 2002
Posts: 2825
Quoting JackAsser
I did to check min/max/avg raster time for sid tunes. Output is time spent on each instruction and a disassembly, to find hot spots (typically vibrato or pulse sweeps)
Off topic, but that reminds me of an old idea of mine:

At the beginning of the player routine called each frame, read this frame's SID register values from a ring buffer and write them to the actual SID registers. Then switch to a thread that would fill the ring buffer with new data until switching back to the calling thread is done with a timer or raster interrupt.

Result: player CPU load spikes are evened out, player code and structure is greatly simplified as no manual load balancing is required. Maximum player CPU time for every video frame can be minimised.
2019-02-20 19:09
Digger

Registered: Mar 2005
Posts: 421
@Krill: genius!
2019-02-20 21:06
chatGPZ

Registered: Dec 2001
Posts: 11100
i used that in some part (with an existing player of course) to get rid of spikes.... forgot where though :)
2019-02-20 21:47
Frantic

Registered: Mar 2003
Posts: 1627
Ergo; Groepaz is a genius
2019-02-21 12:15
Perplex

Registered: Feb 2009
Posts: 254
I know Bjørn Røstøen has done this, as well as the guys in Booze Design. There are many geniuses out there.
2019-02-21 12:42
chatGPZ

Registered: Dec 2001
Posts: 11100
Ah yes, i think i stole the idea from HCL. There might be a thread about this here (from about a decade ago) :)
2019-02-21 15:48
ChristopherJam

Registered: Aug 2004
Posts: 1370
The scene is full of the greatest minds in the galaxy :D

Now, where's my profiling tool? :P
2019-02-22 08:55
Slajerek

Registered: May 2015
Posts: 62
Actually, a profiling tool is messing my mind for some time already. I have a proof of concept view of such cycles counting and display for the C64 Debugger, but it is not in the master branch yet.

Brush/Elysium showed me at X a Champ 65C02 profiler written in Ruby that is able to generate nice profiling reports in HTML, it shows cycle % per detected procedures and utilizes GraphViz for the code flow diagrams.

I changed the source of Champ to be able to feed data generated by my debugger and just Today released the first alpha version of this, it is included in C64 Debugger v0.64.56.4. Although it is not perfect, as still I need to add labels and watches, but it is quite usable even now.

Go to Settings/Emulation, set profiler output file and start profiling for some time. It will create a *.pd file which then you need to feed to C64 Debugger's version of Champ: http://csdb.dk/getinternalfile.php/179301/c64d-champ-v064564.rb

It should generate a report.html which you can view in a web browser. It has some flaws though that have to be noted:
- the cycle counter is a global VIC counter, not the CPU cycle counter, so bad line cycles are also added, I'm going to add a setting to select which cycle counter to use for profiling
- Champ is completely not able to distinguish if we are in IRQ, thus, cycles taken by code run in IRQ are also counted as a procedure code.

Anyway, this is some starting point I am going to develop further. It is not giving cycles % spent per each code address, but the POC for such view is existing and I suspect should be added in some upcoming version of the C64 Debugger soon.
2019-02-22 09:12
JackAsser

Registered: Jun 2002
Posts: 1987
Quote: The scene is full of the greatest minds in the galaxy :D

Now, where's my profiling tool? :P


You know very well that even if there was such a tool publicly available you'd find an excuse to code you own in Python. I bet you've already started. That's how we work.
2019-02-22 12:09
chatGPZ

Registered: Dec 2001
Posts: 11100
didnt micro64 have some profiling stuff in its "ide"?
2019-02-22 15:06
oziphantom

Registered: Oct 2014
Posts: 478
VICE PDB Monitor 1.0 (get the extra files from github as per comments ) has a limited profiler, Martin did it so I'm not a 100% sure how it works. But I think it uses the memmap operations to count how often an address is executed. So its doesn't give you clocks but it does give hotspots. Its open source so if you wish to extend it to be better, feel free ;)
2019-02-22 15:08
chatGPZ

Registered: Dec 2001
Posts: 11100
windows specific hack though, and requires a custom non supported acme hack. meh
2019-02-22 15:16
oziphantom

Registered: Oct 2014
Posts: 478
hack? what hack? It makes no modifications to VICE at all and will even connect to a Linux instance if you set up a tunnel for it. To be clear it only uses the Remote Debugger Protocol to communicate to VICE.

It can use Martin's custom ACME, however ACME is for morons and 64Tass is also supported(with more features) for true sensible people ;)
2019-02-22 15:31
chatGPZ

Registered: Dec 2001
Posts: 11100
Now i'd like to know what cybermodo has to say about this topic :)
2019-02-22 15:42
oziphantom

Registered: Oct 2014
Posts: 478
I can't believe he uses ACME, that would mean he knows assembly...
2019-02-22 16:14
chatGPZ

Registered: Dec 2001
Posts: 11100
Obviously he doesnt - but he'd sure have to say something about it =D
2019-02-22 16:41
oziphantom

Registered: Oct 2014
Posts: 478
you could copy paste it for him to lemon
2019-02-22 16:43
chatGPZ

Registered: Dec 2001
Posts: 11100
NO :)
2019-02-23 10:46
ChristopherJam

Registered: Aug 2004
Posts: 1370
Ah, thanks Slajerek, that's an excellent start.

I'm not seeing any cyclecounts or timestamps in the log output though?

So yes, JackAsser, I am indeed currently writing a python script to parse c64debugger logs and the source PRG to add that information..


I'm sure CyberMojoJoJo could whip up a native profiling tool in C in no time at all, assuming someone else has provided some libraries to do all the heavy lifting. Maybe to scroll flamegraphs in ifli?
2019-02-24 00:04
ChristopherJam

Registered: Aug 2004
Posts: 1370
Whee, this is fun.

First bash at collecting stats from the output from c64 debugger; this is a run of tinycrunch (fast variant) decrunching 7.bin from the Pearls for Pigs corpus.

(well ok, second .prg profiled; started building this for a sekret project, output of that one's under wraps for now)

Columns are address, instruction, total cycles spent by that instruction, cumulative percentage of cycles expended, and an ASCII graph of the latter.

Clearly I'd save a fucktonne of cycles if I relocated the decruncher to zero page and inlined the pointers at 0a30 and 0a32 :)

  0a00  stx $07              3     0.00%  
  0a02  ldy #$02             2     0.00%  
  0a04  sta $0004,y         15     0.00%  
  0a07  lda ($06),y         15     0.01%  
  0a09  dey                  6     0.01%  
  0a0a  bpl $0a04            8     0.01%  
  0a0c  pha                  3     0.01%  
  0a0d  sec                  2     0.01%  
  0a0e  bcs $0a40            3     0.01%  
  0a10  pla                  4     0.01%  
  0a11  iny                  2     0.01%  
  0a12  sta ($04),y          6     0.01%  
  0a14  rts                  6     0.01%  
  0a15  inc $05            255     0.06%  
  0a17  clc                102     0.08%  
  0a18  bcc $0a40          153     0.11%  
  0a1a  beq $0a10         3577     0.80%  
  0a1c  iny               3574     1.49%  #
  0a1d  clc               3574     2.17%  #
  0a1e  lda ($06),y       8935     3.89%  ##
  0a20  adc $04           5361     4.92%  ##
  0a22  sta $08           5361     5.95%  ###
  0a24  txa               3574     6.64%  ###
  0a25  ora #$f8          3574     7.33%  ####
  0a27  adc $05           5361     8.36%  ####
  0a29  sta $09           5361     9.39%  #####
  0a2b  lda ($08),y       8935    11.11%  ######
  0a2d  sta ($04),y      10722    13.17%  #######
  0a2f  iny              22824    17.56%  #########
  0a30  lda ($08),y      57060    28.53%  ##############
  0a32  sta ($04),y      68472    41.69%  #####################
  0a34  txa              22824    46.08%  #######################
  0a35  sbx #$08         22824    50.47%  #########################
  0a37  bpl $0a2f        32449    56.71%  ############################
  0a39  tya               3574    57.40%  #############################
  0a3a  adc $04           5361    58.43%  #############################
  0a3c  sta $04           5361    59.46%  ##############################
  0a3e  bcs $0a15         3625    60.16%  ##############################
  0a40  lda #$02          3576    60.84%  ##############################
  0a42  adc $06           5364    61.88%  ###############################
  0a44  sta $06           5364    62.91%  ###############################
  0a46  bcs $0a96         3586    63.60%  ################################
  0a48  ldy #$00          4884    64.54%  ################################
  0a4a  lax ($06),y      12210    66.88%  #################################
  0a4c  bpl $0a1a         6341    68.10%  ##################################
  0a4e  asl               1970    68.48%  ##################################
  0a4f  bmi $0a75         2466    68.96%  ##################################
  0a51  iny              11768    71.22%  ####################################
  0a52  lda ($06),y      29420    76.87%  ######################################
  0a54  sta ($04),y      35304    83.66%  ##########################################
  0a56  dex              11768    85.93%  ###########################################
  0a57  bmi $0a51        17155    89.22%  #############################################
  0a59  tya                994    89.42%  #############################################
  0a5a  adc $06           1491    89.70%  #############################################
  0a5c  sta $06           1491    89.99%  #############################################
  0a5e  bcc $0a63         1463    90.27%  #############################################
  0a60  inc $07            140    90.30%  #############################################
  0a62  clc                 56    90.31%  #############################################
  0a63  tya                994    90.50%  #############################################
  0a64  adc $04           1491    90.79%  #############################################
  0a66  sta $04           1491    91.07%  ##############################################
  0a68  bcc $0a6c         1464    91.35%  ##############################################
  0a6a  inc $05            135    91.38%  ##############################################
  0a6c  ldy #$00           994    91.57%  ##############################################
  0a6e  lax ($06),y       2485    92.05%  ##############################################
  0a70  bpl $0a1a         1325    92.30%  ##############################################
  0a72  asl                332    92.37%  ##############################################
  0a73  bpl $0a51          340    92.43%  ##############################################
  0a75  txa               1308    92.68%  ##############################################
  0a76  adc $04           1962    93.06%  ###############################################
  0a78  sta $08           1962    93.44%  ###############################################
  0a7a  lda #$ff          1308    93.69%  ###############################################
  0a7c  adc $05           1962    94.07%  ###############################################
  0a7e  sta $09           1962    94.44%  ###############################################
  0a80  iny               1308    94.70%  ###############################################
  0a81  lda ($08),y       3270    95.32%  ################################################
  0a83  sta ($04),y       3924    96.08%  ################################################
  0a85  iny               1308    96.33%  ################################################
  0a86  lda ($08),y       3270    96.96%  ################################################
  0a88  sta ($04),y       3924    97.71%  #################################################
  0a8a  lda #$01          1308    97.97%  #################################################
  0a8c  adc $04           1962    98.34%  #################################################
  0a8e  sta $04           1962    98.72%  #################################################
  0a90  bcs $0a9b         1310    98.97%  #################################################
  0a92  inc $06           3270    99.60%  ##################################################
  0a94  bne $0a48         1959    99.98%  ##################################################
  0a96  inc $07             65    99.99%  ##################################################
  0a98  jmp $0a48           39   100.00%  ##################################################
  0a9b  inc $05             10   100.00%  ##################################################
  0a9d  bcs $0a92            6   100.00%  ##################################################
2019-02-24 10:15
Krill

Registered: Apr 2002
Posts: 2825
Quoting ChristopherJam
total cycles spent by that instruction, cumulative percentage of cycles expended, and an ASCII graph of the latter
I wonder how log2(total cycles spent by that instruction) and the corresponding ASCII graph (with or without scaling to 64x "#" max.) would look like. I imagine "number of bits to represent number of cycles" to be quite an intuitive curve. :)

Edit: Hmm, that would basically be a scaled and left-justified version of the "total cycles spent by that instruction" (base 10) column. Maybe it should be linear‽
2019-02-24 10:42
Slajerek

Registered: May 2015
Posts: 62
Interesting result of the script :)

Yes, lacking of cycle stamp in each log line is a clear overlook by me, apologies :) there's a number of cycles in each jsr/rts marker, so Champ does not need a cycle count per line. Anyway, I definitely have to add better cycle counters handling and logging.
2019-02-24 10:56
ChristopherJam

Registered: Aug 2004
Posts: 1370
Krill: Yes, I graphed cumulative to make it easy to see at a glance how costly sections of code are, but the counts do indeed make it easy to see individual expensive instructions.

five digits bad, three or four digits less so :)


Slajerek: Thanks!

Oh, and no need to apologize; much as I'm looking forward to future enhancements this is already extraordinarily useful.
2019-02-24 11:21
Krill

Registered: Apr 2002
Posts: 2825
Quoting ChristopherJam
Krill: Yes, I graphed cumulative to make it easy to see at a glance how costly sections of code are, but the counts do indeed make it easy to see individual expensive instructions.
So you're looking for big slopes (high increase of #) when head-parsing the graph line by line from top to bottom, adjusting for increasing "ratio skew"?

I mean the cumulative graph always starts at 0 and ends at 100% in a strictly ascending manner, no matter the program or per-line cycle distribution.
2019-02-24 12:06
ChristopherJam

Registered: Aug 2004
Posts: 1370
Yes, I look at the slope as a proportion of the final width, rather than current width. Perhaps I should add a right hand border, or a second/lighter fill character between the current total and 100%? (eg ####__)

But yes, even as is you can spot things like the graph going near horizontal at 0a32 and 0a54, and you can also see it going from 1.5% to 89% in the space of 40 or so instructions.
2019-02-24 12:38
oziphantom

Registered: Oct 2014
Posts: 478
does you log handle, badlines, sprite DMA, IRQs etc, does it just get tacked on to the cost of the instruction, are you able to filter out what is and isn't IRQ etc.
2019-02-24 12:44
ChristopherJam

Registered: Aug 2004
Posts: 1370
This is purely CPU cycles at the moment; I don't have information about VIC bus accesses.

I suspect the only indication I would have that an IRQ has happened would be from parsing the preceding instruction and noting that the next one executed was in an unexpected location; not sure, as I've only tested this with screen blanked and interrupts disabled.

( Slajerek? :D )
2019-02-24 16:00
Krill

Registered: Apr 2002
Posts: 2825
Enough to profile "pure" algorithms, converting input to output without side-effects. Which is all you need for now. :)
2019-02-24 16:24
Martin Piper

Registered: Nov 2007
Posts: 631
Quote: VICE PDB Monitor 1.0 (get the extra files from github as per comments ) has a limited profiler, Martin did it so I'm not a 100% sure how it works. But I think it uses the memmap operations to count how often an address is executed. So its doesn't give you clocks but it does give hotspots. Its open source so if you wish to extend it to be better, feel free ;)

Currently VICE PDB Monitor uses Vice's mmzap and mmshow between executions or breakpoints to gather execution information and generate a heat map. It's not entirely exact since Vice doesn't trap multiple executions from the same address.

I'm pondering adding some profiling syntax and command line arguments to BDD6502 to allow cycle exact profiling a source level debugging of the same.

One thing I would like to support is support for self modifying code where the opcode is changed. This would be useful for very optimised self modifying code. This does however have challenges for regular profile views because cycle counts can change when the opcode is changed.
2019-03-01 22:29
Slajerek

Registered: May 2015
Posts: 62
@oziphantom noted :) all these are not logged yet, the Champ profile reports do not take these things into account. Will add soon, thanks for pointing this out.
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Advanced
Users Online
Mixer
Higgie/Kraze/Onslaught
pcollins/Quantum
Røly/Mayday!
Falborg
A3/AFL
radius75
kbs/Pht/Lxt
Didi/Laxity
Guests online: 74
Top Demos
1 Next Level  (9.8)
2 Mojo  (9.7)
3 Coma Light 13  (9.7)
4 Edge of Disgrace  (9.6)
5 Comaland 100%  (9.6)
6 No Bounds  (9.6)
7 Uncensored  (9.6)
8 Wonderland XIV  (9.6)
9 The Ghost  (9.6)
10 Bromance  (9.6)
Top onefile Demos
1 It's More Fun to Com..  (9.9)
2 Party Elk 2  (9.7)
3 Cubic Dream  (9.6)
4 Copper Booze  (9.5)
5 Rainbow Connection  (9.5)
6 Wafer Demo  (9.5)
7 TRSAC, Gabber & Pebe..  (9.5)
8 Onscreen 5k  (9.5)
9 Dawnfall V1.1  (9.5)
10 Quadrants  (9.5)
Top Groups
1 Oxyron  (9.3)
2 Nostalgia  (9.3)
3 Booze Design  (9.3)
4 Censor Design  (9.3)
5 Crest  (9.3)
Top Fullscreen Graphicians
1 Carrion  (9.8)
2 Joe  (9.8)
3 Duce  (9.8)
4 Mirage  (9.7)
5 Facet  (9.7)

Home - Disclaimer
Copyright © No Name 2001-2024
Page generated in: 0.08 sec.