CSDb User Forums


2017-03-18 18:36
Compyx

Registered: Jan 2005
Posts: 631
Release id #154516 : Subsizer 0.5

Better to take this to the forum.

After adding some printf() statements to track down the file open failure with "some demo.prg", I found that the quoting of arguments works, but only sometimes, which usually indicates a deeper problem somewhere.

Running it through valgrind, I get some nasty messages:
compyx@asus-p5k:~$ valgrind bin/subsizer "music demo.prg"
==25408== Memcheck, a memory error detector
==25408== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==25408== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==25408== Command: bin/subsizer music\ demo.prg
==25408== 
argv[0] = 'bin/subsizer'
argv[1] = 'music demo.prg'
build matches...
==25408== Warning: set address range perms: large range [0x395f8040, 0x98bd9040) (undefined)
...0.49 s
optimizing matches...
==25408== Conditional jump or move depends on uninitialised value(s)
==25408==    at 0x10B6F4: cost_enc (in /home/compyx/bin/subsizer)
==25408==    by 0x10C020: crunch_normal_int.isra.1 (in /home/compyx/bin/subsizer)
==25408==    by 0x109818: main (in /home/compyx/bin/subsizer)
==25408== 
 5264 (left 70.84%)
0000113235660150,1112,1010122424555667,222233346666789A,132223234566679C
==25408== Conditional jump or move depends on uninitialised value(s)
==25408==    at 0x4C31CC2: __memcmp_sse4_1 (vg_replace_strmem.c:1094)
==25408==    by 0x10C4B4: crunch_normal_int.isra.1 (in /home/compyx/bin/subsizer)
==25408==    by 0x109818: main (in /home/compyx/bin/subsizer)
==25408== 
==25408== Conditional jump or move depends on uninitialised value(s)
==25408==    at 0x4C31CFD: __memcmp_sse4_1 (vg_replace_strmem.c:1094)
==25408==    by 0x10C4B4: crunch_normal_int.isra.1 (in /home/compyx/bin/subsizer)
==25408==    by 0x109818: main (in /home/compyx/bin/subsizer)
==25408== 
==25408== Conditional jump or move depends on uninitialised value(s)
==25408==    at 0x10C4B7: crunch_normal_int.isra.1 (in /home/compyx/bin/subsizer)
==25408==    by 0x109818: main (in /home/compyx/bin/subsizer)
==25408== 
 4899 (left 65.93%)
0000113235660150,1112,1010122332456667,222323346666789A,132223234566679C
 4897 (left 65.90%)
0000113235660150,1112,1010122332456667,222323346666789A,132223234566679C
==25408== Conditional jump or move depends on uninitialised value(s)
==25408==    at 0x4C31CD6: __memcmp_sse4_1 (vg_replace_strmem.c:1094)
==25408==    by 0x10C4B4: crunch_normal_int.isra.1 (in /home/compyx/bin/subsizer)
==25408==    by 0x109818: main (in /home/compyx/bin/subsizer)
==25408== 
...2.02 s
==25408== Warning: set address range perms: large range [0x395f8028, 0x98bd9058) (noaccess)
generating output...
...0.02 s
packed 7431 bytes into 4897 bytes
verifed 7431 bytes...ok
==25408== 
==25408== HEAP SUMMARY:
==25408==     in use at exit: 0 bytes in 0 blocks
==25408==   total heap usage: 101 allocs, 101 frees, 1,624,666,687 bytes allocated
==25408== 
==25408== All heap blocks were freed -- no leaks are possible
==25408== 
==25408== For counts of detected and suppressed errors, rerun with: -v
==25408== Use --track-origins=yes to see where uninitialised values come from
==25408== ERROR SUMMARY: 6 errors from 5 contexts (suppressed: 0 from 0)


I tried using --track-origins=yes, but that completely borked my system: it filled my memory and swap space and made the system completely unresponsive. I had to kill valgrind from a tty.


Is there a bug tracker somewhere for this?
2017-03-18 19:01
tlr

Registered: Sep 2003
Posts: 1791
Thanks for the report (you and ian)! No bug tracker at the moment, but the problem is noted. There is some parsing of the filename to find ',' and '@' notations for the load address and the like. I guess the problem is there.

Stay away from filenames with spaces in them for the time being.
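A minimal sketch of suffix parsing that cannot be confused by spaces, assuming a hypothetical `name,$addr` syntax (the real subsizer notation, including the '@' form, is not documented in this thread): only the text after the last ',' is parsed as an address, and everything before it is copied verbatim as the name. `FileSpec` and `parse_filespec` are illustrative names, not subsizer's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of parsing "file.prg" or "file.prg,$0801". */
typedef struct {
    char name[256];          /* arbitrary limit for this sketch */
    int has_addr;
    unsigned long addr;
} FileSpec;

/* Only the text after the LAST ',' is treated as an address, so spaces
 * in the name are never split or reinterpreted. Returns 0 on success. */
static int parse_filespec(const char *arg, FileSpec *fs)
{
    assert(arg != NULL && fs != NULL);
    const char *comma = strrchr(arg, ',');
    size_t len = comma ? (size_t)(comma - arg) : strlen(arg);

    if (len >= sizeof(fs->name))
        return -1;                      /* name too long */
    memcpy(fs->name, arg, len);
    fs->name[len] = '\0';
    fs->has_addr = 0;
    fs->addr = 0;

    if (comma) {
        const char *p = comma + 1;
        char *end;
        if (*p == '$')
            p++;                        /* accept C64-style "$0801" */
        fs->addr = strtoul(p, &end, 16);
        if (end == p || *end != '\0')
            return -1;                  /* suffix is not a hex address */
        fs->has_addr = 1;
    }
    return 0;
}
```

Note that a name which itself contains a comma is rejected by this sketch, so a real parser would need an escape or a stricter rule.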
2017-03-18 19:20
Compyx

Registered: Jan 2005
Posts: 631
The spaces in the filenames are a minor issue; I'm more worried about the uninitialized values, and you seem to be allocating huge chunks of memory somehow (might be related).

Happy hacking :)
2017-03-18 19:25
tlr

Registered: Sep 2003
Posts: 1791
Try this:
index 6012af5..55f85ab 100644
--- a/crunch/subsizer/src/crunch_normal.c
+++ b/crunch/subsizer/src/crunch_normal.c
@@ -485,7 +485,7 @@ static int crunch_normal_int(Buffer *sbf, Buffer *dbf, int flags)
     MatchTree *mt = create_matchtree();
     double t1, t2;
     EncodingSet es;
-
+    memset(&es, 0, sizeof(EncodingSet));
 
     msg(MSG_VERBOSE, "build matches...\n");
     t1 = get_time();
2017-03-18 19:36
soci

Registered: Sep 2003
Posts: 481
Sure, that's a good solution for the garbage in structure holes which could affect memcmp at crunch_normal.c:402.

But it also hides the issue that endm is not initialized in optimize_tree like the rest of the fields.
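To illustrate the "structure holes" soci mentions, here is a sketch with a hypothetical `Encoding` struct (not subsizer's `EncodingSet`): a `char` followed by an `int` typically leaves padding bytes, and `memcmp` compares those too. Zeroing the whole object first, as the patch does, makes the holes deterministic in practice with common compilers; strictly speaking, the C standard lets padding take unspecified values after a member is stored, so this is a pragmatic fix rather than a guaranteed one.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for EncodingSet: on typical ABIs a char followed
 * by an int leaves padding bytes ("holes") between the two fields. */
typedef struct {
    char kind;
    int  len;
} Encoding;

/* Zero the whole object, holes included, then set the fields. Same idea
 * as the memset() in the patch; in practice the holes stay zero. */
static void encoding_init(Encoding *e, char kind, int len)
{
    assert(e != NULL);
    memset(e, 0, sizeof *e);
    e->kind = kind;
    e->len = len;
}

/* memcmp() compares every byte, padding too, so it is only deterministic
 * for objects whose holes were cleared as above. */
static int encoding_same_bytes(const Encoding *a, const Encoding *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

The flip side is exactly soci's point: a field that is never assigned (like `endm`) now silently reads as zero, so valgrind has nothing left to report.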
2017-03-18 19:39
tlr

Registered: Sep 2003
Posts: 1791
I know it's ugly, but basically es gets memcpy'd to last_es for comparison, so any uninitialized values are compared with themselves. It shouldn't (tm) affect operation.

I'll try to do something nicer in the future though.
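One possible "nicer" alternative, sketched with a hypothetical `EncSet` layout (every name except `endm` is made up): compare the meaningful fields one by one instead of `memcmp`-ing the whole struct. Padding then never matters, and an uninitialized field such as `endm` still shows up in valgrind as a conditional jump on an uninitialised value instead of being masked by a `memset`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_LEVELS 4   /* hypothetical table size */

/* Hypothetical stand-in for EncodingSet; not subsizer's real layout. */
typedef struct {
    char kind;
    int  bits[NUM_LEVELS];
    int  endm;
} EncSet;

/* Compare only the fields that carry meaning, so padding bytes are
 * never read and cannot affect the result. */
static bool encset_equal(const EncSet *a, const EncSet *b)
{
    assert(a != NULL && b != NULL);
    if (a->kind != b->kind || a->endm != b->endm)
        return false;
    for (size_t i = 0; i < NUM_LEVELS; i++)
        if (a->bits[i] != b->bits[i])
            return false;
    return true;
}
```

The cost is that the comparison must be kept in sync with the struct whenever a field is added.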

Btw, I haven't really been able to reproduce anything special related to spaces in the filename. Any clues?
2017-03-18 19:41
Compyx

Registered: Jan 2005
Posts: 631
That didn't help. Maybe tomorrow, when I have more time (I'm hacking on my own stuff and VICE), I'll try to figure out where the uninitialized values come from.
2017-03-18 19:46
tlr

Registered: Sep 2003
Posts: 1791
Ok, thanks. The valgrind errors completely disappeared on my test case here using this. Maybe it depends on the input file... Running Ubuntu 14.04 64-bit on an i5 here. I'd love a reproducible test case. PM if you like.
2017-03-18 19:56
Compyx

Registered: Jan 2005
Posts: 631
I simply used this: http://csdb.dk/release/download.php?id=191285, since it was something I downloaded recently and was small.

The filename problem happens randomly: when I added some printf()s to see what was going on, it disappeared for a while, but then came back. That might indicate stack corruption somewhere.

But so far, that file has given me consistent errors with valgrind. I'm running Debian Stretch 64-bit here, on an old Core Duo box. GCC is 6.3.0, valgrind is 3.12.0.SVN.
2017-03-18 20:03
tlr

Registered: Sep 2003
Posts: 1791
Gaah... Maybe I shouldn't have done as many kludges when coding this. :/

I'm sure there is some stack corruption problem, but it's not easily reproducible here, unfortunately. Will clean up some nastiness for the next release.
tlr@pinecone:subsizer$ valgrind subsizer -x "tests/Ninth.prg"
==13935== Memcheck, a memory error detector
==13935== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==13935== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==13935== Command: subsizer -x tests/Ninth.prg
==13935== 
read 'tests/Ninth.prg' $0801-$7400.
detected sys: $4000
build matches...
==13935== Warning: set address range perms: large range [0x3a048040, 0x99629040) (undefined)
...6.48 s
optimizing matches...
 7139 (left 25.82%)
1110334466621000,1012,0000001022345657,02112344560678AD,02103434555789AF
 6677 (left 24.15%)
1110334465462410,1012,0000001022345657,02112344560678AD,02103434555789AF
 6682 (left 24.17%)
1110334465462410,1012,0000001022345657,02112344560678AD,02103434555789AF
...6.69 s
==13935== Warning: set address range perms: large range [0x3a048028, 0x99629058) (noaccess)
generating output...
...0.02 s
safe = 2
packed 27647 bytes (109 blocks) into 6974 bytes (28 blocks)
==13935== 
==13935== HEAP SUMMARY:
==13935==     in use at exit: 0 bytes in 0 blocks
==13935==   total heap usage: 100 allocs, 100 frees, 1,624,919,704 bytes allocated
==13935== 
==13935== All heap blocks were freed -- no leaks are possible
==13935== 
==13935== For counts of detected and suppressed errors, rerun with: -v
==13935== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
tlr@pinecone:subsizer$ 
2017-03-18 20:09
soci

Registered: Sep 2003
Posts: 481
Works fine here, also with spaces in filenames. Similar system, but 32-bit, so I had to adjust the following hack; otherwise the allocation didn't succeed under valgrind:
safe_malloc(200000000 * sizeof(Match), "matches");
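Both valgrind runs above report about 1.62 GB allocated in total, which leaves little room in a 32-bit address space. Below is a hedged sketch of an allocation wrapper with the same call shape as the `safe_malloc(size, "what")` call quoted here; subsizer's real `safe_malloc` is not shown in the thread, so this body is an assumption, and `safe_malloc_array` is a hypothetical helper. The array variant also guards the `count * size` multiplication: with a 32-bit `size_t`, 200000000 elements of 22 bytes or more would silently wrap around.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: allocate or exit with a message naming the buffer. */
static void *safe_malloc(size_t size, const char *what)
{
    assert(what != NULL);
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "error: could not allocate %zu bytes for %s\n", size, what);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Hypothetical helper: refuse count * elem_size products that would wrap
 * size_t (a real risk on 32-bit hosts), then allocate as above. */
static void *safe_malloc_array(size_t count, size_t elem_size, const char *what)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        fprintf(stderr, "error: size overflow allocating %s\n", what);
        exit(EXIT_FAILURE);
    }
    return safe_malloc(count * elem_size, what);
}
```

Failing loudly with the buffer name makes a 32-bit allocation failure like soci's obvious instead of surfacing later as a crash.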
2017-03-18 20:15
tlr

Registered: Sep 2003
Posts: 1791
Interesting! What version of gcc?

I have:
tlr@pinecone:subsizer$ gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

tlr@pinecone:subsizer$
2017-03-18 20:23
soci

Registered: Sep 2003
Posts: 481
gcc (Debian 6.3.0-8) 6.3.0 20170221

There are some harmless warnings with it:
fold.c: In function ‘fold’:
fold.c:339:12: warning: unused variable ‘n’ [-Wunused-variable]
     size_t n = low_limit - sa;
            ^
At top level:
fold.c:142:13: warning: ‘dump_rle_entries’ defined but not used [-Wunused-function]
 static void dump_rle_entries(void)
             ^~~~~~~~~~~~~~~~
In file included from fold.c:15:0:
decrunchers.h:277:18: warning: ‘fs_list_header’ defined but not used [-Wunused-variable]
 static FixStruct fs_list_header[] = {
                  ^~~~~~~~~~~~~~
decrunchers.h:270:18: warning: ‘fs_list_tail’ defined but not used [-Wunused-variable]
 static FixStruct fs_list_tail[] = {
                  ^~~~~~~~~~~~
decrunchers.h:263:18: warning: ‘fs_list_decruncher’ defined but not used [-Wunused-variable]
 static FixStruct fs_list_decruncher[] = {
                  ^~~~~~~~~~~~~~~~~~
2017-03-21 18:51
tlr

Registered: Sep 2003
Posts: 1791
update: Subsizer 0.5.1
subsizer 0.5.1, 2017-03-21
  - added LICENSE.txt
  - give error if no end marker can be found.
  - got rid of a few warnings as reported by soci.
  - fixed issue with uninitialized memory as reported by compyx.
  - added decruncher source due to popular demand.  well, only bitbreaker. :)
    (caution: encoding may change between versions)
2017-03-24 04:01
ChristopherJam

Registered: Aug 2004
Posts: 1409
Has anyone tested this on the Pearl for Pigs corpus?

I'm writing a benchmarks page at codebase ( http://www.codebase64.org/doku.php?id=base:compression_benchmar.. ), and it would be interesting to see where Subsizer fits in the big picture.

I'll add a scatter plot soonish.
2017-03-24 06:35
tlr

Registered: Sep 2003
Posts: 1791
I have these for subsizer 0.5:
    file           size  (blks)   left    gain   time   mem
    -----------------------------------------------------------
        pfp1.bin:  2956 (12)     26.85%  73.15%  0.34s  30.05M
        pfp2.bin:  2205 (9)      44.34%  55.66%  0.10s  27.47M
        pfp3.bin:  1788 (8)      45.28%  54.72%  0.08s  25.08M
        pfp4.bin:  3456 (14)     49.26%  50.74%  0.16s  25.46M
        pfp5.bin:  19519 (77)    56.15%  43.85%  1.23s  74.61M
        pfp6.bin:  8396 (34)     26.57%  73.43%  1.33s  83.25M
        pfp7.bin:  8766 (35)     42.99%  57.01%  0.51s  32.22M
        pfp8.bin:  3063 (13)     53.61%  46.39%  0.15s  24.89M
        pfp9.bin:  5307 (21)     59.23%  40.77%  0.14s  25.58M
    -----------------------------------------------------------
No decruncher speeds as there is no standalone decruncher yet.

Curious question: are we measuring those in cycles without badlines?
2017-03-24 06:44
ChristopherJam

Registered: Aug 2004
Posts: 1409
Quoting tlr
I have these for subsizer 0.5:
…
No decruncher speeds as there is no standalone decruncher yet.

Curious question: are we measuring those in cycles without badlines?


Thanks! Yes, time in cycles with neither badlines nor interrupts, either by using CIA or breakpoints+cyclecounter in VICE.
2017-03-24 07:53
ChristopherJam

Registered: Aug 2004
Posts: 1409
Also, wow - those are tiny!

Even smaller than exomizer. I really need to time those two...
2017-03-24 10:57
Frantic

Registered: Mar 2003
Posts: 1648
Well, if you call yourself The Leader, you need to be second to none!
2017-03-27 07:21
ChristopherJam

Registered: Aug 2004
Posts: 1409
Scatter plot's up at http://www.codebase64.org/doku.php?id=base:compression_benchmar.. :)
2017-03-28 08:56
tlr

Registered: Sep 2003
Posts: 1791
I wanted to replicate some of the results as a baseline but can't really get the same values.

This is what I get for lzwvl-f:
        name       size          left    cycles  frm  spd     cons
        pfp1.bin:  4529 (18)     41.14%  234334 11.9 45.2k/s 51.7c/b
        pfp2.bin:  3532 (14)     71.02%  127774 6.5  37.4k/s 36.2c/b
        pfp3.bin:  2991 (12)     75.74%  95420  4.9  39.8k/s 31.9c/b
        pfp4.bin:  4242 (17)     60.46%  153715 7.8  43.9k/s 36.2c/b
        pfp5.bin:  25781 (102)   74.17%  730916 37.2 45.8k/s 28.4c/b
        pfp6.bin:  11283 (45)    35.70%  644479 32.8 47.2k/s 57.1c/b
        pfp7.bin:  12108 (48)    59.38%  466960 23.8 42.0k/s 38.6c/b
        pfp8.bin:  4179 (17)     73.15%  142155 7.2  38.7k/s 34.0c/b
        pfp9.bin:  6914 (28)     77.17%  226119 11.5 38.1k/s 32.7c/b
The cycle count is slightly off, presumably due to obscure timing model errors in unp64, which I'm using to emulate the depacker. I verified the timing in VICE for a few files and the error for those was about 1%.

Note especially the difference between pfp6.bin and pfp7.bin.

The last value is cycles per byte _consumed_ as in the tables. Is this really a useful measure?
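The derived columns in these tables can be recomputed from three numbers: the cycle count, the original size, and the packed size. The sketch below assumes PAL timing (985248 Hz, 312 lines x 63 cycles = 19656 cycles per frame) and k = 1024; those assumptions are consistent with the pfp1.bin row above, where 234334 / 19656 ≈ 11.9 frames and 234334 / 4529 ≈ 51.7 cycles per byte consumed. `DecrunchStats` and `decrunch_stats` are illustrative names.

```c
#include <assert.h>

/* Assumed PAL C64 timing: CPU clock and cycles per frame (312 x 63). */
#define PAL_CLOCK_HZ      985248.0
#define PAL_CYCLES_FRAME  (312.0 * 63.0)

typedef struct {
    double left_pct;    /* packed size as % of original         */
    double frames;      /* decrunch time in PAL frames          */
    double kib_per_s;   /* output bytes per second, k = 1024    */
    double cy_per_out;  /* cycles per byte produced             */
    double cy_per_in;   /* cycles per byte consumed ("cons")    */
} DecrunchStats;

static DecrunchStats decrunch_stats(double cycles, double orig_size,
                                    double packed_size)
{
    assert(cycles > 0 && orig_size > 0 && packed_size > 0);
    DecrunchStats s;
    s.left_pct   = 100.0 * packed_size / orig_size;
    s.frames     = cycles / PAL_CYCLES_FRAME;
    s.kib_per_s  = orig_size / (cycles / PAL_CLOCK_HZ) / 1024.0;
    s.cy_per_out = cycles / orig_size;
    s.cy_per_in  = cycles / packed_size;
    return s;
}
```

Cycles per byte produced and cycles per byte consumed differ only by the compression ratio, so a better-packing cruncher shows a higher "cons" figure for the same output speed.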
2017-04-06 17:56
tlr

Registered: Sep 2003
Posts: 1791
Teaser:
        name       size          left    cycles  frm  spd     cons
        pfp1.bin:  2961 (12)     26.90%  751339  38.2 14.1k/s 253.7c/b
        pfp2.bin:  2201 (9)      44.26%  432794  22.0 11.1k/s 196.6c/b
        pfp3.bin:  1786 (8)      45.23%  321693  16.4 11.8k/s 180.1c/b
        pfp4.bin:  3438 (14)     49.00%  638705  32.5 10.6k/s 185.8c/b
        pfp5.bin:  19631 (78)    56.48%  3642024 185.3 9.2k/s 185.5c/b
        pfp6.bin:  8407 (34)     26.60%  1864843 94.9 16.3k/s 221.8c/b
        pfp7.bin:  8768 (35)     43.00%  1725244 87.8 11.4k/s 196.8c/b
        pfp8.bin:  3086 (13)     54.02%  520012  26.5 10.6k/s 168.5c/b
        pfp9.bin:  5313 (21)     59.30%  973471  49.5 8.9k/s  183.2c/b

There are many optimizations left to be done. It will use a temporary buffer of slightly less than a page though. This can of course be overwritten between decrunches.
2017-04-07 22:20
ChristopherJam

Registered: Aug 2004
Posts: 1409
Ooh, nice. Already considerably faster than Exomizer, which is the only other cruncher in the same ballpark for ratio.

I was thinking cycles per byte consumed might be a useful metric for impedance matching with fastloaders, especially if you're streaming.
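A rough model of the impedance-matching idea, under loudly simplified assumptions (not measured, ignoring sector boundaries and buffering): the drive delivers one packed byte every `drive_cy` cycles in parallel with the C64, while the C64 spends `recv_cy` cycles receiving and `decr_cy` cycles decrunching per packed byte. The slower side sets the pace, which is why the decruncher's cycles per byte consumed is the figure to hold against the loader. All names and parameters are hypothetical.

```c
#include <assert.h>

/* Idealized streaming estimate in C64 cycles. Assumes the drive works in
 * parallel with the C64 and that buffering hides all other stalls. */
static double streaming_cycles(double packed_size,
                               double drive_cy,  /* drive: cycles per byte delivered       */
                               double recv_cy,   /* C64: cycles per byte to receive it     */
                               double decr_cy)   /* C64: decrunch cycles per byte consumed */
{
    assert(packed_size >= 0 && drive_cy > 0 && recv_cy >= 0 && decr_cy > 0);
    double c64_cy = recv_cy + decr_cy;           /* C64 work per packed byte */
    double pace = drive_cy > c64_cy ? drive_cy : c64_cy;
    return packed_size * pace;                   /* slower side sets the pace */
}
```

In this model, once `recv_cy + decr_cy` drops below `drive_cy`, a faster decruncher buys nothing: the loader is the bottleneck.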
2017-04-08 04:32
tlr

Registered: Sep 2003
Posts: 1791
Quoting ChristopherJam
I was thinking cycles per byte consumed might be a useful metric for impedance matching with fastloaders, especially if you're streaming.

Good point!

I'll be away for a week or more now, so I'll start with a fresh mind when I'm back and see what I can come up with.
2017-04-21 07:42
ChristopherJam

Registered: Aug 2004
Posts: 1409
I've updated the scatter plot and tables at codebase.


2017-04-21 07:53
j0x

Registered: Mar 2004
Posts: 215
Interesting! Have you considered adding benchmark data from the Card Cruncher or other tools based purely on entropy encoding?
2017-04-21 08:44
ChristopherJam

Registered: Aug 2004
Posts: 1409
j0x, submissions are welcome! I just need the compressed sizes and decompression cycle times for each of the nine files.
2017-04-21 18:24
tlr

Registered: Sep 2003
Posts: 1791
new release: Subsizer 0.6
subsizer 0.6, 2017-04-21
  - improved first pass cost model
  - cleaned up verbose output a bit
  - saved 10 bytes in the dirty sfx decruncher
  - added stand alone decruncher source

Sorry for releasing it right after you updated the tables. ;)
2017-04-22 15:52
tlr

Registered: Sep 2003
Posts: 1791
pfp-stats for 0.6:
                                        duration      outspd    inspd
     file       size  (blks)   left   cycles  frms  k/s   cy/b  cy/b
    ------------------------------------------------------------------
     pfp1.bin:  2961 (12)     26.90%  724217  36.8  14.6  65.8  244.6
     pfp2.bin:  2201 (9)      44.26%  414274  21.1  11.5  83.3  188.2
     pfp3.bin:  1786 (8)      45.23%  308675  15.7  12.3  78.2  172.8
     pfp4.bin:  3438 (14)     49.00%  613097  31.2  11.0  87.4  178.3
     pfp5.bin:  19631 (78)    56.48%  3497850 178.0 9.6   100.6 178.2
     pfp6.bin:  8407 (34)     26.60%  1803984 91.8  16.9  57.1  214.6
     pfp7.bin:  8768 (35)     43.00%  1661112 84.5  11.8  81.5  189.5
     pfp8.bin:  3086 (13)     54.02%  500954  25.5  11.0  87.7  162.3
     pfp9.bin:  5313 (21)     59.30%  934362  47.5  9.2   104.3 175.9
    ------------------------------------------------------------------