It looks like a lot of the extra allocation is in immutable-hash iteration, and that it’s not a problem: the allocating version is faster than an implementation that doesn’t allocate (and it’s faster than the non-allocating Racket traversal, anyway)
creating immutable hashes seems to allocate vastly more, and be much slower, than in racket
that program takes 0.8 sec in racket, 1.8 sec in chez, and allocates about 5x as much
allocating empty immutable hashes seems to be faster in chez, though
size 3 immutable hashes seem to take about 2x longer to create in chez
size 3 hashes seem to allocate about 2x more in chez as well, so that’s probably the source of the time
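The benchmark itself isn’t shown above, so here’s a minimal Racket sketch of the shape being measured (the key choice and iteration count are my own guesses, not the actual benchmark):

```racket
#lang racket/base
;; Build many size-3 immutable hashes and time the loop.
(define (build-hashes n)
  (for/fold ([acc #f]) ([i (in-range n)])
    (hash 'a i 'b (+ i 1) 'c (+ i 2))))
(time (void (build-hashes 1000000)))
```

Running the same program under Racket and under Racket-on-Chez is the kind of comparison that would show the 2x time and allocation gap.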
optimize-level 3
does not seem to change anything
Thanks! My tests had been using mostly integers as keys, and there’s not such a big difference in that case
ah, interesting
The issue may have more to do with hashing on gensyms
I think Chez will reify the gensym’s uuid to generate a hash code for it
yeah, size 3 tables with (random 100)
as the key are about 2x faster in chez
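The integer-key variant would look something like this sketch (again with made-up counts; duplicate random keys simply collapse into fewer entries):

```racket
#lang racket/base
;; Same shape of benchmark, but with small random integers as keys.
(define (build-int-hashes n)
  (for/fold ([acc #f]) ([i (in-range n)])
    (hash (random 100) i (random 100) (+ i 1) (random 100) (+ i 2))))
(time (void (build-int-hashes 1000000)))
```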
I think the expander uses gensyms frequently as a cheap hashable allocation, and maybe it’s not so cheap…
integer keys are consistently faster for me in chez than in racket, for both large and small tables
(in that benchmark)
also, using (optimize-level 3)
for core.so
makes about a 20% difference in that benchmark
for large tables
I’ve been seeing 20% for almost anything with (optimize-level 3)
No, maybe that’s not right; I think it wasn’t such a big effect for the expander overall
for the small tables the difference was not so big
also, your hypothesis about uuid generation would explain why doing the work of hash-set
twice didn’t change the expander’s performance
Yes, good point. I have meetings, now, but I’ll look into gensym hashing later this morning
hashing a gensym in chez is 6x slower than in racket, hashing a regular symbol is about 4x faster than in racket (although some of that may be the loop itself being faster)
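A sketch of that kind of microbenchmark (the loop-overhead caveat above applies; `equal-hash-code` is the Racket-level entry point, which presumably bottoms out in the host’s symbol hashing):

```racket
#lang racket/base
;; Compare hashing a gensym vs. an interned symbol in a tight loop.
(define gs (gensym))
(define s 'plain-symbol)
(time (for ([i (in-range 10000000)]) (equal-hash-code gs)))
(time (for ([i (in-range 10000000)]) (equal-hash-code s)))
```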
Avoiding symbol-hash
on gensyms didn’t change all that much for the expander, but it helped the expanded->linklet->Scheme pipeline some
The reason why hashing a gensym in Chez is slower might be that hashing forces Chez to actually generate a name for the symbol. If I recall correctly, Chez represents a gensym as a simple counter until the name of the symbol is used.
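In Chez terms, that laziness can be seen directly (a sketch; `gensym->unique-string` is the Chez primitive that forces the unique name to exist):

```scheme
;; Chez Scheme: a gensym's unique name is generated lazily.
(define g (gensym))        ; cheap: no unique-name string yet
(gensym->unique-string g)  ; forces generation of the unique name
(symbol-hash g)            ; hashing may trigger the same reification
```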
The new code is faster but still about 40% slower than Racket
For some reason, I can’t get the performance-region logging to work right under chez
but some manual logging of timings suggests that sort
takes a particularly long time to expand, unsurprisingly
also reqprov
is a surprising one
here are the most expensive modules, in ms:
cpu time: 108  real time: 108  gc time: 42    stxparam
cpu time: 139  real time: 139  gc time: 27    stxcase
cpu time: 144  real time: 144  gc time: 15    module-reader
cpu time: 214  real time: 214  gc time: 52    string
cpu time: 234  real time: 234  gc time: 39    letstx-scheme
cpu time: 359  real time: 359  gc time: 49    reqprov
cpu time: 409  real time: 409  gc time: 83    define
cpu time: 423  real time: 423  gc time: 65    define-struct
cpu time: 423  real time: 423  gc time: 92    kw
cpu time: 440  real time: 440  gc time: 24    sort
cpu time: 625  real time: 625  gc time: 56    case
cpu time: 689  real time: 689  gc time: 82    for
cpu time: 1282 real time: 1283 gc time: 190   more-scheme
cpu time: 4518 real time: 4518 gc time: 976   pre-base
cpu time: 5154 real time: 5155 gc time: 1088  private/base
cpu time: 5369 real time: 5369 gc time: 1112  base
@mflatt with your immutable changes, the expander no longer loads, because it fails when running a vector-copy!
on an immutable vector
In retrospect, Gustavo warned me that Chez has a single empty vector, so vector-copy!
can’t distinguish mutable and immutable empty vectors
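A small Racket-level illustration of the constraint (a sketch, not the actual failing expander code): a zero-length copy has to be a no-op regardless of mutability, so if the runtime shares one empty-vector object between mutable and immutable uses, a strict immutability check in `vector-copy!` rejects copies that should be fine.

```racket
#lang racket/base
(require racket/vector)
;; Copy zero elements into a fresh (mutable) empty vector:
(vector-copy! (vector) 0 (vector))  ; copies nothing; should succeed
;; Literal vectors are immutable, so a single shared empty vector
;; would have to serve both roles:
(immutable? #())  ; => #t
```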
that was my initial thought, but I haven’t confirmed that yet
I pushed a repair
oh, and I managed to get the performance stats
now trying to get them to print on racket
REGION                MSECS      MEMK  COUNT
eval                2770.78 3,203,449      0
  module            1538.57 1,652,529     95
  transformer        945.39 1,353,317    446
  requires           216.30   167,396  58301
  instantiate         69.37    28,996   3415
  top                  1.13     1,209      5
expand              2308.32 1,189,938      0
  transformer       1139.55   587,432   2207
    macro            500.38   276,874   6742
  form-in-module/2   886.57   478,570    651
    macro            322.77   181,776  10107
  require             76.42    41,141    504
  form-in-module/1    59.03    25,234   1457
    macro             44.28    22,718    945
  provide             50.70    21,744     82
    macro             23.12    12,421      9
  module-begin        40.69    14,359    166
    macro              2.78     1,628     25
  local-expand        36.47    16,077    208
    macro             14.58     8,078    290
  module              17.27     4,974     69
  scopes               2.05       371    164
  top                  1.59       405    298
compile              287.00   263,949      0
  transformer        144.09   128,788    446
    linklet           57.24    80,301    446
  module             142.67   135,062     82
    linklet           93.47   108,472    169
  top                  0.22        97      5
    linklet            0.12        62      5
read                  88.85    52,100    138
That’s for racket/base
To get performance region logging, you have to turn it on in “performance.rkt”, rebuild the linklet, change log-level?
in “expander-compat.scm” to always return #t, and evaluate (flush)
… which I guess you figured out, but it’s here for the record
right, it was the last step I missed
those numbers look like mine, so good
I’m currently working on transferring Racket srclocs to Chez annotations in schemify
should just running expander/run.rkt
with the change to performance.rkt
print statistics?
because it doesn’t seem to for me
No, you have to enable debug@performance
logging
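Assuming the region stats go through Racket’s logger, the standard way to enable that from the command line is the PLTSTDERR environment variable with a level@topic filter:

```
PLTSTDERR="debug@performance" racket run.rkt -e -l racket/base
```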
ah, silly me
here’s what I get on my machine for racket run.rkt -e -l racket/base
performance: REGION                MSECS    MEMK  COUNT
performance: expand              2220.62 493,716      0
performance:   transformer       1026.65 219,126   2206
performance:     macro            510.65 100,063   6742
performance:   form-in-module/2   825.33 190,354    657
performance:     macro            322.18  77,980  10181
performance:   local-expand       107.33   9,125    219
performance:     macro             91.85   4,251    303
performance:   provide             86.87  25,521     84
performance:     macro             36.54  14,532     10
performance:   require             61.00  19,874    500
performance:   form-in-module/1    51.83  10,845   1478
performance:     macro             41.03   9,679    956
performance:   module-begin        42.88  11,820    170
performance:     macro              3.63   1,035     30
performance:   module              15.49   6,223     68
performance:   scopes               1.97     362    168
performance:   top                  3.20     824    270
performance:
performance: compile              445.20 185,783      0
performance:   module             296.65 128,680     83
performance:     linklet          250.86 116,206    168
performance:   transformer        148.55  57,102    445
performance:     linklet          103.12  42,792    445
performance:
performance: eval                 181.66  70,503      0
performance:   requires           106.49  50,727  57502
performance:   instantiate         67.25  18,442   3457
performance:   transformer          5.16     983    445
performance:   module               2.74     350     96
performance:
performance: read                 173.21  57,202     68
interesting that the compile
step is much faster on chez
I assume that the eval
step is dominated by Chez compile time
Yes. Chez compilation is currently in the “eval” phase, but I expect it to later move to the “compile” phase
Does the compile phase have extra work in Racket relative to Chez?