mflatt
2017-3-2 14:35:30

It looks like a lot of the extra allocation is in immutable-hash iteration – and that it’s not a problem, because allocating is faster than an implementation that doesn’t allocate (and it’s faster than a non-allocating Racket traversal, anyway)


samth
2017-3-2 14:50:32

creating immutable hashes seems to allocate vastly more, and be much slower, than in racket


samth
2017-3-2 14:50:57

samth
2017-3-2 14:51:19

that program is .8 sec in racket, 1.8 sec in chez, and allocates about 5x as much


samth
2017-3-2 14:52:21

allocating empty immutable hashes seems to be faster in chez, though


samth
2017-3-2 14:56:11

size 3 immutable hashes seem to take about 2x longer to create in chez


samth
2017-3-2 14:59:35

size 3 hashes seem to allocate about 2x more in chez as well, so that’s probably the source of the time


samth
2017-3-2 15:32:28

optimize-level 3 does not seem to change anything


mflatt
2017-3-2 15:32:47

Thanks! My tests had been using mostly integers as keys, and there’s not such a big difference in that case


samth
2017-3-2 15:33:04

ah, interesting


mflatt
2017-3-2 15:33:05

The issue may have more to do with hashing on gensyms


mflatt
2017-3-2 15:33:37

It think Chez will reify the gensym’s uuid to generate a hash code for it


samth
2017-3-2 15:35:13

yeah, size 3 tables with (random 100) as the key are about 2x faster in chez


mflatt
2017-3-2 15:35:15

I think the expander uses gensyms frequently as a cheap hashable allocation, and maybe it’s not so cheap…


samth
2017-3-2 15:38:36

integer keys are consistently faster for me in chez than in racket, for both large and small tables


samth
2017-3-2 15:38:41

(in that benchmark)


samth
2017-3-2 15:39:14

also, using (optimize-level 3) for core.so makes about a 20% difference in that benchmark


samth
2017-3-2 15:39:21

for large tables


mflatt
2017-3-2 15:40:12

I’ve been seeing 20% for almost anything with (optimize-level 3)


mflatt
2017-3-2 15:40:59

No, maybe that’s not right; I think it wasn’t such a big effect for the expander overall


samth
2017-3-2 15:42:23

for the small tables the difference was not so big


samth
2017-3-2 15:55:15

also, your hypothesis about uuid generation would explain why doing the work of hash-set twice didn’t change the expander’s performance


mflatt
2017-3-2 16:03:20

Yes, good point. I have meetings, now, but I’ll look into gensym hashing later this morning


samth
2017-3-2 16:09:53

hashing a gensym in chez is 6x slower than in racket, hashing a regular symbol is about 4x faster than in racket (although some of that may be the loop itself being faster)


mflatt
2017-3-2 17:50:01

Avoiding symbol-hash on gensyms didn’t change all that much for the expander, but it helped the expanded->linklet->Scheme pipeline some


soegaard2
2017-3-2 18:30:37

The reason why hashing a gensym in Chez is slower might be that hashing forces Chez to actually generate a name for the symbol. If I recall correctly Chez represents a gensym as a simple counter until the name of the symbol is used.


samth
2017-3-2 21:36:23

The new code is faster but still about 40% slower than Racket


samth
2017-3-2 23:07:55

For some reason, I can’t get the performance-region logging to work right under chez


samth
2017-3-2 23:08:18

but some manual logging of timings suggests that sort takes a particularly long time to expand, unsurprisingly


samth
2017-3-2 23:08:33

also reqprov is a suprising one


samth
2017-3-2 23:40:44

here are the most expensive modules, in ms:


samth
2017-3-2 23:40:59
108 real time: 108 gc time: 42 stxparam
139 real time: 139 gc time: 27 stxcase
144 real time: 144 gc time: 15 module-reader
214 real time: 214 gc time: 52 string
234 real time: 234 gc time: 39 letstx-scheme
359 real time: 359 gc time: 49 reqprov
409 real time: 409 gc time: 83 define
423 real time: 423 gc time: 65 define-struct
423 real time: 423 gc time: 92 kw
440 real time: 440 gc time: 24 sort
625 real time: 625 gc time: 56 case
689 real time: 689 gc time: 82 for
1282 real time: 1283 gc time: 190  more-scheme
4518 real time: 4518 gc time: 976 pre-base
5154 real time: 5155 gc time: 1088 private/base
5369 real time: 5369 gc time: 1112 base

samth
2017-3-2 23:54:03

@mflatt with your immutable changes, the expander no longer loads, because it fails when running a vector-copy! on an immutable vector


mflatt
2017-3-3 00:32:42

In retrospect, Gustavo warned me that Chez has a single empty vector, so vector-copy! can’t distinguish mutable and immutable empty vectors


samth
2017-3-3 00:33:54

that was my initial thought, but I haven’t confirmed that yet


mflatt
2017-3-3 00:35:42

I pushed a repair


samth
2017-3-3 00:36:04

oh, and I managed to get the performance stats


samth
2017-3-3 00:36:20

now trying to get them to print on racket


samth
2017-3-3 00:36:42
REGION                    MSECS        MEMK   COUNT
  eval                  2770.78   3,203,449       0
    module              1538.57   1,652,529      95
    transformer          945.39   1,353,317     446
    requires             216.30     167,396   58301
    instantiate           69.37      28,996    3415
    top                    1.13       1,209       5

  expand                2308.32   1,189,938       0
    transformer         1139.55     587,432    2207
      macro              500.38     276,874    6742
    form-in-module/2     886.57     478,570     651
      macro              322.77     181,776   10107
    require               76.42      41,141     504
    form-in-module/1      59.03      25,234    1457
      macro               44.28      22,718     945
    provide               50.70      21,744      82
      macro               23.12      12,421       9
    module-begin          40.69      14,359     166
      macro                2.78       1,628      25
    local-expand          36.47      16,077     208
      macro               14.58       8,078     290
    module                17.27       4,974      69
      scopes               2.05         371     164
    top                    1.59         405     298

  compile                287.00     263,949       0
    transformer          144.09     128,788     446
      linklet             57.24      80,301     446
    module               142.67     135,062      82
      linklet             93.47     108,472     169
    top                    0.22          97       5
      linklet              0.12          62       5

  read                    88.85      52,100     138

samth
2017-3-3 00:36:50

That’s for racket/base


mflatt
2017-3-3 00:37:00

To get performance region logging, you have to turn it on in “performance.rkt”, rebuild the linklet, change log-level? in “expander-compat.scm” to always return #t, and evaluate (flush)


mflatt
2017-3-3 00:37:08

… which I guess you figured out, but there for the record for now


samth
2017-3-3 00:37:11

right, it was the last step I missed


mflatt
2017-3-3 00:37:25

those numbers look like mine, so good


mflatt
2017-3-3 00:38:07

I’m correctly working on transferring Racket srclocs to Chez annotations in schemify


samth
2017-3-3 00:40:46

should just running expander/run.rkt with the change to performance.rkt print statistics?


samth
2017-3-3 00:40:53

because it doesn’t seem to for me


mflatt
2017-3-3 01:08:55

No, you have to enable debug@performance logging


samth
2017-3-3 01:10:09

ah, silly me


samth
2017-3-3 01:11:10

here’s what I get on my machine for racket run.rkt -e -l racket/base


samth
2017-3-3 01:11:14
performance: REGION                    MSECS      MEMK   COUNT
performance:   expand                2220.62   493,716       0
performance:     transformer         1026.65   219,126    2206
performance:       macro              510.65   100,063    6742
performance:     form-in-module/2     825.33   190,354     657
performance:       macro              322.18    77,980   10181
performance:     local-expand         107.33     9,125     219
performance:       macro               91.85     4,251     303
performance:     provide               86.87    25,521      84
performance:       macro               36.54    14,532      10
performance:     require               61.00    19,874     500
performance:     form-in-module/1      51.83    10,845    1478
performance:       macro               41.03     9,679     956
performance:     module-begin          42.88    11,820     170
performance:       macro                3.63     1,035      30
performance:     module                15.49     6,223      68
performance:       scopes               1.97       362     168
performance:     top                    3.20       824     270
performance:
performance:   compile                445.20   185,783       0
performance:     module               296.65   128,680      83
performance:       linklet            250.86   116,206     168
performance:     transformer          148.55    57,102     445
performance:       linklet            103.12    42,792     445
performance:
performance:   eval                   181.66    70,503       0
performance:     requires             106.49    50,727   57502
performance:     instantiate           67.25    18,442    3457
performance:     transformer            5.16       983     445
performance:     module                 2.74       350      96
performance:
performance:   read                   173.21    57,202      68

samth
2017-3-3 01:11:55

interesting that the compile step is much faster on chez


samth
2017-3-3 01:12:35

I assume that the eval step is dominated by Chez compile time


mflatt
2017-3-3 01:43:09

Yes. Chez compilation is currently in the “eval” phase, but I expect it to later move to the “compile” phase


samth
2017-3-3 03:16:48

Does the compile phase have extra work in Racket relative to Chez?