@mflatt turns out it was simpler than I expected, unless something went unexpectedly wrong
sadly a lot of the numbers changed, so the diff is big
next up: all the streams stuff
@samth The regexp benchmark is now working
;; Racket
;; real 3m13.797s
;; user 3m13.124s
;; sys  0m0.768s
;; Pycket
;; real 4m57.301s
;; user 4m56.440s
;; sys  0m0.632s
@sabauma well done, but also :disappointed:
The upshot is that we now support way more regexp features.
wait how does that work?
samth: Because our builtin support for regexps is rather spotty.
Though quite fast.
@samth your diff makes sense, and I’ll merge sometime soon
ah, you mean that if you run regexps in pycket via the implementation in Racket that @mflatt wrote, we support more features
@samth: Precisely
but that isn’t what happens if you just use regexp-match
in pycket
I’ve mostly been looking at the regexp and port expansions to make sure they don’t use keyword support; most of the work I did recently was to make the keyword support go away when it isn’t referenced
also, presumably the reason it’s slow is the usual interpreter problem
I was hoping that there wouldn’t be an interpreter problem because the regexp implementation uses a closure compiler; does that not avoid a dispatch that causes problems?
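For readers unfamiliar with the technique being discussed: a closure compiler turns each regexp AST node into a closure once, up front, so matching runs through direct calls rather than re-dispatching on node tags at every step. A minimal sketch in Python (illustrative only, not Pycket's or Racket's actual implementation):

```python
# Illustrative sketch only (not Pycket's or Racket's actual code): compile
# each regexp AST node into a closure once, so matching proceeds by direct
# calls instead of branching on node tags at every step.
# Matchers take (string, position, continuation) and return the final
# match position, or None on failure.

def lit(c):
    # match a single literal character
    def m(s, i, k):
        return k(i + 1) if i < len(s) and s[i] == c else None
    return m

def seq(a, b):
    # match a, then b on the rest
    return lambda s, i, k: a(s, i, lambda j: b(s, j, k))

def alt(a, b):
    # try a; on failure, backtrack and try b
    def m(s, i, k):
        r = a(s, i, k)
        return r if r is not None else b(s, i, k)
    return m

def star(a):
    # greedy Kleene star; the j > i guard avoids looping on empty matches
    def m(s, i, k):
        r = a(s, i, lambda j: m(s, j, k) if j > i else None)
        return r if r is not None else k(i)
    return m

# "ab*" compiled once, then run without touching the AST again
pattern = seq(lit('a'), star(lit('b')))
print(pattern("abbb", 0, lambda i: i))  # 4
print(pattern("xbbb", 0, lambda i: i))  # None
```

For a tracing JIT, the hope is that each compiled closure becomes a stable call target, so traces through a given pattern stay monomorphic; the branching that remains is on the input string.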
samth: Here are the JIT stats:
;; Tracing: 646 0.960233
;; Backend: 646 0.229191
;; TOTAL: 300.922709
;; ops: 2911425
;; recorded ops: 403920
;; calls: 1415
;; guards: 89566
;; opt ops: 134253
;; opt guards: 32524
;; opt guards shared: 24801
;; forcings: 0
;; abort: trace too long: 0
;; abort: compiling: 0
;; abort: vable escape: 0
;; abort: bad loop: 0
;; abort: force quasi-immut: 0
;; nvirtuals: 2684
;; nvholes: 326
;; nvreused: 372
;; vecopt tried: 0
;; vecopt success: 0
;; Total # of loops: 130
;; Total # of bridges: 516
;; Freed # of loops: 18
;; Freed # of bridges: 45
We produce quite a lot of code.
@sabauma that doesn’t look so bad
is 130 loops a lot?
I am surprised at how little time is spent tracing, actually.
Depends on the size of the code.
what if you run it a second time to reduce warmup?
It is pretty branchy.
@mflatt I’m not sure where the overheads are as of yet, so it’s hard to say.
only 27MB of JIT logs to comb through.
The number of loops would lead me to suspect it is creating 1+ loops per regexp, which is what we would like.
might be worth starting with a smaller version of the benchmark
Looks like we’re hitting some bad points in our implementation of mutable hash tables.
FWIW, I was thinking of branches on the regexp AST, which closure compilation avoids. But there’s also branching on the input, which might amount to the same issue for tracing.
Or, in the best case, the tracing JIT might figure out that the benchmark applies the same regexp to the same input many times, in which case it’s not a realistic benchmark
@mflatt: The regexp implementation in the Pycket runtime is just a jitted interpreter for regular expressions, and that has pretty good performance. In theory, at least, exposing the right information to the JIT should allow us to recover that performance.
(Un)fortunately, I doubt the JIT is smart enough to recognize and eliminate repeated runs of the same regexp on the same input, since the JIT has little context to work with aside from the current trace.
@mflatt about 7% (by lines) of the compiled bootstrap linklet is the sort implementation
does it seem worth it to try to shrink and/or remove that?
I have wondered whether it would be better to select a sort variant statically, instead of the current dynamic approach, so that referencing sort doesn’t mean referencing all the variants. Or have an explicit generic-sort that is used by the expander and doesn’t refer to the specialized variants.
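A rough Python sketch of the two designs being weighed (hypothetical names, not Racket's actual sort code): in the dynamic design, `sort` picks a specialized variant at run time, so any reference to `sort` keeps every variant live; a standalone `generic_sort` references none of them, which is what makes it safe for the expander.

```python
import operator

# Hypothetical sketch, not Racket's actual implementation: a sort that
# dynamically dispatches to specialized variants, contrasted with a
# standalone generic sort that references none of them.

def _sort_fixnums(xs):
    # stand-in for a variant specialized to machine integers
    return sorted(xs)

def _sort_strings(xs):
    # stand-in for a variant specialized to strings
    return sorted(xs)

def generic_sort(xs, less_than):
    # plain merge sort; pulls in no specialized variant, so a client that
    # only calls this keeps the variants dead-code-eliminable
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    a = generic_sort(xs[:mid], less_than)
    b = generic_sort(xs[mid:], less_than)
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if less_than(b[j], a[i]):
            out.append(b[j]); j += 1
        else:
            out.append(a[i]); i += 1
    return out + a[i:] + b[j:]

def sort(xs, less_than):
    # dynamic design: selecting a variant at run time means any reference
    # to `sort` references every specialized variant as well
    if less_than is operator.lt:
        if all(isinstance(x, int) for x in xs):
            return _sort_fixnums(xs)
        if all(isinstance(x, str) for x in xs):
            return _sort_strings(xs)
    return generic_sort(xs, less_than)
```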
I was thinking of the latter approach
the main difficulty is that the definition of sort has side effects, and so won’t be removed automatically
actually, that seems no longer true
so a generic-sort would work
I’ll try that
@mflatt that was easier than I expected
My changes take the "Flattened code is ..." figure from about 614k to 564k
Great - thanks!
I could start writing a partial evaluator to remove some of the remaining struct definitions, but I’ll save that for another time