
Does anybody here have any thoughts on creating a racket lang that compiles straight into a gcc plugin/assembler/llvm ir for performance? This would be a subset of racket optimized for performance. How far could one go until hitting hard translation problems?

@pocmatos I’ve considered it. The GC is definitely getting in my way atm :neutral_face:, even in incremental mode. Maybe compile to C++ as a backend or something for a first iteration. Unfortunately build times can be pretty high since racket has a long startup time.

If we can get incremental builds going using ninja or something it could be nice, especially if we allow all forms of metaprogramming (hello enums with automatic conversion to strings)

@macocio don’t care so much about build times. runtime is king in my world.

I had a popcount implemented in Racket which statistical profiling showed to be taking 12% of my runtime. x86_64 and other archs can do it in a single instruction. One FFI call later and I lowered that to less than 0.2%. Quite surprised that there’s not yet a function doing this in Racket.
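For reference, a minimal sketch of what the C side of such an FFI call could look like. The function names `fx_popcount` and `fx_popcount_portable` are hypothetical, not the code actually used in this thread. It assumes GCC or Clang, where `__builtin_popcountll` usually becomes a single POPCNT instruction when built with `-mpopcnt` or `-march=native`; the second function is a standard portable bit-counting fallback for comparison.

```c
#include <stdint.h>

/* Hypothetical name. Relies on the GCC/Clang builtin; with -mpopcnt or
   -march=native on x86_64 this typically compiles to one POPCNT instruction. */
int fx_popcount(uint64_t x) {
    return __builtin_popcountll(x);
}

/* Hypothetical portable fallback using the classic parallel (SWAR) bit count. */
int fx_popcount_portable(uint64_t x) {
    /* Count bits in each 2-bit pair. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* Sum adjacent pairs into 4-bit fields. */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* Sum adjacent 4-bit fields into bytes. */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    /* Add all bytes together; the total ends up in the top byte. */
    return (int)((x * 0x0101010101010101ULL) >> 56);
}
```

Built as a shared library, a function like this could be bound from Racket through `ffi/unsafe`; the per-call FFI overhead is then part of what gets measured.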

@pocmatos Apropos popcount: https://docs.racket-lang.org/gmp/index.html#%28def._%28%28lib._gmp%2Fmain..rkt%29._mpz_popcount%29%29

@soegaard2 true, I noticed that, but I have values satisfying fixnum? and that function expects an mpz?, and I don’t want to constantly allocate an mpz for the calculation. I was expecting something like fxpopcount for example. I will create a simple package for these highly optimized functions.

I never distributed a package online with C code. When I do so, is there a way to get the C code compiled on the target machine so I can use -march=native?

i.e. compile the C code when the user does raco pkg install ...?

I’m pretty late to the game here, but it seems all the other benchmarks run multiple parallel processes per server in order to handle the requests, whereas your example only uses one process (thereby one os thread) to handle the load.

Oh wait, nvm, I see there are two sets of benchmarks: one multiprocess and the other single process, but the Racket benchmark seems to be grouped with the multiprocess ones.

@pocmatos How does it compare to https://github.com/racket/racket/blob/master/racket/collects/data/private/count-bits-in-fixnum.rkt

@soegaard2 woot? How did you find that gem and how come it’s not provided by racket? Curious to do some benchmarking… will come back to you on this.

It’s used in data/bit-vector which I wrote. I can’t remember whether I am responsible for the pop count code though. There was a discussion on the mailing list at some point.


Note that the original implementation of bit-vectors used fxvector and the current one uses bytes. I remember there was a change at some point to use 32-bit fixnums on 64-bit machines as well. So if the current fxpopcount only handles 32-bit fixnums, look at an older version.

Thanks.

So the C version is about 2% faster, which is quite surprising given that in C it’s a single cpu instruction. But, of course, the call to C might also introduce some cruft.

! That’s better than I thought.

@pocmatos I’d be surprised if nobody looked at an LLVM backend for racket tbh

but you all would know better than I