ryanc
2022-4-19 09:37:16

All of the other options in related functions use strings instead of characters. I’d rather write " " than #\space.


laurent.orseau
2022-4-19 09:37:47

That’s a good enough reason, thanks.


hj93
2022-4-19 13:47:02

<https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/racket.html|racket is slower than Common Lisp (SBCL)?>


laurent.orseau
2022-4-19 13:54:20

<https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/racket.html|Common Lisp (SBCL) is slower than Racket?>


massung
2022-4-19 13:56:22

lol. same link, two different “headlines” — technically the same meaning, but each leads the reader in a different way. :smile:


massung
2022-4-19 13:56:47

Just commenting that SBCL being significantly faster than Racket shouldn’t be at all surprising.


spdegabrielle
2022-4-19 16:19:10

Why?


massung
2022-4-19 16:24:15

Natively compiled code vs. bytecode interpretation, plus optimizations, esp. w.r.t. mutability when building lists.


samth
2022-4-19 16:26:46

@massung those results are for Racket CS, which doesn’t have any bytecode or interpretation overhead


spdegabrielle
2022-4-19 16:29:01

I thought racket cs was compiled?


spdegabrielle
2022-4-19 16:29:43

“incremental native-code compiler”


massung
2022-4-19 16:33:01

So, here’s a quick compare of just list building + gc…

;; racket
(define (iota n) (for/list ([i n]) i))
(time (for ([i 10000]) (iota 1000)))

;; sbcl
(defun iota (n) (loop for i below n collect i))
(time (dotimes (i 10000) (iota 1000)))

Racket runs in 223 ms, of which 118 ms is GC. SBCL runs in 63 ms, of which 44 ms is GC.


massung
2022-4-19 16:33:44

assuming racket is compiled (I don’t know that it is, but I can (disassemble iota) in SBCL), it’s just more efficient


sorawee
2022-4-19 16:34:03

Does SBCL have “contracts” or checks to guard against bad inputs?


sorawee
2022-4-19 16:34:23

I wonder how much faster Racket will be if it completely disables all contracts / checks.
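A rough way to try that experiment is Racket’s `racket/unsafe/ops`, whose fixnum operations skip the runtime type and overflow checks. A minimal sketch (the `iota-unsafe` name is mine, not from the thread):

```racket
#lang racket
(require racket/unsafe/ops)

;; Hypothetical unchecked variant of the benchmark's iota:
;; unsafe-fx= and unsafe-fx+ omit the type/overflow checks
;; that the safe = and + perform.
(define (iota-unsafe n)
  (let loop ([i 0] [acc '()])
    (if (unsafe-fx= i n)
        (reverse acc)
        (loop (unsafe-fx+ i 1) (cons i acc)))))
```

How much this buys you depends on how well the compiler already specializes the safe version.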


samth
2022-4-19 16:34:44

I don’t disagree that SBCL usually generates faster code than Racket for a variety of reasons, but it’s definitely compiled in roughly the same way that SBCL is.


massung
2022-4-19 16:34:51

Not sure what you mean by that? Obviously SBCL raises an exception on (+ 5 "hello")


massung
2022-4-19 16:35:19

And SBCL checks types at compile-time to throw warnings


samth
2022-4-19 16:35:39

Also, if you use in-range in that for/list it’s about twice as fast.


samth
2022-4-19 16:36:01

not twice actually


samth
2022-4-19 16:36:19

but that measurement is very variable because you might or might not get a GC pause


massung
2022-4-19 16:37:18

I want to build the list intentionally. obviously not building the list is faster


samth
2022-4-19 16:37:24

SBCL also might build that list using mutation, which isn’t possible in Racket (which is one reason that it might be faster in less small benchmarks)


massung
2022-4-19 16:37:43

> SBCL also might build that list using mutation Absolutely uses mutation. Why wouldn’t you?? :slightly_smiling_face:


sorawee
2022-4-19 16:37:55

Sam meant:
(define (iota n) (for/list ([i (in-range n)]) i))


samth
2022-4-19 16:38:42

No, I mean that when measuring this sort of thing, you want to avoid variance based on GC timing. So you usually want to run GC beforehand, and allocate an amount that either consistently requires a collection or consistently doesn’t.
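That discipline might look like this in code (a sketch, reusing the `iota` from the earlier benchmark):

```racket
#lang racket
(define (iota n) (for/list ([i (in-range n)]) i))

;; Force a major collection first so the timed region starts
;; from a known heap state, reducing GC-induced variance.
(collect-garbage)
(time (for ([i (in-range 10000)])
        (iota 1000)))
```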


sorawee
2022-4-19 16:39:09

In either case, there’s no list being built for iterating over n (there’s of course a list being built for collecting the values)


samth
2022-4-19 16:39:36

In Racket, because of first-class continuations you can’t build a list with mutation in the same way.


spdegabrielle
2022-4-19 16:41:02

I went looking for a disassemble function :cry:


spdegabrielle
2022-4-19 16:41:47

I can always use an external disassembler


samth
2022-4-19 16:42:29

raco pkg install disassemble
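A minimal session with that package might look like this (a sketch, assuming a Racket CS install where the package’s `disassemble` function accepts a procedure):

```racket
#lang racket
(require disassemble)  ; from `raco pkg install disassemble`

(define (add a b) (+ a b))
(disassemble add)  ; prints the native code for `add`
```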


massung
2022-4-19 16:43:40

I feel like this has devolved into an “us” vs. “them” discussion, which always sucks.

The original comment was just that I fully expected SBCL to be faster than Racket. I also fully expect C/Rust to be faster than SBCL. There are obvious reasons for that: mutability in solutions, better optimizations, likely a better GC (guessing) and man-hours put into the implementation. Common Lisp is also a very fixed spec. The code that exists using it is well known and knowing what to optimize is often more important than what the optimization ends up being.


spdegabrielle
2022-4-19 16:44:03

I’m not bothered about speed. Any performance problems are far more likely due to my code. A compiler can’t fix all my sins.


samth
2022-4-19 16:44:55

I also expect SBCL to be faster than Racket, but I think the reasons are often less obvious, and a few of the ones you suggested originally were not correct.


massung
2022-4-19 16:48:18

Fair enough.


spdegabrielle
2022-4-19 16:48:24

Thanks guys. Very informative. I feel like I’ve only touched the surface.


laurent.orseau
2022-4-19 16:49:53

As long as it’s reasoned, an “us vs them” discussion can still be quite informative and may even lead to improvements :slightly_smiling_face:


spdegabrielle
2022-4-19 16:51:45

Of course now I want to know why c/rust is faster than sbcl.


samth
2022-4-19 16:52:24

@spdegabrielle (a) no GC (b) more precise type information (c) fewer safety checks (d) less need for pointers


mflatt
2022-4-19 16:55:09

It’s starting to look like part of the answer is “the build machine’s disk has gone bad,” and it just happens to have started failing the same week as the build-system change.


samth
2022-4-19 16:55:51

@massung I’d be interested in a comparison between the SBCL and Racket disassembly of that function (I don’t have SBCL handy)


massung
2022-4-19 17:11:09

Without doing a major deep-dive…

The racket disasm is performing a lot of memory accesses. This is on an M1 (ARM), so it’s a shallower pipeline than the x86, but given many sequential instructions there are possible LHS stalls. It looks like register usage is very low, with the stack being used quite a bit instead.

The SBCL disasm basically recognizes that the loop counter is a fixnum and keeps the loop value in a register. The list building all appears to be inlined as well. It may be inlined in the Racket code too, but that’s difficult to discern at a quick glance given the size of the racket disasm output.


massung
2022-4-19 17:13:07

Again, that’s just a 5-minute glance and probably not worth taking too seriously. :wink:


massung
2022-4-19 17:18:18

That disassemble package for racket is nice. :slightly_smiling_face:


samth
2022-4-19 17:19:35

Thanks :slightly_smiling_face:


samth
2022-4-19 17:20:01

The other thing that’s notable is that the Racket code builds the list in reverse order and then calls reverse.


samth
2022-4-19 17:20:11

So if the SBCL code avoids that it’s a significant speedup
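The two list-building strategies under discussion can be sketched as follows (function names are mine; the tail-mutation version needs Racket’s mutable pairs, since ordinary pairs are immutable):

```racket
#lang racket

;; What for/list effectively does: accumulate in reverse,
;; then reverse once at the end.
(define (iota-rev n)
  (let loop ([i 0] [acc '()])
    (if (= i n)
        (reverse acc)
        (loop (add1 i) (cons i acc)))))

;; The SBCL-style strategy: keep a pointer to the last cell
;; and destructively extend it, so no final reverse is needed.
(define (iota-mut n)
  (define head (mcons #f '()))
  (let loop ([tail head] [i 0])
    (if (= i n)
        (mcdr head)
        (let ([cell (mcons i '())])
          (set-mcdr! tail cell)
          (loop cell (add1 i))))))
```

Note that `iota-mut` returns a chain of mutable pairs rather than an ordinary list, which is one reason Racket code doesn’t use this pattern.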


massung
2022-4-19 17:23:02

It’s hard to know if it’s very accurate, though. For example:

(define (add a b) (+ a b))
(disassemble add)

This produces some pretty nutty assembly that’s difficult to dissect. There’s no obvious ret instruction, so it’s possible it’s outputting a block of code in excess of the actual function body. There’s some code that appears to be type checking (looks like pointer tagging?), but it’s quite a bit for just a simple + operation. More than I’d expect.


samth
2022-4-19 17:25:26

There’s no ret because it’s a tail call.


samth
2022-4-19 17:25:39

Here’s what I get:

0: 4883fd02 (cmp rbp #x2)
4: 7569 (jnz (+ rip #x69)) ; => 6f
6: 4889f9 (mov rcx rdi)
9: 4c09c1 (or rcx r8)
c: f6c107 (test cl #x7)
f: 750c (jnz (+ rip #xc)) ; => 1d
11: 4c89c5 (mov rbp r8)
14: 4801fd (add rbp rdi)
17: 7004 (jo (+ rip #x4)) ; => 1d
19: 41ff6500 (jmp (mem64+ r13 #x0))
1d: 49836e6801 (sub (mem64+ r14 #x68) #x1) ; <=
22: 753f (jnz (+ rip #x3f)) ; => 63
24: 4d894508 (mov (mem64+ r13 #x8) r8)
28: 49897d10 (mov (mem64+ r13 #x10) rdi)
2c: 4983c518 (add r13 #x18)
30: 488d0d20000000 (lea rcx (mem+ rip #x20)) ; => 57
37: 49894d00 (mov (mem64+ r13 #x0) rcx)
3b: e9c0fbbdf7 (jmp (+ rip #x-8420440)) ; #<code event>
40: 0f1f800000000088 (data)
48: 000000000000008d (data)
50: 01000000000000 (data)
57: 4983ed18 (sub r13 #x18) ; <=
5b: 4d8b4508 (mov r8 (mem64+ r13 #x8))
5f: 498b7d10 (mov rdi (mem64+ r13 #x10))
63: e98853bdf7 (jmp (+ rip #x-842ac78)) ; #<code +> ; <=
68: 0f1f8000000000 (data)
6f: e97cbfbbf7 (jmp (+ rip #x-8444084)) ; #<code doargerr> ; <=
74: 0f1f8000000000 (data)


massung
2022-4-19 17:28:19

Here’s the ARM output for comparison:

0: ff0a00f1 (subs xzr x23 #x2)
4: 81100054 (b.ne (+ pc #x210)) ; => 214
8: ef0300aa (orr x15 xzr x0)
c: ee0301aa (orr x14 xzr x1)
10: 1e369cd2 (movz x30 #xe1b0) ; #<code winder-dummy>
14: bef8a0f2 (movk x30 (lsl #x7c5 #x10))
18: 3e00c0f2 (movk x30 (lsl #x1 #x20))
1c: 1e00e0f2 (movk x30 (lsl #x0 #x30))
20: 8d0240f9 (ldr x13 (mem+ x20 (lsl #x0 #x3)))
24: bf011eeb (subs xzr x13 x30)
28: 810e0054 (b.ne (+ pc #x1d0)) ; => 1f8
2c: 7ece40f9 (ldr x30 (mem+ x19 (lsl #x33 #x3)))
30: 6dda40f9 (ldr x13 (mem+ x19 (lsl #x36 #x3)))
34: ccb343f8 (ldur x12 (mem+ x30 #x3b))
38: 9f010deb (subs xzr x12 x13)
3c: 210c0054 (b.ne (+ pc #x184)) ; => 1c0
40: c00080d2 (movz x0 #x6)
44: 1f1800f1 (subs xzr x0 #x6) ; <=
48: 81080054 (b.ne (+ pc #x110)) ; => 158
4c: 4db340f8 (ldur x13 (mem+ x26 #xb))
50: b81e00d1 (sub x24 x21 #x7)
54: b5420091 (add x21 x21 #x10)
58: 7e5240f9 (ldr x30 (mem+ x19 (lsl #x14 #x3)))
5c: df0315eb (subs xzr x30 x21)
60: 03070054 (b.cc (+ pc #xe0)) ; => 140
64: f70318aa (orr x23 xzr x24) ; <=
68: ed7200f8 (stur x13 (mem+ x23 #x7))
6c: fe618ed2 (movz x30 #x730f) ; '#(3-unsaved-editor 3 18 33 7)
70: 9e40abf2 (movk x30 (lsl #x5a04 #x10))
74: 3e00c0f2 (movk x30 (lsl #x1 #x20))
78: 1e00e0f2 (movk x30 (lsl #x0 #x30))
7c: fef200f8 (stur x30 (mem+ x23 #xf))
80: 6dda40f9 (ldr x13 (mem+ x19 (lsl #x36 #x3))) ; <=
84: b81e00d1 (sub x24 x21 #x7)
88: b5420091 (add x21 x21 #x10)
8c: 7e5240f9 (ldr x30 (mem+ x19 (lsl #x14 #x3)))
90: df0315eb (subs xzr x30 x21)
94: a3040054 (b.cc (+ pc #x94)) ; => 128
98: 177300f8 (stur x23 (mem+ x24 #x7)) ; <=
9c: 0df300f8 (stur x13 (mem+ x24 #xf))
a0: 78da00f9 (str x24 (mem+ x19 (lsl #x36 #x3)))
a4: fe010eaa (orr x30 x15 x14)
a8: df0b40f2 (ands xzr x30 #x7)
ac: a1000054 (b.ne (+ pc #x14)) ; => c0
b0: f7010eab (adds x23 x15 x14)
b4: 66000054 (b.vs (+ pc #xc)) ; => c0
b8: 8a0240f8 (ldur x10 (mem+ x20))
bc: 40011fd6 (br x10)
c0: d60600f1 (subs x22 x22 #x1) ; <=
c4: 41020054 (b.ne (+ pc #x48)) ; => 10c
c8: 8f0600f9 (str x15 (mem+ x20 (lsl #x1 #x3)))
cc: 8e0a00f9 (str x14 (mem+ x20 (lsl #x2 #x3)))
d0: 94620091 (add x20 x20 #x18)
d4: 7e010010 (adr x30 (+ pc #x2c)) ; => 100
d8: 9e0200f9 (str x30 (mem+ x20 (lsl #x0 #x3)))
dc: 0a768cd2 (movz x10 #x63b0) ; #<code event>
e0: 0af9a0f2 (movk x10 (lsl #x7c8 #x10))
e4: 2a00c0f2 (movk x10 (lsl #x1 #x20))
e8: 0a00e0f2 (movk x10 (lsl #x0 #x30))
ec: 40011fd6 (br x10)
f0: 3101000000000000 (data)
f8: 8d01000000000000 (data)
100: 946200d1 (sub x20 x20 #x18) ; <=
104: 8f0640f9 (ldr x15 (mem+ x20 (lsl #x1 #x3)))
108: 8e0a40f9 (ldr x14 (mem+ x20 (lsl #x2 #x3)))
10c: e0030faa (orr x0 xzr x15) ; <=
110: e1030eaa (orr x1 xzr x14)
114: 0afa94d2 (movz x10 #xa7d0) ; #<code +>
118: eaf8a0f2 (movk x10 (lsl #x7c7 #x10))
11c: 2a00c0f2 (movk x10 (lsl #x1 #x20))
120: 0a00e0f2 (movk x10 (lsl #x0 #x30))
124: 40011fd6 (br x10)
128: 0a7899d2 (movz x10 #xcbc0) ; #<code get-room> ; <=
12c: aaf8a0f2 (movk x10 (lsl #x7c5 #x10))
130: 2a00c0f2 (movk x10 (lsl #x1 #x20))
134: 0a00e0f2 (movk x10 (lsl #x0 #x30))
138: 40013fd6 (blr x10)
13c: d7ffff17 (b (+ pc #x-a4)) ; => 98
140: 0a7899d2 (movz x10 #xcbc0) ; #<code get-room> ; <=
144: aaf8a0f2 (movk x10 (lsl #x7c5 #x10))
148: 2a00c0f2 (movk x10 (lsl #x1 #x20))
14c: 0a00e0f2 (movk x10 (lsl #x0 #x30))
150: 40013fd6 (blr x10)
154: c4ffff17 (b (+ pc #x-f0)) ; => 64
158: 41b340f8 (ldur x1 (mem+ x26 #xb)) ; <=
15c: e2618ed2 (movz x2 #x730f) ; '#(3-unsaved-editor 3 18 33 7)
160: 8240abf2 (movk x2 (lsl #x5a04 #x10))
164: 2200c0f2 (movk x2 (lsl #x1 #x20))
168: 0200e0f2 (movk x2 (lsl #x0 #x30))
16c: 8f0600f9 (str x15 (mem+ x20 (lsl #x1 #x3)))
170: 8e0a00f9 (str x14 (mem+ x20 (lsl #x2 #x3)))
174: 94620091 (add x20 x20 #x18)
178: 78ef93d2 (movz x24 #x9f7b) ; 'mark-frame-update
17c: 1815a1f2 (movk x24 (lsl #x8a8 #x10))
180: 3800c0f2 (movk x24 (lsl #x1 #x20))
184: 1800e0f2 (movk x24 (lsl #x0 #x30))
188: 1a5340f8 (ldur x26 (mem+ x24 #x5))
18c: 3e010010 (adr x30 (+ pc #x24)) ; => 1b0
190: 9e0200f9 (str x30 (mem+ x20 (lsl #x0 #x3)))
194: 770080d2 (movz x23 #x3)
198: 0ad340f8 (ldur x10 (mem+ x24 #xd))
19c: 40011fd6 (br x10)
1a0: e101000000000000 (data)
1a8: 8f01000000000000 (data)
1b0: 946200d1 (sub x20 x20 #x18) ; <=
1b4: 8f0640f9 (ldr x15 (mem+ x20 (lsl #x1 #x3)))
1b8: 8e0a40f9 (ldr x14 (mem+ x20 (lsl #x2 #x3)))
1bc: b1ffff17 (b (+ pc #x-13c)) ; => 80
1c0: deb343f8 (ldur x30 (mem+ x30 #x3b)) ; <=
1c4: df1b00f1 (subs xzr x30 #x6)
1c8: 01010054 (b.ne (+ pc #x20)) ; => 1e8
1cc: c00080d2 (movz x0 #x6)
1d0: 0a389dd2 (movz x10 #xe9c0) ; #<code reify-1cc>
1d4: aaf8a0f2 (movk x10 (lsl #x7c5 #x10))
1d8: 2a00c0f2 (movk x10 (lsl #x1 #x20))
1dc: 0a00e0f2 (movk x10 (lsl #x0 #x30))
1e0: 40013fd6 (blr x10)
1e4: 98ffff17 (b (+ pc #x-1a0)) ; => 44
1e8: bef140f8 (ldur x30 (mem+ x13 #xf)) ; <=
1ec: 7eda00f9 (str x30 (mem+ x19 (lsl #x36 #x3)))
1f0: a07140f8 (ldur x0 (mem+ x13 #x7))
1f4: 94ffff17 (b (+ pc #x-1b0)) ; => 44
1f8: c00080d2 (movz x0 #x6) ; <=
1fc: 0a389dd2 (movz x10 #xe9c0) ; #<code reify-1cc>
200: aaf8a0f2 (movk x10 (lsl #x7c5 #x10))
204: 2a00c0f2 (movk x10 (lsl #x1 #x20))
208: 0a00e0f2 (movk x10 (lsl #x0 #x30))
20c: 40013fd6 (blr x10)
210: 8dffff17 (b (+ pc #x-1cc)) ; => 44
214: 0a669bd2 (movz x10 #xdb30) ; #<code doargerr> ; <=
218: aaf8a0f2 (movk x10 (lsl #x7c5 #x10))
21c: 2a00c0f2 (movk x10 (lsl #x1 #x20))
220: 0a00e0f2 (movk x10 (lsl #x0 #x30))
224: 40011fd6 (br x10)


samth
2022-4-19 17:28:44

Is it possible you have errortrace on?


samth
2022-4-19 17:29:12

Can you try at the command line?


massung
2022-4-19 17:29:38

Ah, much better


massung
2022-4-19 17:29:49

0: ff0a00f1 (subs xzr x23 #x2)
4: 01040054 (b.ne (+ pc #x80)) ; => 84
8: 1e0001aa (orr x30 x0 x1)
c: df0b40f2 (ands xzr x30 #x7)
10: a1000054 (b.ne (+ pc #x14)) ; => 24
14: 170001ab (adds x23 x0 x1)
18: 66000054 (b.vs (+ pc #xc)) ; => 24
1c: 8a0240f8 (ldur x10 (mem+ x20))
20: 40011fd6 (br x10)
24: d60600f1 (subs x22 x22 #x1) ; <=
28: 41020054 (b.ne (+ pc #x48)) ; => 70
2c: 800600f9 (str x0 (mem+ x20 (lsl #x1 #x3)))
30: 810a00f9 (str x1 (mem+ x20 (lsl #x2 #x3)))
34: 94620091 (add x20 x20 #x18)
38: 7e010010 (adr x30 (+ pc #x2c)) ; => 64
3c: 9e0200f9 (str x30 (mem+ x20 (lsl #x0 #x3)))
40: 0a769cd2 (movz x10 #xe3b0) ; #<code event>
44: 4abaa0f2 (movk x10 (lsl #x5d2 #x10))
48: 2a00c0f2 (movk x10 (lsl #x1 #x20))
4c: 0a00e0f2 (movk x10 (lsl #x0 #x30))
50: 40011fd6 (br x10)
54: 9500000000000000 (data)
5c: 8d01000000000000 (data)
64: 946200d1 (sub x20 x20 #x18) ; <=
68: 800640f9 (ldr x0 (mem+ x20 (lsl #x1 #x3)))
6c: 810a40f9 (ldr x1 (mem+ x20 (lsl #x2 #x3)))
70: 0afa84d2 (movz x10 #x27d0) ; #<code +> ; <=
74: 4abaa0f2 (movk x10 (lsl #x5d2 #x10))
78: 2a00c0f2 (movk x10 (lsl #x1 #x20))
7c: 0a00e0f2 (movk x10 (lsl #x0 #x30))
80: 40011fd6 (br x10)
84: 0a668bd2 (movz x10 #x5b30) ; #<code doargerr> ; <=
88: 0abaa0f2 (movk x10 (lsl #x5d0 #x10))
8c: 2a00c0f2 (movk x10 (lsl #x1 #x20))
90: 0a00e0f2 (movk x10 (lsl #x0 #x30))
94: 40011fd6 (br x10)


samth
2022-4-19 17:30:43

Yes, that looks like what I see on x64 — it basically inlines fixnum addition, and also has tests for various things to call the regular + or handle errors


massung
2022-4-19 17:30:51

+1


samth
2022-4-19 17:34:16

Here’s the disassemble of (define (add m n) (unsafe-fx+ m n)):

0: 4883fd02 (cmp rbp #x2)
4: 7508 (jnz (+ rip #x8)) ; => e
6: 4a8d2c07 (lea rbp (mem+ rdi (* r8 #x1)))
a: 41ff6500 (jmp (mem64+ r13 #x0))
e: e9fde473fc (jmp (+ rip #x-38c1b03)) ; #<code doargerr> ; <=
13: 0f1f8000000000 (data)


samth
2022-4-19 17:38:08

Although I don’t really understand what’s going on there


samth
2022-4-19 17:40:52

Also note that you will get different results if you put the code in a module.


mflatt
2022-4-19 17:44:53

0: 4883fd02 (cmp rbp #x2)                       ; are there two arguments?
4: 7508 (jnz (+ rip #x8))                       ; if not, jump to call doargerr
6: 4a8d2c07 (lea rbp (mem+ rdi (* r8 #x1)))     ; add arguments
a: 41ff6500 (jmp (mem64+ r13 #x0))              ; jump to return address
e: e9fde473fc (jmp (+ rip #x-38c1b03))          ; call doargerr
13: 0f1f8000000000 (data)                       ; info for GC and stack unwinding


samth
2022-4-19 17:46:51

Thanks!


samth
2022-4-19 17:47:13

it was the mapping between “add arguments” and that lea that I was struggling with


mflatt
2022-4-19 17:47:54

Safe and generic version:

0: ff0a00f1 (subs xzr x23 #x2) ; are there two arguments?
4: 01040054 (b.ne (+ pc #x80)) ; => 84 ; jump to slow path if not
8: 1e0001aa (orr x30 x0 x1) ; "or" to combine mask bits
c: df0b40f2 (ands xzr x30 #x7) ; zero low bits => both are fixnums
10: a1000054 (b.ne (+ pc #x14)) ; => 24 ; jump to slow path if not
14: 170001ab (adds x23 x0 x1) ; add
18: 66000054 (b.vs (+ pc #xc)) ; => 24 ; jump to slow path on overflow
1c: 8a0240f8 (ldur x10 (mem+ x20)) ; load return address
20: 40011fd6 (br x10) ; return
24: d60600f1 (subs x22 x22 #x1) ; <= ; decr. engine counter
28: 41020054 (b.ne (+ pc #x48)) ; => 70 ; continue if not checking events
2c: 800600f9 (str x0 (mem+ x20 (lsl #x1 #x3))) ; start saving context...
30: 810a00f9 (str x1 (mem+ x20 (lsl #x2 #x3)))
34: 94620091 (add x20 x20 #x18)
38: 7e010010 (adr x30 (+ pc #x2c)) ; => 64
3c: 9e0200f9 (str x30 (mem+ x20 (lsl #x0 #x3)))
40: 0a7684d2 (movz x10 #x23b0) ; load address of event-checking code
44: ca5fa0f2 (movk x10 (lsl #x2fe #x10))
48: 2a00c0f2 (movk x10 (lsl #x1 #x20))
4c: 0a00e0f2 (movk x10 (lsl #x0 #x30))
50: 40011fd6 (br x10) ; jump to event-checking code
54: 9500000000000000 (data) ; data for GC and stack unwind
5c: 8d01000000000000 (data)
64: 946200d1 (sub x20 x20 #x18) ; <= ; return from event-checking code
68: 800640f9 (ldr x0 (mem+ x20 (lsl #x1 #x3)))
6c: 810a40f9 (ldr x1 (mem+ x20 (lsl #x2 #x3)))
70: 0afa8cd2 (movz x10 #x67d0) ; #<code +> ; load address of general `+`
74: aa5fa0f2 (movk x10 (lsl #x2fd #x10))
78: 2a00c0f2 (movk x10 (lsl #x1 #x20))
7c: 0a00e0f2 (movk x10 (lsl #x0 #x30))
80: 40011fd6 (br x10) ; jump to general `+`
84: 0a6693d2 (movz x10 #x9b30) ; load address of bad-arity function
88: 6a5fa0f2 (movk x10 (lsl #x2fb #x10))
8c: 2a00c0f2 (movk x10 (lsl #x1 #x20))
90: 0a00e0f2 (movk x10 (lsl #x0 #x30))
94: 40011fd6 (br x10) ; jump to bad arity


mflatt
2022-4-19 17:49:20

The block there from 24–50 is often a more compact event-detour pattern, but this longer form tends to be used if the function starts with a short way out.


ben.knoble
2022-4-19 19:25:35

Shall we do another export to update the archive, too?


samth
2022-4-19 19:32:08

yes


ben.knoble
2022-4-19 19:36:31

If helpful, the last update was 5th November 2021


samth
2022-4-19 19:36:46

I sent you the new zip file