gknauth
2020-11-13 13:26:32

I confirm that Racket 7.9.0.5 BC fixes the problem on Win10@work.


gknauth
2020-11-13 14:11:33

I confirm that Racket 7.9.0.5 CS fixes the problem on Win10@work.


gknauth
2020-11-13 14:14:09

@greg It can be a :smile: since the universe is robust enough not to crash in the presence of such Windows weirdness, and now Racket’s robustness is closer to the universe’s. (I’m saying this knowing that if the universe does eventually crash, it won’t be because of Windows and I won’t be around.)


sorawee
2020-11-13 14:16:28

@mflatt how can I check if the scopes for a is a subset of scopes for b? free-identifier=? is symmetric, so it wouldn’t work.

syntax-debug-info looks close to what I want, but “debug” in its name suggests that I probably shouldn’t use it?


laurent.orseau
2020-11-13 14:19:05

Question about weak boxes: #lang racket (define boxes '()) (define a (make-weak-box 'A)) (set! boxes (cons a boxes)) (map weak-box-value boxes) (let () (define b (make-weak-box 'B)) (set! boxes (cons b boxes)) (println (map weak-box-value boxes)) (void)) (map weak-box-value boxes) outputs (A) (B A) (B A) Why isn’t the last output (#f A) instead?


mflatt
2020-11-13 14:21:28

This may be one of those “what do you really want do to?” questions… (Does checking that a has a subset of scopes of b by itself mean anything about binding? I don’t think so, but it feels like i’m slow to page this back in today.)


laurent.orseau
2020-11-13 14:22:44

or do I need to define something like a weak-list?


mflatt
2020-11-13 14:24:46

Even if you add (collect-garbage) both 'A and 'B are part of your module’s overall code, so they’re still reachable.


sorawee
2020-11-13 14:29:13

I want to detect that when I rename def to abc, no accidental capture will occur.

(let ([abc 1]) ; 0 abc ; 1 (let ([def 2]) ; 2 def)) ; 3 IIUC, because 0 is a subset of 3 (which causes free-identifier=? to return #t), we might incorrectly conclude that there will be an accidental capture. However, this is not the case because 0 is a subset of 2 as well.


laurent.orseau
2020-11-13 14:33:32

But if I split into 2 modules: “weak-list.rkt” #lang racket (provide add-value get-values) (define boxes '()) (define (add-value v) (set! boxes (cons (make-weak-box v) boxes))) (define (get-values) (map weak-box-value boxes)) “main.rkt”: #lang racket (require "weak-list.rkt") (add-value 'A) (get-values) (let () (add-value 'B) (println (get-values)) (void)) (collect-garbage) (get-values) I get the same output, even though ’B is not reachable anymore, or is it?


mflatt
2020-11-13 14:36:26

Yes, the code is still reachable. Unless you change the namespace, all modules loaded into the namespace are still reachable via the namespace, and that includes the code that implements each module’s body.


mflatt
2020-11-13 14:40:20

Are you in a setting where you know all of the binding forms? For example, you know what let does?


laurent.orseau
2020-11-13 14:40:51

Wait, are you saying that in the following: #lang racket (struct A (a) #:transparent) (let () (define b (A 'b)) (println b)) (collect-garbage) (displayln "end") the GC cannot claim the struct instance on line (collect-garbage)?


sorawee
2020-11-13 14:40:56

Yeah, this is fully expanded code.


mflatt
2020-11-13 14:41:00

I ask because it seems like uncooperative macros can defeat any kind of “what if” question like this.


mflatt
2020-11-13 14:41:19

But you only want to rename in fully expanded code to still have fully expanded code?


mflatt
2020-11-13 14:43:34

No, the struct instance is not part of the code; it exists only when you run. But a literal symbol is part of the code.


mflatt
2020-11-13 14:43:54

(A quoted prefab instance could be part of some code.)


sorawee
2020-11-13 14:46:24

yes, I believe so.


mflatt
2020-11-13 14:51:26

I guess if you take the name at 0 and the scopes at 2, then check whether the resulting identifier is free-identifier=? but not bound-identifier=? to the identifier at 0, then the scopes at 0 must be a proper subset of the scopes at 2. But there’s not current an exposed operation to checks the scope sets more directly, as far as I can remember.


laurent.orseau
2020-11-13 14:56:34

Aha. So what’s readable is always accessible within the same namespace, is that it?


mflatt
2020-11-13 15:01:36

Essentially, yes. Well, not just readable, but actually read as part of the compiled code. But this gets complicated, because compiler optimizations get involved. For example, the compiler optimizes to 'hello in this case, so the symbol is retained: #lang racket (define b (make-weak-box (string->symbol "hello"))) (collect-garbage) (weak-box-value b) In this case, the optimizer is thwarted: #lang racket (define convert string->symbol) (set! convert convert) (define b (make-weak-box (convert "hello"))) (collect-garbage) (weak-box-value b) Weak references and finalization expose all sorts of compilation details that we’d really prefer to not specify.


laurent.orseau
2020-11-13 15:16:27

oh, I see, thanks for the detailed explanation. One last thing: If my structs are not prefabs, does a call to (collect-garbage) ensure that all that has become unreachable is collected?


mflatt
2020-11-13 15:55:27

Yes.


mflatt
2020-11-13 15:58:28

(I hesitate to simply say “yes”, because it’s always complicated. For example, an otherwise unreachable object may have a finalizer, in which case it’s not collected yet. Also, setting side finalizers, garbage collection reclaims unreachable values by definition. But I think “yes” is probably right for the situation you have in mind.)


laurent.orseau
2020-11-13 16:02:21

Thank you!


alexharsanyi
2020-11-14 03:13:47

I am getting extremely bad performance when I use 64 bit integers as hash keys in hash maps

In the GIST below, when using ~130000 keys, it takes 40 seconds to add the keys (hash-set) and 111 seconds to retrieve them (hash-ref!). In comparison, if I convert the same 64 bit integers to strings, it takes about 31 milliseconds for the same operations.

Could someone have a look at the GIST below (there is also a data file that needs to be downloaded), and confirm that they see the same behavior?

https://gist.github.com/alex-hhh/87b880b42e3c39eb63948b05ada2d37c

This happens on both 7.8 BC and 7.9 CS, I don’t think it has anything to do with BC vs CS.


mflatt
2020-11-14 03:48:09

I see the same thing. Using an immutable hash table turns out to be much faster, though.

In both the CS and BC implementations, the hashing function on fixnum is the identity function, and mutable hash tables uses a power-of-two array of buckets. Since the numbers mostly have the same lower and upper bits, I think they all end up mapping to the same small region of the array (i.e., lots of collisions). The identity function is a bad hash function for this set of data.

Immutable hash tables use a HAMT strategy that adapts well to numbers that are close together. Essentially, it finds the range of bits where the numbers differ and spreads them out there.


mflatt
2020-11-14 03:54:12

(Well, the numbers aren’t fixnums, but they’re almost fixnums. Both the CS and BC hash functions end up being identity-like for bignums that are barely bignums.)


alexharsanyi
2020-11-14 04:28:40

Thanks for looking into it. The numbers are not random, but this is the data that I need to work with. You are correct that both the upper and lower bits will be ~mostly~ identical for the datasets I have to work with.


mflatt
2020-11-14 04:38:53

Maybe the hash function should xor the high half of the bits with the low half; that doesn’t lose bits overall, since xoring again would recover the original number, but it allows the high bits of the number to influence the low bits of the hash. And again for the fourth of the bits in the high part of the low half — and probably the same for the high eighth of the low fourth for the 64-bit machines. If that sounds right, I can try it tomorrow.


alexharsanyi
2020-11-14 04:44:54

The 64 bit values that I use have some structure into them, and I am thinking to write a custom hash type for them, as I can easily determine the number of lower bits that would be identical and can be dropped (higher bits is somewhat more difficult). However, i tried using immutable hashes with them, and they are significantly faster than converting the number to a string and using a mutable hash.

Also, these numbers are geoids (https://pkgs.racket-lang.org/package/geoid) and they represent geographical locations which are close to each other, and this is why the numbers are close together.