
The difference between eq? and eqv? ought to be small.
Computing the hash for an element in a hasheq set is done using the adresss of the element only.
Computing the hash for a hash(equal) set is more involved. Two lists xs and ys with equal values needs to get the same hash value. And the only way to ensure that is to compute hash values for the individual elements and combine them.
That is computing a hash for eq takes time O(1) whereas computing a hash for equal takes time O(n) where n is the size of the data structure.

Switching from eqv
to eq
dropped the time from 4.1 to 1.4 seconds, so it doesn’t seem small to me :)

That’s indeed surprising.

From Discord

@badkins can you run my version (https://racket.discourse.group/t/racket-set-performance/479/8) which does double hashing, and see how long it takes compared to your latest version?

Where is the code?

Well, your initial version, with the code that I wrote

I overrode the meaning of mutable-set
, in-set
, etc.

Oh, that’s old news now. <https://github.com/lojic/LearningRacket/blob/master/advent-of-code–2021/solutions/day25/day25-set.rkt|This version> is the fast one now.

Yes, I understand that your latest version is fast. I want to know how fast is that version compared to yours on your machine

Because it also uses hasheq

Gotcha. I’d have to find the correct version in the git history, and to be frank, I’ve reached my cap on time for this :)

I did make a note to myself to revisit this later though. I’d like to see if I can help improve set / hash performance. It seems like it could get closer to Python, unless there’s just a fundamental limit with Chez Scheme that can’t match the C code that Python is using.

Specifically with eqv?
since eq?
seems ok.

Hmm… if Python tuples are immutable, maybe the hashing for them is more similar to a eq?
?

Scratch that, I didn’t think it through - they could still be in different locations.

I’ve heard Python’s tuple is slower than list
lol

The expectation is that there would be a performance difference between the different hashing techniques and the difference in runtime performance wasn’t that significant. I don’t think its as surprising as it seems.

I’ve seen is suggested before that docs should include performance characteristics of different operations. Is it worth collating the above and making a wiki page or a PR?

@renatiko.04 has joined the channel