soegaard2
2021-12-27 10:11:41

The difference between eq? and eqv? ought to be small.

Computing the hash for an element in a hasheq set is done using the adresss of the element only.

Computing the hash for a hash(equal) set is more involved. Two lists xs and ys with equal values needs to get the same hash value. And the only way to ensure that is to compute hash values for the individual elements and combine them.

That is computing a hash for eq takes time O(1) whereas computing a hash for equal takes time O(n) where n is the size of the data structure.


badkins
2021-12-27 15:46:59

Switching from eqv to eq dropped the time from 4.1 to 1.4 seconds, so it doesn’t seem small to me :)


soegaard2
2021-12-27 16:23:59

That’s indeed surprising.


sorawee
2021-12-27 16:28:15

From Discord


sorawee
2021-12-27 16:30:14

@badkins can you run my version (https://racket.discourse.group/t/racket-set-performance/479/8) which does double hashing, and see how long it takes compared to your latest version?


badkins
2021-12-27 16:32:23

Where is the code?


sorawee
2021-12-27 16:32:48

Well, your initial version, with the code that I wrote


sorawee
2021-12-27 16:33:15

I overrode the meaning of mutable-set, in-set, etc.


badkins
2021-12-27 16:33:23

Oh, that’s old news now. <https://github.com/lojic/LearningRacket/blob/master/advent-of-code–2021/solutions/day25/day25-set.rkt|This version> is the fast one now.


sorawee
2021-12-27 16:34:32

Yes, I understand that your latest version is fast. I want to know how fast is that version compared to yours on your machine


sorawee
2021-12-27 16:34:43

Because it also uses hasheq


badkins
2021-12-27 16:35:38

Gotcha. I’d have to find the correct version in the git history, and to be frank, I’ve reached my cap on time for this :)


badkins
2021-12-27 16:37:56

I did make a note to myself to revisit this later though. I’d like to see if I can help improve set / hash performance. It seems like it could get closer to Python, unless there’s just a fundamental limit with Chez Scheme that can’t match the C code that Python is using.


badkins
2021-12-27 16:39:01

Specifically with eqv? since eq? seems ok.


badkins
2021-12-27 16:40:21

Hmm… if Python tuples are immutable, maybe the hashing for them is more similar to a eq? ?


badkins
2021-12-27 16:41:10

Scratch that, I didn’t think it through - they could still be in different locations.


sorawee
2021-12-27 16:41:12

I’ve heard Python’s tuple is slower than list lol


sw5355700
2021-12-27 19:12:17

The expectation is that there would be a performance difference between the different hashing techniques and the difference in runtime performance wasn’t that significant. I don’t think its as surprising as it seems.


spdegabrielle
2021-12-27 19:43:12

I’ve seen is suggested before that docs should include performance characteristics of different operations. Is it worth collating the above and making a wiki page or a PR?


renatiko.04
2021-12-28 02:21:09

@renatiko.04 has joined the channel