laurent.orseau
2021-10-14 12:57:32

Argh, I somehow feel compelled to say that you can sort first and then do a linear pass to remove the duplicates. (But remove-duplicates has a fast implementation for eq? anyway iirc)

Yes, you’ve just learned nothing useful. You’re welcome.


badkins
2021-10-14 14:45:26

Looks like sort followed by linear pass is twice as fast as sort followed by remove-duplicates.


badkins
2021-10-14 14:47:06

(define (sort-and-dedup-1 lst)
  (remove-duplicates (sort lst symbol<?) eq?))
(define (sort-and-dedup-4 lst)
  (define (dedup lst prev)
    (if (null? lst)
        '()
        (let ([s (car lst)])
          (if (eq? s prev)
              (dedup (cdr lst) prev)
              (cons s (dedup (cdr lst) s))))))
  (dedup (sort lst symbol<?) #f))


badkins
2021-10-14 14:48:23

#2 used a let loop, but it was slower than #4 (learned that trick from Matthew Flatt!), and #3 was a let loop w/o the reverse, which was the quickest, but returns the result in reverse order.
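
Variants #2 and #3 are not shown in the thread (the actual code lives in the gist linked later), so the following is only a guess at their shape: a named `let` that accumulates unique symbols, with and without a final `reverse`. The function names are made up for illustration.

```racket
#lang racket/base

;; Hypothetical reconstruction of #2: accumulate in a named let,
;; then reverse so the result comes back in ascending order.
(define (sort-and-dedup-loop lst)
  (let loop ([lst (sort lst symbol<?)] [prev #f] [acc '()])
    (cond
      [(null? lst) (reverse acc)]
      [(eq? (car lst) prev) (loop (cdr lst) prev acc)]
      [else (loop (cdr lst) (car lst) (cons (car lst) acc))])))

;; Hypothetical reconstruction of #3: same loop, but the accumulator
;; is returned as-is, so the result is in descending order.
(define (sort-and-dedup-loop/no-reverse lst)
  (let loop ([lst (sort lst symbol<?)] [prev #f] [acc '()])
    (cond
      [(null? lst) acc]
      [(eq? (car lst) prev) (loop (cdr lst) prev acc)]
      [else (loop (cdr lst) (car lst) (cons (car lst) acc))])))
```

The non-tail-recursive `dedup` in #4 builds the list in order as the recursion unwinds, which is why it needs no `reverse` at all.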


badkins
2021-10-14 14:48:50

% racket dedup.rkt
cpu time: 1825 real time: 1841 gc time: 772
cpu time: 925 real time: 940 gc time: 116
cpu time: 877 real time: 892 gc time: 38
cpu time: 892 real time: 907 gc time: 40
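
Lines of this shape are what Racket's `time` form prints. A minimal harness along these lines might look as follows; it is a sketch, not the contents of `dedup.rkt`. The `sample-symbols` generator and the input size are assumptions chosen to keep the example quick.

```racket
#lang racket/base
(require racket/list) ; for remove-duplicates

(define (sort-and-dedup-1 lst)
  (remove-duplicates (sort lst symbol<?) eq?))

(define (sort-and-dedup-4 lst)
  (define (dedup lst prev)
    (if (null? lst)
        '()
        (let ([s (car lst)])
          (if (eq? s prev)
              (dedup (cdr lst) prev)
              (cons s (dedup (cdr lst) s))))))
  (dedup (sort lst symbol<?) #f))

;; Hypothetical input generator: n symbols drawn from `distinct` names.
(define (sample-symbols n distinct)
  (for/list ([i (in-range n)])
    (string->symbol (number->string (random distinct)))))

(define input (sample-symbols 100000 5000))

;; `time` prints one "cpu time: ... real time: ... gc time: ..." line
;; per call and returns the value of the timed expression.
(define r1 (time (sort-and-dedup-1 input)))
(define r4 (time (sort-and-dedup-4 input)))
```

Timing every variant on the same `input` list keeps the comparison fair, and comparing the results afterwards is a cheap sanity check.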


badkins
2021-10-14 14:51:06

The slowest part, by far, was creating the list of a million symbols :) But I used a random-string function I had lying around that uses crypto-random-bytes:
(require file/sha1 racket/list racket/random)
(define (random-string n)
  (let* ([half-n (ceiling (/ n 2))]
         [random-bytes (crypto-random-bytes half-n)]
         [str (bytes->hex-string random-bytes)])
    (if (even? n)
        str
        (substring str 0 n))))
(define (random-symbols n)
  (let loop ([n n] [result '()])
    (if (< n 1)
        result
        (loop (sub1 n)
              (cons (string->symbol (random-string 10)) result)))))


badkins
2021-10-14 14:55:55

Hmm… I just realized my symbols are too long. With 10 random characters, there probably aren’t any duplicates :(
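
A rough way to check this: `random-string` emits hex characters, so a symbol of length `len` is one of 16^len possibilities. The sketch below estimates how many collisions to expect, assuming the draws are uniform and independent. The helper names are made up for illustration.

```racket
#lang racket/base

;; Number of distinct symbols that `len` hex characters can spell.
(define (symbol-space len)
  (expt 16 len))

;; Expected number of colliding pairs among n uniform draws:
;; C(n, 2) / space, computed exactly and then converted to a float.
(define (expected-colliding-pairs n len)
  (exact->inexact (/ (* n (sub1 n)) (* 2 (symbol-space len)))))

;; Approximate expected number of distinct symbols among n draws:
;; space * (1 - e^(-n/space)).
(define (expected-distinct n len)
  (define space (symbol-space len))
  (* space (- 1.0 (exp (- (exact->inexact (/ n space)))))))
```

For a million 10-character symbols, `(expected-colliding-pairs 1000000 10)` comes out at roughly 0.45, so under this model a typical run contains no duplicates or only one or two.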


badkins
2021-10-14 15:00:27

<sigh> with 4 million 3 character symbols, it’s a wash - I guess you can just ignore everything I said above :)
% racket dedup.rkt
cpu time: 1728 real time: 1776 gc time: 113
cpu time: 1778 real time: 1824 gc time: 231
cpu time: 1764 real time: 1810 gc time: 163
cpu time: 1725 real time: 1771 gc time: 117


badkins
2021-10-14 15:02:23

Now I’m curious about the code for remove-duplicates - I suspect it’s using a hash, so the more dupes, the faster it is.
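
To illustrate the suspicion, here is a sketch of what a hash-based dedup for `eq?` could look like. It is not the `racket/list` source, and `dedup/hash` is a made-up name.

```racket
#lang racket/base

;; Keep the first occurrence of each element, using an eq?-based
;; mutable hash table to remember what has already been seen.
(define (dedup/hash lst)
  (define seen (make-hasheq))
  (let loop ([lst lst])
    (cond
      [(null? lst) '()]
      [(hash-ref seen (car lst) #f) (loop (cdr lst))]
      [else
       (hash-set! seen (car lst) #t)
       (cons (car lst) (loop (cdr lst)))])))
```

With an approach like this, each duplicate costs a single lookup and no allocation, while each new element costs an insertion plus a cons cell. That would be consistent with inputs full of duplicates running faster.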


jestarray
2021-10-14 17:50:36

https://discord.gg/6Zq8sH5 , someone should post the discord racket link in the header of #general ?


sorawee
2021-10-14 17:53:47

> #3 was let loop w/o the reverse which was the quickest, but in reverse order.
Well, you could have written symbol>? and sorted them using this comparator. The let loop's reversal would then put the list back in ascending order.


sorawee
2021-10-14 17:56:24

Will Discord do the same for Slack? :stuck_out_tongue: This is what’s called quid pro quo, right? lol


samdphillips
2021-10-14 18:01:55

Or at least pin this message.


samdphillips
2021-10-14 18:03:24

Seems the Slack link is not in the resources channel on Discord. I’ll pin it there.


laurent.orseau
2021-10-14 18:10:00

What about remove-duplicates first, then sort?


badkins
2021-10-14 18:27:58

@sorawee I didn’t find symbol>? when I looked. Apparently there is one in #lang mischief though.


sorawee
2021-10-14 18:28:48

you can create it yourself using symbol<?
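
A minimal sketch of that idea: define a two-argument `symbol>?` by swapping the arguments to `symbol<?`, sort in descending order, and let the accumulating loop flip the result back to ascending. The name `sort-and-dedup/desc` is made up for illustration.

```racket
#lang racket/base

;; Two-argument symbol>? built from symbol<? by swapping the arguments.
(define (symbol>? a b)
  (symbol<? b a))

;; Sort descending, then cons unique symbols onto an accumulator.
;; Consing reverses the order, so the result is ascending and
;; no separate `reverse` pass is needed.
(define (sort-and-dedup/desc lst)
  (let loop ([lst (sort lst symbol>?)] [prev #f] [acc '()])
    (cond
      [(null? lst) acc]
      [(eq? (car lst) prev) (loop (cdr lst) prev acc)]
      [else (loop (cdr lst) (car lst) (cons (car lst) acc))])))
```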


badkins
2021-10-14 18:31:32

Sure, but the timings didn’t seem to warrant it.


badkins
2021-10-14 18:32:38

Good catch @laurent.orseau
% racket dedup.rkt
cpu time: 1698 real time: 1745 gc time: 145
cpu time: 1740 real time: 1787 gc time: 237
cpu time: 1626 real time: 1673 gc time: 127
cpu time: 1711 real time: 1757 gc time: 213
cpu time: 78 real time: 80 gc time: 0


badkins
2021-10-14 18:32:57

That’s with 3 character symbols, so lots of dupes.


badkins
2021-10-14 18:33:25

Last one is:
(define (sort-and-dedup-5 lst)
  (sort (remove-duplicates lst eq?) symbol<?))


badkins
2021-10-14 18:37:00

With 1M 5 char symbols (vs. 4M 3 char):
% racket dedup.rkt
cpu time: 1225 real time: 1241 gc time: 233
cpu time: 968 real time: 983 gc time: 85
cpu time: 1275 real time: 1290 gc time: 396
cpu time: 947 real time: 961 gc time: 47
cpu time: 893 real time: 902 gc time: 173
https://gist.github.com/lojic/0a096547ec502facd6f5920cdcb00124


jestarray
2021-10-14 18:43:19

jestarray
2021-10-14 18:43:42

i didnt know users could have permissions to set the header lmaoo


spdegabrielle
2021-10-14 18:55:06

I always forget about pasterack! Pasterack is awesome!


notjack
2021-10-14 18:55:36

spdegabrielle
2021-10-14 18:56:33

Is that in 1 hour’s time from now?


samdphillips
2021-10-14 19:00:03

Yes


notjack
2021-10-14 20:46:15

Meeting summary: we talked about the State of Rhombus document and agreed that it needs a few more concrete details about the next steps, especially in regards to our plan for Rhombus libraries. More information there would make it easier for people to find sections of the Rhombus project they can contribute to or take ownership of. I plan to add this information to the document and then we’ll review it again at the next meeting on October 28th.

