Racket Slack Archive

luis.osa.gdc

2019-10-25 13:34:54

Nice people of the Racket community! I do side projects from time to time in your language. The last one is an implementation of regression trees (a variation of decision trees, an ML algorithm). But I found that my training gets stuck in even the simplest Kaggle problem, and I do not understand why. If someone cares to give me advice, the code is here: https://github.com/logc/house-prices (the README has a longer explanation of what I attempted and how the code is organized).

samth

2019-10-25 13:40:43

@luis.osa.gdc have you tried running the profiler?

soegaard2

2019-10-25 13:42:06

From vectors.rkt:

(define (var v)
  (define (actual-var v)
    (let ([n (vector-length v)])
      (* (/ 1 (sqr n))
         (for/sum ([i (in-range n)])
           (for/sum ([j (in-range i n)])
             (sqr (- (vector-ref v i) (vector-ref v j))))))))
  (cond [(vector-empty? v) 0]
        [else (actual-var v)]))

This makes the computation take O(n^2) time. It’s faster to use E[X^2] - E[X]^2, which takes O(n) to compute.
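A minimal sketch of the E[X^2] - E[X]^2 approach as a drop-in for `var` (illustrative, not code from the repo). It makes a single pass with `for/fold`, accumulating the sum and the sum of squares, and returns the population variance, just like the quadratic version. One caveat: with floating-point data this formula can lose precision when the mean is large relative to the spread (catastrophic cancellation); Welford's online algorithm is the usual numerically stable alternative.

```racket
#lang racket/base
;; O(n) population variance via Var[X] = E[X^2] - E[X]^2.
;; Sketch of a replacement for `var` in vectors.rkt, not the project's code.
(define (var v)
  (define n (vector-length v))
  ;; One pass over the vector, accumulating the sum and the sum of squares.
  (define-values (sum sum-sq)
    (for/fold ([sum 0] [sum-sq 0])
              ([x (in-vector v)])
      (values (+ sum x) (+ sum-sq (* x x)))))
  (if (zero? n)
      0 ; same convention as the original for an empty vector
      (let ([mean (/ sum n)])
        (- (/ sum-sq n) (* mean mean)))))

(var (vector 1 2 3 4 4)) ; → 34/25, same as the quadratic version
```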

luis.osa.gdc

2019-10-25 13:42:51

no. Is it only in DrRacket? Or is there a way to run it from the command line?

samth

2019-10-25 13:43:11

yes, see the profile collection

luis.osa.gdc

2019-10-25 13:43:17

damn! :face_palm:

samth

2019-10-25 13:43:36

https://docs.racket-lang.org/profile/index.html

luis.osa.gdc

2019-10-25 13:43:45

I am going to try that out, and let you know

luis.osa.gdc

2019-10-25 13:44:44

thanks! I knew I was missing some of the tooling which I actually knew existed …

soegaard2

2019-10-25 13:45:24

A style thing only: in io.rkt you have:

(define (maybe-string->number a-string)
  (let ([maybe-number (string->number a-string)])
    (if maybe-number
        maybe-number
        ; else
        a-string)))

You can write (or (string->number a-string) a-string)
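Spelled out as a full definition, the `or` version looks like this. The input strings in the usage lines are made-up examples, not values taken from the dataset.

```racket
#lang racket/base
;; string->number returns #f for non-numeric input, so `or` falls
;; through to the original string in that case.
(define (maybe-string->number a-string)
  (or (string->number a-string) a-string))

(maybe-string->number "208500") ; → 208500
(maybe-string->number "NA")     ; → "NA"
```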

soegaard2

2019-10-25 13:48:39

In a few places you have:

(for ([v vvs]) ...

At some point (I still think it is the case) it was faster to use:

(for ([v (in-vector vvs)]) ...

but whether it matters depends on the context.

soegaard2

2019-10-25 13:51:16

Btw variance is in math/statistics.
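A minimal sketch of using the library function instead of a hand-written one; `sample` is a made-up vector. `variance` accepts any sequence of reals, and by default (`#:bias #f`) it returns the population variance (dividing by n), which is what the hand-written `var` in vectors.rkt computes; `#:bias #t` gives the bias-corrected estimate (dividing by n - 1).

```racket
#lang racket/base
(require math/statistics) ; provides mean, variance, stddev, ...

;; `variance` takes any sequence of reals, so a vector works directly.
(define sample (vector 1 2 3 4 4))

;; Default (#:bias #f): population variance, dividing by n.
(variance sample)
;; #:bias #t: bias-corrected sample variance, dividing by n - 1.
(variance sample #:bias #t)
```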

samth

2019-10-25 13:53:29

yes, it’s definitely much faster to use in-vector

samth

2019-10-25 13:54:29

> (define v (make-vector 1000000 1.0))
> (time (for/sum ([i v]) i))
cpu time: 40 real time: 40 gc time: 0
1000000.0
> (time (for/sum ([i (in-vector v)]) i))
cpu time: 16 real time: 16 gc time: 0
1000000.0

luis.osa.gdc

2019-10-25 13:55:50

ah, I didn’t know about variance in the standard lib, nor about in-vector being faster … good to learn!

luis.osa.gdc

2019-10-25 13:59:23

@samth I already have a profile of the program, made with raco profile, but I have a bit of a hard time locating where the bottleneck is … it’s probably in the variance function, as pointed out by @soegaard2, but I don’t really understand how the profile refers to the source code

samth

2019-10-25 13:59:37

can you post the output?

luis.osa.gdc

2019-10-25 14:02:44

Here is the output of profile on my main.rkt

samth

2019-10-25 14:04:32

@luis.osa.gdc can you run raco make -v main.rkt and then re-run the test?

luis.osa.gdc

2019-10-25 14:06:42

here are both outputs …

luis.osa.gdc

2019-10-25 14:07:34

This is run on macOS 10.14 and Racket 7.4, btw

luis.osa.gdc

2019-10-25 14:08:49

I hope I am not spamming other people … I have tried to post outputs as snippets and not as literal code blocks, since this is the general channel

luis.osa.gdc

2019-10-25 14:12:17

and, for completeness, this run is made on a sample of 100 lines from the total dataset; that is why it actually finishes training

samth

2019-10-25 14:12:51

ah, raco profile isn’t running the main submodule
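Since `raco profile` is not running the `main` submodule here, one possible workaround (an assumption about how main.rkt could be restructured, not something tried in this thread) is to invoke the profiler from inside `main` with the `profile` library's `profile` form. `busy-work` is a hypothetical stand-in for the real training code.

```racket
#lang racket/base
;; Profile from inside the program instead of via `raco profile`.
(require profile) ; provides the `profile` form and `profile-thunk`

;; Hypothetical stand-in for the real training work in main.rkt.
(define (busy-work n)
  (for/sum ([i (in-range n)]) (* i i)))

(module+ main
  ;; `profile` samples while evaluating its body, then prints a report.
  (profile (busy-work 5000000)))
```

Running the file with `racket main.rkt` then executes the `main` submodule and prints the profile for just the profiled expression, rather than for module loading.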

samth

2019-10-25 14:13:56

how long should running main.rkt take?

luis.osa.gdc

2019-10-25 14:16:29

without profiling it just takes 1.11 seconds for the sample; I don’t really have a requirement for how long it should take, but not incredibly long …

samth

2019-10-25 14:17:14

for me it takes forever

samth

2019-10-25 14:17:27

is there something i should change in main.rkt?

luis.osa.gdc

2019-10-25 14:18:42

I did take a sample of 100 lines, like this: head -n 100 data/train.csv > data/train.sample.csv and then wrote that file name in main.rkt: (define S (io:parse-file "data/train.sample.csv"))

samth

2019-10-25 14:20:20

ok, I changed your code to use variance from math/statistics and to use in-vector and that now takes 1 ms

samth

2019-10-25 14:20:54

but 200 is still very slow

samth

2019-10-25 14:21:29

should i change the number in main.rkt to 200?

samth

2019-10-25 14:21:49

changing that number to 200 makes it finish instantly

samth

2019-10-25 14:22:23

same with 1000 lines and 1000 in main.rkt

luis.osa.gdc

2019-10-25 14:22:52

ah, in fact I introduced that depth argument in order to experiment; the training was taking forever

luis.osa.gdc

2019-10-25 14:25:16

that looks solved for me! If you could still tell me a bit about how to interpret the profiler’s output, I would be thankful, @samth

samth

2019-10-25 14:25:44

the profiler output is just showing you the results of loading the module, I think

luis.osa.gdc

2019-10-25 14:25:53

I need to leave now, but you can maybe comment on the snippets above, or something …

samth

2019-10-25 14:26:07

so it’s just a bunch of things internal to racket module loading

samth

2019-10-25 14:26:13

which is why it wasn’t useful

luis.osa.gdc

2019-10-25 14:26:16

it looks quite cryptic …

luis.osa.gdc

2019-10-25 14:26:26

thanks for any explanation! I will read that

luis.osa.gdc

2019-10-25 14:26:54

and thanks to both for your insights! I learned a lot from this

krismicinski

2019-10-25 14:28:42

don’t feel bad, I had also forgotten about it and the reminder was helpful!

samth

2019-10-25 14:29:05

I can’t reproduce anything being slow with a 100-line file at all

gregor.kiczales

2019-10-25 16:45:35

I’m wondering if someone could look at https://github.com/racket/racket/issues/2758. It causes us real problems because we have to run our handin and autograder single-threaded.

samth

2019-10-25 16:47:37

@gregor.kiczales do you have code that demonstrates this problem (even if complicated)?

gregor.kiczales

2019-10-25 16:48:19

yes, but it’s a threading problem, so it only comes up under load.

samth

2019-10-25 16:49:11

sure, I’ll try to load-test it

gregor.kiczales

2019-10-25 16:51:41

ok, give me a bit to bundle it

leif

2019-10-25 18:03:35

Hey @mflatt Using syntax-binding-set doesn’t seem to work when the module you’re trying to ‘forge’ is the one you’re expanding:

leif

2019-10-25 18:03:40

#lang racket

(require (for-syntax syntax/location))

(define-for-syntax (forge-identifier modpath sym)
  (syntax-binding-set->syntax
   (syntax-binding-set-extend (syntax-binding-set) sym 0 modpath)
   sym))

(define x 42)

(define-syntax (m stx)
  (forge-identifier
   (module-path-index-join (quote-module-path) #f)
   'x))

(m)

leif

2019-10-25 18:04:19

I suspect this is because the macro expander is using eq? somewhere internally.

leif

2019-10-25 18:04:26

Do you have any ideas how to get around this?

mflatt

2019-10-25 18:11:17

While a module is being expanded, the module doesn’t really have a name, so (quote-module-path) doesn’t make sense. Can you use (variable-reference->module-path-index (#%variable-reference)) instead of (module-path-index-join (quote-module-path) #f)?
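Applying that suggestion to the example above gives roughly the following sketch (not code posted in the thread). It keeps the same `forge-identifier` helper, drops the `syntax/location` require since `quote-module-path` is no longer used, and gets the enclosing module's own module path index from a variable reference, which stays meaningful while the module is still being expanded.

```racket
#lang racket

;; Build an identifier `sym` whose phase-0 binding points at the
;; definition of `sym` in the module identified by `modpath`.
(define-for-syntax (forge-identifier modpath sym)
  (syntax-binding-set->syntax
   (syntax-binding-set-extend (syntax-binding-set) sym 0 modpath)
   sym))

(define x 42)

(define-syntax (m stx)
  (forge-identifier
   ;; The enclosing module's module path index, obtained through a
   ;; variable reference instead of (quote-module-path).
   (variable-reference->module-path-index (#%variable-reference))
   'x))

(m) ; expands to a reference to this module's `x`
```

As the follow-up notes, a library built this way must itself decide whether it is forging a reference into the module currently being expanded (use the variable reference) or into some other, already-declared module.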

leif

2019-10-25 18:19:41

Okay, that worked, although it does require the library to be able to determine whether it’s expanding a ‘self’ module rather than a different one.

leif

2019-10-25 19:44:53

Okay @mflatt One more question: Is there any security reason that the serialize and fasl systems can’t make paths relative if they go up a directory?

leif

2019-10-25 19:45:54

Like, if I have a module in /foo/bar/mod.rkt and another one in /foo/baz/mod2.rkt, I would like to describe the second one relative to the first one as ../baz/mod2.rkt.
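For illustration only (this uses the general-purpose `find-relative-path` from `racket/path`, not the serialize or fasl machinery being discussed), this is the kind of "climb up, then down" relative path being asked for, using the two example locations:

```racket
#lang racket/base
(require racket/path) ; find-relative-path

;; Express /foo/baz/mod2.rkt relative to the directory holding
;; /foo/bar/mod.rkt; the result must first climb out of "bar".
(define rel
  (find-relative-path "/foo/bar/" "/foo/baz/mod2.rkt"))

(path->string rel) ; → "../baz/mod2.rkt" on Unix-style paths
```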

mflatt

2019-10-25 19:47:27

That’s the intent of the pair mode of relative-to: a path to make things relative to, and a second path to bound the region that relative paths can reach (i.e., the first path can be an extension of the second one).

leif

2019-10-25 19:48:58

facepalms.

leif

2019-10-25 19:49:01

oh right, thanks.

leif

2019-10-25 20:03:44

Hmm…although it doesn’t look like it collapses the path first. Would that be a bug?

leif

2019-10-25 20:04:08

Like say:

#lang scratch

(struct foo (x y z)
  #:mutable
  #:property prop:serializable
  (make-serialize-info
   (λ (this)
     (vector (foo-x this) (foo-y this) (foo-z this)))
   #'foo-beam
   #t
   (or (current-load-relative-directory) (current-directory))))

(define foo-beam
  (make-deserialize-info
   (λ (x y z)
     (foo x y z))
   (λ ()
     (define f (foo #f #f #f))
     (vector f
             (λ (x y z)
               (set-foo-x! f x)
               (set-foo-y! f y)
               (set-foo-z! f z))))))


(serialize (foo 1 2 3)
           #:relative-directory (cons (build-path "/" "home" "leif" "." "test" "de.rkt")
                                      (build-path "/")))

leif

2019-10-25 20:04:23

But if I remove the "." path element, it seems to work as expected.

mflatt

2019-10-26 01:16:20

The relative-path calculation is not currently meant to automatically collapse the path (and I doubt that would be a good idea, although I’m not 100% sure). You should collapse explicitly if that makes sense in your use.
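Collapsing explicitly can be done with `simplify-path`; passing `#f` as the second argument makes it remove `.` and `..` elements purely syntactically, without consulting the filesystem. A minimal sketch using the path from the example above (`relative-spec` is just an illustrative name for the pair you would pass as `#:relative-directory`):

```racket
#lang racket/base
;; Collapse "." (and "..") syntactically, without touching the filesystem,
;; before building the pair for serialize's #:relative-directory.
(define raw (build-path "/" "home" "leif" "." "test" "de.rkt"))
(define collapsed (simplify-path raw #f))

(define relative-spec (cons collapsed (build-path "/")))

(path->string collapsed) ; → "/home/leif/test/de.rkt"
```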

willbanders

2019-10-26 02:28:09

Is there a way to identify types which are discrete? I’m trying to make a system for ranges which can convert something like an inclusive range [0, 9] to an inclusive-exclusive range [0, 10). Is there any feasible way to do this?
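One possible approach, sketched under the assumption that you only need to support types with a well-defined successor: rather than detecting discreteness in general, check explicitly for the types you know are discrete (here just exact integers) and reject everything else. `inclusive->half-open` is a hypothetical helper name, not an existing library function.

```racket
#lang racket/base
;; Hypothetical helper: convert an inclusive range [lo, hi] into a
;; half-open range [lo, hi + 1). Only exact integers are accepted, since
;; "the next value" is not meaningful for reals such as 9.5.
(define (inclusive->half-open lo hi)
  (if (and (exact-integer? lo) (exact-integer? hi))
      (values lo (add1 hi))
      (raise-argument-error 'inclusive->half-open
                            "exact integer bounds" (cons lo hi))))

(inclusive->half-open 0 9) ; → 0 and 10, i.e. [0, 10)
```

Other discrete types (characters, enumerated symbols) could be added as further cases, each with its own notion of successor.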