luis.osa.gdc
2019-10-25 13:34:54

Nice people of the Racket community! I do side projects from time to time in your language. The last one is an implementation of regression trees (a variation of decision trees, an ML algorithm). But I found that my training gets stuck in even the simplest Kaggle problem, and I do not understand why. If someone cares to give me advice, the code is here: https://github.com/logc/house-prices (the README has a longer explanation of what I attempted and how the code is organized).


samth
2019-10-25 13:40:43

@luis.osa.gdc have you tried running the profiler?


soegaard2
2019-10-25 13:42:06

From vectors.rkt:

(define (var v)
  (define (actual-var v)
    (let ([n (vector-length v)])
      (* (/ 1 (sqr n))
         (for/sum ([i (in-range n)])
           (for/sum ([j (in-range i n)])
             (sqr (- (vector-ref v i) (vector-ref v j))))))))
  (cond
    [(vector-empty? v) 0]
    [else (actual-var v)]))

This makes the computation take O(n^2) time. It’s faster to use E[X^2]-E[X]^2 which takes O(n) to compute.
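A minimal sketch of the O(n) alternative mentioned above, assuming the input is a vector of real numbers and that population variance (dividing by n) is wanted, as in the original `var`. The name `var/linear` is hypothetical and does not come from the repository.

```racket
#lang racket/base
(require racket/math) ; provides sqr

;; Population variance computed as E[X^2] - E[X]^2 in a single pass.
;; `var/linear` is a hypothetical name used only for this sketch.
(define (var/linear v)
  (define n (vector-length v))
  (cond
    [(zero? n) 0]
    [else
     ;; Accumulate the sum and the sum of squares together.
     (define-values (sum sum-sq)
       (for/fold ([s 0] [ss 0]) ([x (in-vector v)])
         (values (+ s x) (+ ss (sqr x)))))
     (- (/ sum-sq n) (sqr (/ sum n)))]))

(var/linear (vector 1 2 3 4))
```

With floating-point inputs this formula can lose precision when the mean is large relative to the spread, so a library routine may be the safer choice in practice.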


luis.osa.gdc
2019-10-25 13:42:51

no, is it only on DrRacket? or is there a way to run it from command line?


samth
2019-10-25 13:43:11

yes, see the profile collection


luis.osa.gdc
2019-10-25 13:43:17

damn! :face_palm:



luis.osa.gdc
2019-10-25 13:43:45

I am going to try that out, and let you know


luis.osa.gdc
2019-10-25 13:44:44

thanks! I knew I was missing some of the tooling which I actually knew existed …


soegaard2
2019-10-25 13:45:24

A style thing only, in io.rkt you have:

(define (maybe-string->number a-string)
  (let ([maybe-number (string->number a-string)])
    (if maybe-number
        maybe-number
        ; else
        a-string)))

You can write (or (string->number a-string) a-string)


soegaard2
2019-10-25 13:48:39

In a few places you have: (for ([v vvs]) ... At some point (I still think it is the case) it was faster to use: (for ([v (in-vector vvs)]) ... but whether it matters depends on the context.


soegaard2
2019-10-25 13:51:16

Btw variance is in math/statistics.
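For illustration, a short sketch of calling the library function, assuming the data is a plain sequence of reals. The sample values are made up; `variance` computes the population variance by default, and the `#:bias` keyword requests a bias-corrected estimate.

```racket
#lang racket/base
(require math/statistics)

;; Made-up sample data; any sequence of reals (list, vector, ...) is accepted.
(define prices (list 1.0 2.0 3.0 4.0))

;; Population variance (no bias correction).
(define pop-var (variance prices))

;; Bias-corrected variance.
(define sample-var (variance prices #:bias #t))

pop-var
sample-var
```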


samth
2019-10-25 13:53:29

yes, it’s definitely much faster to use in-vector


samth
2019-10-25 13:54:29
> (define v (make-vector 1000000 1.0))
> (time (for/sum ([i v]) i))
cpu time: 40 real time: 40 gc time: 0
1000000.0
> (time (for/sum ([i (in-vector v)]) i))
cpu time: 16 real time: 16 gc time: 0
1000000.0

luis.osa.gdc
2019-10-25 13:55:50

ah, didn’t know either about variance in the standard lib, nor about in-vector being faster … good to learn!


luis.osa.gdc
2019-10-25 13:59:23

@samth I already have a profile on the program, done with raco profile, but I have a bit of a hard time locating where the bottleneck is … probably it’s in the variance function, as pointed out by @soegaard2, but I don’t understand really the references of the profile to the source code


samth
2019-10-25 13:59:37

can you post the output?


luis.osa.gdc
2019-10-25 14:02:44

Here is the output of profile on my main.rkt


samth
2019-10-25 14:04:32

@luis.osa.gdc can you run raco make -v main.rkt and then re-run the test?


luis.osa.gdc
2019-10-25 14:06:42

here are both outputs …


luis.osa.gdc
2019-10-25 14:07:34

This is run on macOS 10.14 and Racket 7.4, btw


luis.osa.gdc
2019-10-25 14:08:49

I hope I am not spamming other people … I have tried to post outputs as snippets and not as literal code blocks, since this is the general channel


luis.osa.gdc
2019-10-25 14:12:17

and, for completeness, this run is made on a sample of 100 lines from the total dataset; that is why it actually finishes training


samth
2019-10-25 14:12:51

ah, raco profile isn’t running the main submodule
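One way around this, sketched under the assumption that the training work can be wrapped in a zero-argument function, is to call the profiler from code with `profile-thunk` from the `profile` library. The function `train-on-sample` below is a hypothetical stand-in, not the repository's entry point.

```racket
#lang racket/base
(require profile)

;; Hypothetical stand-in for the real training entry point.
(define (train-on-sample)
  (for/sum ([i (in-range 200000)])
    (* i i)))

;; Runs the thunk under the sampling profiler and prints a report
;; once it returns. #:delay is the sampling interval in seconds.
(profile-thunk train-on-sample #:delay 0.001)
```

Because the thunk is invoked explicitly, the samples cover the computation itself and not just the cost of loading the module.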


samth
2019-10-25 14:13:56

how long should running main.rkt take?


luis.osa.gdc
2019-10-25 14:16:29

without profiling it just takes 1.11 seconds for the sample; I don’t really have a requirement for how long it should take, but not incredibly long …


samth
2019-10-25 14:17:14

for me it takes forever


samth
2019-10-25 14:17:27

is there something i should change in main.rkt?


luis.osa.gdc
2019-10-25 14:18:42

I did take a sample of 100 lines, like this:

head -n 100 data/train.csv > data/train.sample.csv

and then wrote that file name in main.rkt:

(define S (io:parse-file "data/train.sample.csv"))


samth
2019-10-25 14:20:20

ok, I changed your code to use variance from math/statistics and to use in-vector and that now takes 1 ms


samth
2019-10-25 14:20:54

but 200 is still very slow


samth
2019-10-25 14:21:29

should i change the number in main.rkt to 200?


samth
2019-10-25 14:21:49

changing that number to 200 makes it finish instantly


samth
2019-10-25 14:22:23

same with 1000 lines and 1000 in main.rkt


luis.osa.gdc
2019-10-25 14:22:52

ah, in fact I introduced that depth argument in order to experiment around — the training was taking forever


luis.osa.gdc
2019-10-25 14:25:16

that looks solved for me! if you could still tell me a bit how to interpret the profiler’s output, I would be thankful, @samth


samth
2019-10-25 14:25:44

the profiler output is just showing you the results of loading the module, I think


luis.osa.gdc
2019-10-25 14:25:53

I need to leave now, but you can maybe comment on the snippets above, or something …


samth
2019-10-25 14:26:07

so it’s just a bunch of things internal to racket module loading


samth
2019-10-25 14:26:13

which is why it wasn’t useful


luis.osa.gdc
2019-10-25 14:26:16

it looks quite cryptic …


luis.osa.gdc
2019-10-25 14:26:26

thanks for any explanation! I will read that


luis.osa.gdc
2019-10-25 14:26:54

and thanks to both for your insights! I learned a lot from this


krismicinski
2019-10-25 14:28:42

don’t feel bad, I had also forgotten about it and the reminder was helpful!


samth
2019-10-25 14:29:05

I can’t reproduce anything being slow with a 100-line file at all


gregor.kiczales
2019-10-25 16:45:35

I’m wondering if someone could look at https://github.com/racket/racket/issues/2758 this causes us real problems because we have to run our handin and autograder single threaded


samth
2019-10-25 16:47:37

@gregor.kiczales do you have code that demonstrates this problem (even if complicated)?


gregor.kiczales
2019-10-25 16:48:19

yes, but it’s a threading problem, so it only comes up under load.


samth
2019-10-25 16:49:11

sure, I’ll try to load-test it


gregor.kiczales
2019-10-25 16:51:41

ok, give me a bit to bundle it


leif
2019-10-25 18:03:35

Hey @mflatt Using syntax-binding-set doesn’t seem to work when the module you’re trying to ‘forge’ is the one you’re expanding:


leif
2019-10-25 18:03:40
#lang racket

(require (for-syntax syntax/location))

(define-for-syntax (forge-identifier modpath sym)
  (syntax-binding-set->syntax
   (syntax-binding-set-extend (syntax-binding-set) sym 0 modpath)
   sym))

(define x 42)

(define-syntax (m stx)
  (forge-identifier
   (module-path-index-join (quote-module-path) #f)
   'x))

(m)

leif
2019-10-25 18:04:19

I suspect this is because the macro expander is using eq? somewhere internally.


leif
2019-10-25 18:04:26

Do you have any ideas how to get around this?


mflatt
2019-10-25 18:11:17

While a module is being expanded, the module doesn’t really have a name, so (quote-module-path) doesn’t make sense. Can you use (variable-reference->module-path-index (#%variable-reference)) instead of (module-path-index-join (quote-module-path) #f)?
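Applied to the snippet above, the suggestion might look like the following sketch. It only swaps the module-path-index expression and renames the helper's first parameter to `mpi`; everything else is unchanged, and it assumes the identifier being forged lives in the module currently being expanded.

```racket
#lang racket

(define-for-syntax (forge-identifier mpi sym)
  (syntax-binding-set->syntax
   (syntax-binding-set-extend (syntax-binding-set) sym 0 mpi)
   sym))

(define x 42)

(define-syntax (m stx)
  (forge-identifier
   ;; Ask for the enclosing module's own "self" index instead of
   ;; building one from (quote-module-path).
   (variable-reference->module-path-index (#%variable-reference))
   'x))

(m)
```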


leif
2019-10-25 18:19:41

Okay, that worked, although it does require the library to be able to determine if it’s expanding a ‘self’ module rather than a different one.


leif
2019-10-25 19:44:53

Okay @mflatt One more question: Is there any security reason that the serialize and fasl systems can’t make paths relative if they go up a directory?


leif
2019-10-25 19:45:54

Like, if I have a module in /foo/bar/mod.rkt and I have another one in /foo/baz/mod2.rkt. I would like to describe the second one relative to the first one as ../baz/mod2.rkt.


mflatt
2019-10-25 19:47:27

That’s the intent of the pair mode of relative-to: a path to make things relative to, and a second path to bound the region that relative paths can reach (i.e., the first path can be an extension of the second one).


leif
2019-10-25 19:48:58

facepalms.


leif
2019-10-25 19:49:01

oh right, thanks.


leif
2019-10-25 20:03:44

Hmm…although it doesn’t look like it collapses the path first. Would that be a bug?


leif
2019-10-25 20:04:08

Like say:

#lang scratch

(struct foo (x y z)
  #:mutable
  #:property prop:serializable
  (make-serialize-info
   (λ (this)
     (vector (foo-x this) (foo-y this) (foo-z this)))
   #'foo-beam
   #t
   (or (current-load-relative-directory) (current-directory))))

(define foo-beam
  (make-deserialize-info
   (λ (x y z)
     (foo x y z))
   (λ ()
     (define f (foo #f #f #f))
     (vector f
             (λ (x y z)
               (set-foo-x! f x)
               (set-foo-y! f y)
               (set-foo-z! f z))))))


(serialize (foo 1 2 3)
           #:relative-directory (cons (build-path "/" "home" "leif" "." "test" "de.rkt")
                                      (build-path "/")))

leif
2019-10-25 20:04:23

But if I remove the "." at the end, it seems to work as expected.


mflatt
2019-10-26 01:16:20

The relative-path calculation is not currently meant to automatically collapse the path (and I doubt that would be a good idea, although I’m not 100% sure). You should collapse explicitly if that makes sense in your use.
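A small sketch of collapsing explicitly before building the pair, assuming a Unix-style complete path and that a purely syntactic cleanup is acceptable. It uses `simplify-path` with its second argument set to `#f` so the filesystem is not consulted; the names `raw`, `collapsed`, and `relative-to` are illustrative.

```racket
#lang racket/base

;; The path from the example above, with a stray "." element.
(define raw (build-path "/" "home" "leif" "." "test" "de.rkt"))

;; Syntactic simplification only: #f means "do not touch the filesystem".
(define collapsed (simplify-path raw #f))

;; Pair form: path to be relative to, plus the root that bounds it.
;; This value could then be passed as #:relative-directory.
(define relative-to (cons collapsed (build-path "/")))

collapsed
```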


willbanders
2019-10-26 02:28:09

Is there a way to identify types which are discrete? I’m trying to make a system for ranges which can convert something like an inclusive range [0, 9] to an inclusive-exclusive range [0, 10). Is there any feasible way to do this?