
Nice people of the Racket community! I do side projects from time to time in your language. The last one is an implementation of regression trees (a variation of decision trees, an ML algorithm). But I found that my training gets stuck in even the simplest Kaggle problem, and I do not understand why. If someone cares to give me advice, the code is here: https://github.com/logc/house-prices (the README has a longer explanation of what I attempted and how the code is organized).

@luis.osa.gdc have you tried running the profiler?

From vectors.rkt:
(define (var v)
  (define (actual-var v)
    (let ([n (vector-length v)])
      (* (/ 1 (sqr n))
         (for/sum ([i (in-range n)])
           (for/sum ([j (in-range i n)])
             (sqr (- (vector-ref v i) (vector-ref v j))))))))
  (cond [(vector-empty? v) 0]
        [else (actual-var v)]))
This makes the computation take O(n^2) time. It is faster to use Var(X) = E[X^2] - (E[X])^2, which takes O(n) time to compute.
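For reference, an O(n) version along those lines might look like this (a sketch, not code from the repo; sqr and vector-empty? come with #lang racket, as in the original):
(define (var v)
  (cond
    [(vector-empty? v) 0]
    [else
     (define n (vector-length v))
     ;; one pass for E[X], one for E[X^2]
     (define mean (/ (for/sum ([x (in-vector v)]) x) n))
     (define mean-sq (/ (for/sum ([x (in-vector v)]) (sqr x)) n))
     (- mean-sq (sqr mean))]))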

no, is it only in DrRacket? or is there a way to run it from the command line?

yes, see the profile collection

damn! :face_palm:


I am going to try that out, and let you know

thanks! I knew I was missing some of the tooling which I actually knew existed …

A style thing only, in io.rkt you have:
(define (maybe-string->number a-string)
  (let ([maybe-number (string->number a-string)])
    (if maybe-number
        maybe-number
        ; else
        a-string)))
You can write (or (string->number a-string) a-string)

In a few places you have: (for ([v vvs]) ...
At some point it was faster (and I still think it is the case) to use: (for ([v (in-vector vvs)]) ...
but whether it matters depends on the context.

Btw, variance is in math/statistics.
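For example (a quick illustration, not code from the project):
(require math/statistics)
(variance (vector 1.0 2.0 3.0 4.0))  ; the variance of the data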

yes, it’s definitely much faster to use in-vector

> (define v (make-vector 1000000 1.0))
> (time (for/sum ([i v]) i))
cpu time: 40 real time: 40 gc time: 0
1000000.0
> (time (for/sum ([i (in-vector v)]) i))
cpu time: 16 real time: 16 gc time: 0
1000000.0

ah, I didn’t know about variance being in the standard lib, nor about in-vector being faster … good to learn!

@samth I already have a profile on the program, done with raco profile, but I have a bit of a hard time locating where the bottleneck is … probably it’s in the variance function, as pointed out by @soegaard2, but I don’t really understand how the profile’s references map to the source code

can you post the output?

Here is the output of profile on my main.rkt

@luis.osa.gdc can you run raco make -v main.rkt and then re-run the test?

here are both outputs …

This is run on macOS 10.14 and Racket 7.4, btw

I hope I am not spamming other people … I have tried to post outputs as snippets and not as literal code blocks, since this is the general channel

and, for completeness, this run is made on a sample of 100 lines from the total dataset; that is why it actually finishes training

ah, raco profile isn’t running the main submodule
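One workaround in that situation is to instrument the work directly with the profile library; something like the following in main.rkt (a sketch only, and the body is just an illustration of where the training call would go):
(require profile)
;; profile-thunk runs the thunk and prints a profile of what it did
(profile-thunk
 (λ ()
   (define S (io:parse-file "data/train.sample.csv"))
   ;; ... run the training on S here ...
   (void)))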

how long should running main.rkt take?

without profiling it just takes 1.11 seconds for the sample; I don’t really have a requirement for how long it should take, but not incredibly long …

for me it takes forever

is there something I should change in main.rkt?

I did take a sample of 100 lines, like this: head -n 100 data/train.csv > data/train.sample.csv, and then wrote that file name in main.rkt: (define S (io:parse-file "data/train.sample.csv"))

ok, I changed your code to use variance from math/statistics and to use in-vector, and that now takes 1 ms

but 200 is still very slow

should I change the number in main.rkt to 200?

changing that number to 200 makes it finish instantly

same with 1000 lines and 1000 in main.rkt

ah, in fact I introduced that depth argument in order to experiment, because the training was taking forever

that looks solved for me! if you could still tell me a bit about how to interpret the profiler’s output, I would be thankful, @samth

the profiler output is just showing you the results of loading the module, I think

I need to leave now, but you can maybe comment on the snippets above, or something …

so it’s just a bunch of things internal to racket module loading

which is why it wasn’t useful

it looks quite cryptic …

thanks for any explanation! I will read that

and thanks to both for your insights! I learned a lot from this

don’t feel bad, I had also forgotten about it and the reminder was helpful!

I can’t reproduce anything being slow with a 100-line file at all

I’m wondering if someone could look at https://github.com/racket/racket/issues/2758 . This causes us real problems, because we have to run our handin and autograder single-threaded.

@gregor.kiczales do you have code that demonstrates this problem (even if complicated)?

yes, but it’s a threading problem, so it only comes up under load.

sure, I’ll try to load-test it

ok, give me a bit to bundle it

Hey @mflatt, using syntax-binding-set doesn’t seem to work when the module you’re trying to ‘forge’ is the one you’re expanding:

#lang racket
(require (for-syntax syntax/location))

(define-for-syntax (forge-identifier modpath sym)
  (syntax-binding-set->syntax
   (syntax-binding-set-extend (syntax-binding-set) sym 0 modpath)
   sym))

(define x 42)

(define-syntax (m stx)
  (forge-identifier
   (module-path-index-join (quote-module-path) #f)
   'x))

(m)

I suspect this is because the macro expander is using eq? somewhere internally.

Do you have any ideas how to get around this?

While a module is being expanded, the module doesn’t really have a name, so (quote-module-path) doesn’t make sense. Can you use (variable-reference->module-path-index (#%variable-reference)) instead of (module-path-index-join (quote-module-path) #f)?
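Substituting that into the snippet above gives something like this (my untested reading of the suggestion):
#lang racket
(define-for-syntax (forge-identifier modpath sym)
  (syntax-binding-set->syntax
   (syntax-binding-set-extend (syntax-binding-set) sym 0 modpath)
   sym))

(define x 42)

(define-syntax (m stx)
  (forge-identifier
   ;; the module path index of the enclosing module, usable even mid-expansion
   (variable-reference->module-path-index (#%variable-reference))
   'x))

(m)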

Okay, that worked, although it does require the library to be able to determine whether it’s expanding a ‘self’ module rather than a different one.

Okay @mflatt, one more question: is there any security reason that the serialize and fasl systems can’t make paths relative if they go up a directory?

Like, if I have a module in /foo/bar/mod.rkt and another one in /foo/baz/mod2.rkt, I would like to describe the second one relative to the first one as ../baz/mod2.rkt.

That’s the intent of the pair mode of relative-to: a path to make things relative to, and a second path to bound the region that relative paths can reach (i.e., the first path can be an extension of the second one).

facepalms.

oh right, thanks.

Hmm… although it doesn’t look like it collapses the path first. Would that be a bug?

Like say:
#lang scratch
(struct foo (x y z)
  #:mutable
  #:property prop:serializable
  (make-serialize-info
   (λ (this)
     (vector (foo-x this) (foo-y this) (foo-z this)))
   #'foo-beam
   #t
   (or (current-load-relative-directory) (current-directory))))

(define foo-beam
  (make-deserialize-info
   (λ (x y z)
     (foo x y z))
   (λ ()
     (define f (foo #f #f #f))
     (vector f
             (λ (x y z)
               (set-foo-x! f x)
               (set-foo-y! f y)
               (set-foo-z! f z))))))

(serialize (foo 1 2 3)
           #:relative-directory (cons (build-path "/" "home" "leif" "." "test" "de.rkt")
                                      (build-path "/")))

But if I remove the "." at the end, it seems to work as expected.

The relative-path calculation is not currently meant to automatically collapse the path (and I doubt that would be a good idea, although I’m not 100% sure). You should collapse explicitly if that makes sense in your use.
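So, based on the example above, collapsing explicitly could look something like this (a sketch using simplify-path, which removes the "." segment):
(serialize (foo 1 2 3)
           #:relative-directory
           ;; simplify-path with use-filesystem? = #f collapses "." and ".." purely syntactically
           (cons (simplify-path
                  (build-path "/" "home" "leif" "." "test" "de.rkt") #f)
                 (build-path "/")))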

Is there a way to identify types which are discrete? I’m trying to make a system for ranges which can convert something like an inclusive range [0, 9] to an inclusive-exclusive range [0, 10). Is there any feasible way to do this?