
Does anyone know of any utility in Racket that maps a value in 0-1 to an (r, g, b) for a given color map?

If you have 3 values, each in [0, 1], you probably just need to multiply each by 255 (and maybe take the exact-floor of this).
If you have a single value in [0, 1], then I’m not sure there’s a standard representation for this? Though I guess it could be r/256 + g/256² + b/256³, or the reverse order of this.
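A minimal sketch of that single-value encoding, assuming integer channels in 0..255. The names `pack_rgb` and `unpack_rgb` are hypothetical, and this is only one possible convention, not a standard one:

```python
def pack_rgb(r, g, b):
    """Encode three 8-bit channels as one float in [0, 1):
    r/256 + g/256**2 + b/256**3."""
    for c in (r, g, b):
        if not 0 <= c <= 255:
            raise ValueError("channel out of range 0..255")
    return r / 256 + g / 256**2 + b / 256**3


def unpack_rgb(x):
    """Invert pack_rgb by scaling back to a 24-bit integer."""
    n = int(x * 256**3)
    return (n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF


packed = pack_rgb(12, 200, 255)
restored = unpack_rgb(packed)   # same three channels back
```

The round trip is exact because every term is a multiple of 1/2²⁴, which a double can represent without rounding.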

Or instead of 0–1 (which I interpreted as [0, 1]), maybe you mean a color index for plot, and you want its r, g, b value?

If the latter, you probably want https://docs.racket-lang.org/plot/utils.html#%28def._%28%28lib._plot%2Futils..rkt%29._-~3epen-color%29%29

I thought @laurent.orseau was gonna offer some advice on space-filling curves :stuck_out_tongue:

Oh! I see. Thanks! The resulting GUI is busy but manageable for a modest number of locally installed packages.

@soegaard2 what is the relationship, if any, between math/matrix and flomat?

Oh yeah I can do that too :smile: Or maybe a discrete probability distribution over all the numbers that can be represented as finite binary numbers in [0, 1]? (and I guess you could well define the latter based on space filling curves!)

I looked again at my hasty/abandoned query, and realized the problem was a cycle due to base depending on racket-lib and vice versa.

(For my own immediate purpose it would be fine to handle this by just excluding base; I’m interested in things above that level.)

My choice of Racket as my company’s primary language was based on a long term view, so I might as well take a long term view in the data science space :) I’m very much a newbie though, so I think I’ll get experience with the Python ecosystem to get the lay of the land, and then try and identify the best way to help move Racket forward in this area.

None (besides the author :slightly_smiling_face: ).
Since math/matrix is implemented in Racket (using arrays from math/array), it handles matrices over integers (bignums), rationals, and floating-point numbers. All algorithms are implemented in Racket.
The matrices in flomat are basically represented as vectors of floating-point numbers laid out in the way BLAS/LAPACK expects. Almost all functions simply call the corresponding BLAS/LAPACK function.

The problem I am trying to address is the following: I have a 2D scatter plot, and now I want to use a colorbar created using a linear-gradient%. How do I now assign colors from this linear-gradient% to the scattered points depending on their values from a 3rd list? I was thinking of normalizing the values in the 3rd list to [0, 1] and then using a function which gives an (r, g, b) for a value from the linear-gradient%. Let me know if there is a better way to do this.

I realize I’m at the stage where I very much don’t know what I don’t know, but it seems to me that for data science & machine learning, floating point matrices would be sufficient, no?

Yes. I believe so.

If x is your value in [0, 1], how about r = floor(x*255), g = floor((1-x)*255), b = 127

Hm, maybe your problem is the other way round. I’m a little confused.

What’s the 3rd list?

For example, I want to show the z coordinate as a color in my scatter plot of the (x, y) coordinates, given a list of X, Y, Z

Ok. Then you should skip the linear-gradient% (from which it’s hard to query values) and use a formula like the one I suggested above

Or: r = g = b = floor(x*200) if you want grayscale

(while avoiding the whites, since your background is likely white)
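The formula approach can be sketched as two steps: normalize the z values to [0, 1], then interpolate linearly between two end colors. This is only an illustration in Python; the names `normalize` and `lerp_color` and the blue-to-red stops are assumptions, and in Racket the resulting triple would be handed to something like (make-object color% r g b).

```python
def normalize(values):
    """Rescale values linearly so that min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # avoid dividing by zero for constant data
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def lerp_color(t, start=(0, 0, 255), end=(255, 0, 0)):
    """Interpolate each channel between start (t = 0) and end (t = 1)."""
    t = min(1.0, max(0.0, t))         # clamp so channels stay within 0..255
    return tuple(int(s + (e - s) * t) for s, e in zip(start, end))


zs = [3.0, 7.5, 12.0]                 # hypothetical third list
colors = [lerp_color(t) for t in normalize(zs)]
```

Choosing end colors that are both far from white keeps every point visible on a white background.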

hmmm, let me see how this looks.

Alternatively, if you want rainbow colors, you can use:
(define (convert x)
  (define c (hsv->color (hsv x 1 1)))
  (values (send c red) (send c green) (send c blue)))

Here’s an example


Terrible in practice since 0 and 1 have similar color. Perhaps you might want just a half of it

Thanks. But yeah, I need a color-gradient that shows the min and max values nicely. Rainbow colors won’t be appropriate.

Ah, too bad, I had just made the function manually:
#lang racket
(require pict
         racket/draw)
(apply
 hc-append
 (for/list ([x (in-range 0 1 1/100)])
   (define r (exact-floor (* 255 x)))
   (define g (exact-floor (* 255 (abs (* 2 (- x 1/2))))))
   (define b (exact-floor (* 255 (cond [(< x 1/3) (* 3 x)]
                                       [(< x 2/3) (- 1 (* 3 (- x 1/3)))]
                                       [else (* 3 (- x 2/3))]))))
   (filled-rectangle 2 10
                     #:color (make-object color% r g b)
                     #:draw-border? #false)))


Or if you replace x with (- 1 x) for red:

Is it possible to call Python code directly from Racket? If so, would you mind posting a simple example?

More colors on the same span:
#lang racket
(require pict
         racket/draw)
(apply
 hc-append
 (for/list ([x (in-range 0 1 1/400)])
   (define r (exact-floor (* 255 (- 1 x))))
   (define g (exact-floor (* 255 (- 1 (abs (* 2 (- x 1/2)))))))
   (define b (exact-floor (* 255 (cond [(< x 1/4) (* 4 x)]
                                       [(< x 1/2) (- 1 (* 4 (- x 1/4)))]
                                       [(< x 3/4) (* 4 (- x 1/2))]
                                       [else (- 1 (* 4 (- x 3/4)))]))))
   (filled-rectangle 2 10
                     #:color (make-object color% r g b)
                     #:draw-border? #false)))

apart from system or process and friends?

There was a #lang python at some point, but numpy and friends are not implemented in Python. They are wrappers over C code (mostly).

@mflatt I got an error when using pkg-build with docker:
bin/racket: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29` not found (required by bin/racket)
The docker image, I think, is https://hub.docker.com/r/racket/pkg-build, which is based on 16.04. Looks like we need to use a newer version of Ubuntu.

Yes, apart from those. I’m familiar with using those as a workaround, but I wasn’t sure if there was a more direct way.

If I understand, you built Racket on a newer Linux installation, and so it doesn’t run on older installations like the Docker image suggested by the pkg-build docs. You could either create a different Docker image to use (probably starting with a Dockerfile in the pkg-build package) or build a Racket distribution through a Docker container running an older Linux (like “racket/distro-build:unix-installer-test”).

Ah, yes, the Racket was built on 20.04

Thank you very much

If you have lots of short calls to Python, one (non-trivial) solution is to set up a server in Python and communicate via channels. That would avoid the cost of starting a Python program each time, but of course it’s substantially heavier to set up.
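A rough sketch of the Python side of such a server, assuming a line-delimited JSON protocol over stdin/stdout. The protocol, the `handle` and `serve` names, and the single "add" operation are all made up for illustration. The Racket side would start this script once (for example with subprocess) and keep the ports open, writing one request per line and reading one reply per line.

```python
import io
import json
import sys


def handle(request):
    """Dispatch one decoded request; only a hypothetical "add" op exists."""
    if request.get("op") == "add":
        return {"ok": True, "result": sum(request["args"])}
    return {"ok": False, "error": "unknown op"}


def serve(inp=sys.stdin, out=sys.stdout):
    """Answer one JSON request per input line until the input is closed."""
    for line in inp:
        line = line.strip()
        if not line:
            continue
        out.write(json.dumps(handle(json.loads(line))) + "\n")
        out.flush()                   # reply immediately; the caller is waiting


# Demonstration with in-memory streams instead of a real pipe.
demo_in = io.StringIO('{"op": "add", "args": [1, 2, 3]}\n')
demo_out = io.StringIO()
serve(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Flushing after every reply matters here: without it the answer can sit in a buffer while the calling process blocks on its read.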


Dates and strings are also in R/julia/Python data tables. I’d say without them you won’t attract any data scientists, because time series and categorical data are also huge use cases. Text manipulation, too. And not having a unified library and data structure that can handle all of it would be a big inconvenience.

That’s part of why I was pushing for building on top of Arrow. Just handling numbers is a decent 80% solution that probably covers most scientific computing, but the other 20% is still critical for users more in the data science space, and I think it’s probably better to take the up-front hit on supporting that stuff from the ground up than it is to go for the quick easy gains but risk getting painted into a corner.

Or, to put it more pithily, the world seems to have largely moved on from Fortran, and now wants Pandas.

there’s the sawzall package that handles a lot of that kind of mixed tabular data

very reminiscent of the tidyverse R stuff

@seanbunderwood I hear what you’re saying re: data tables, but can Python data tables be used directly for linear algebra / general matrix stuff? Or do people copy data out of data tables into something to do linear algebra on?

I think the first step for me personally is to simply get started with some basic data science stuff to see what exists, and what’s missing. Then I’ll solicit input from others about possible options for moving forward.

From this page (https://www.machinelearningplus.com/data-manipulation/101-python-datatable-exercises-pydatatable/) it looks like copying is probably done w/ Python data tables:
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
# to pandas df
pd_df = df.to_pandas()
# to numpy arrays
np_arrays = df.to_numpy()
# to dictionary
dic = df.to_dict()
# to list
list_ = df[:,"indus"].to_list()
# to tuple
tuples_ = df[:,"indus"].to_tuples()
# to csv
df.to_csv("BostonHousing.csv")

Looking forward to @hazel’s talk at RacketCon !

Just noticed that @alexharsanyi, from this thread, created the data-frame package that sawzall is built upon. Looks like there are some nice building blocks already in Racket.

There is not necessarily any copying of the actual data, though you can copy if you want to. Typically, though, what happens is that you’re just getting two different objects that point to the same underlying data.

yeah, the blocks are mostly all there, they just need some more glue imo

the ideal would be a nice self-contained system like R tidyverse or F#’s FsLab

That is specifically how it works with functions like to_pandas or to_numpy. Going to a list of dicts or tuples is a bigger operation because you’re moving the data to a fundamentally different layout in memory.
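The difference between "two objects over the same data" and "moving to a different layout" can be illustrated with the standard library alone. This is a stand-in, not a statement about what datatable, pandas, or numpy do internally: `memoryview` plays the role of the zero-copy view and `tolist()` the role of the real conversion.

```python
from array import array

# One contiguous buffer of doubles.
data = array("d", [12.34, 67.43, 42.97, 32.54, 28.98, 31.21])

view = memoryview(data)        # second object over the same buffer, no copy
first_col = view[0:3]          # slicing a memoryview is also zero-copy
as_list = first_col.tolist()   # builds new Python floats: a real conversion

data[0] = 99.0                 # mutate the original buffer

seen_by_view = first_col[0]    # reflects the change
seen_by_list = as_list[0]      # still holds the old value
```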

@seanbunderwood just so I understand, by “There is not necessarily any copying of the actual data”, are you saying that, in Python, one can load a datatable with a mix of numbers, dates, strings, etc., and then perform linear algebra on a subset of that datatable, without copying data? That sounds like magic to me :)

Maybe @alexharsanyi and @soegaard2 can comment on the possibility of doing something similar from data-frame to flomat. I guess at a minimum, data-frame would need to store columns contiguously.

Seems like it would be necessary to either not have a header row in the datatable, or have the header row be a separate datastructure from the main array.

Feedback welcome: https://pkgd.racket-lang.org/pkgn/package/scribble-lp2-manual

Why does the racket-doc package depend on the drracket package?

That seems counter-intuitive.



It looks like that happened about 5 years ago to update the style guide in https://github.com/racket/racket/commit/5f5fc0935d88abf389e465c7e34d61e351dd2e8a#diff-0802933065577058b09c5a205086d9284177f05251729e55c8c9e4212517c0cb

I guess one reason I ask is that drracket is a somewhat “heavy” dependency; for example, it pulls in the gui-lib package.

The racket-doc package also directly depends on gui, though. I vaguely remember being concerned about the drracket dependency, but discovering that it didn’t much matter. Then again, I may misremember.

See also: https://github.com/racket/racket/pull/3215 which removes the DrRacket dep

I would like to merge that PR, or something like it, but there are two problems:
1. There’s no way to do indirect references to identifiers. Creating one seems hard.
2. @mflatt didn’t like the scribble/docnames approach, which moves some information that should be in, e.g., the drracket package into racket-doc.
I think if (1) was fixed, though, we could persuade him.

The representation used in flomat is:
; BLAS/LAPACK represents matrices as one-dimensional arrays
; of numbers (S=single, D=double, X=complex or Z=double complex).
; This library uses arrays of doubles.
(define _flomat (_cpointer 'flomat))
; The array is wrapped in a struct, which besides
; a pointer to the array, holds the number of
; rows and columns. Future extension could be to
; allow different types of numbers, or perhaps
; choose specialized operations for triangular matrices.
...
; m = rows, n = cols, a = mxn array of doubles
; lda = leading dimension of a (see below)
(struct flomat (m n a lda)
  #:methods gen:custom-write
  [(define write-proc flomat-print)]
  #:methods gen:equal+hash
  [(define equal-proc
     (λ (A B rec)
       (and (= (flomat-m A) (flomat-m B))
            (= (flomat-n A) (flomat-n B))
            (or (equal? (flomat-a A) (flomat-a B))
                (flomat= A B epsilon)))))
   (define hash-proc
     ; TODO: Avoid allocation in hash-proc.
     (λ (A rec)
       (define-param (m n) A)
       (rec (cons m (cons n (flomat->vector A))))))
   (define hash2-proc
     (λ (A rec)
       (define-param (m n) A)
       (rec (cons n (cons m (flomat->vector A))))))])

Oh I wasn’t aware of the context and the PR. I’ll check that out.

@soegaard2 is (flomat-a) a vector? If so, what is the type of the elements of the vector? I expect BLAS/LAPACK are not expecting boxed values, so it must be in an IEEE 64-bit floating point format, right?

It’s a pointer to a piece of memory holding floating-point numbers.
This allocates memory for an m×n matrix:
(define (alloc-flomat m n)
  (if (or (= m 0) (= n 0))
      #f ; ~ NULL
      (cast (malloc (* m n) _double 'atomic)
            _pointer _flomat)))

> The racket-doc package also directly depends on gui, though.
Well, derp. I see that now. Although that does make me wonder why it would depend on gui as opposed to just gui-doc. I’ll see if I can figure out why.

Oh! Wow. Ok, so to be able to convert from a data-frame to flomat seamlessly, it seems data-frame would have to use the same type of memory layout, and that seems very unlikely :)

The next Rhombus meeting is happening today, details available here: https://github.com/racket/rhombus-brainstorming/discussions/180

(Maybe for reasons similar to what Sam just said. I’ll marinate myself in the PR…)

Yes.

I have no idea how accepted the Arrow memory layout is, but hypothetically, I wonder how hard it would be to convert flomat to use that since you’re managing memory so directly already. Maybe it’s already close to it.

I think I now understand a little more clearly what @seanbunderwood was trying to convey :) It does seem like a data frame that uses the same memory layout as flomat would be very handy to avoid having to copy tons of data. My main goal would be to be able to have zero-copy within Racket code, but if we could get interop with other languages also, that would be a win.

Also, to be fair, I’m not sure that gui-lib is even all that “heavy” per se. I think I’m sensitized to it because, if someone does end up installing it on a headless server, it sets the stage for various awkward scenarios.

The lda can be used to make “gaps” between rows.

It’s normally used to make a submatrix that shares the memory with a larger matrix.

Potentially one could store other information in such gaps - but I think it would be a hassle to work with.
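To make the leading-dimension idea concrete, here is a small sketch. It assumes the column-major convention of reference BLAS/LAPACK, in which lda is the distance in elements between the starts of consecutive columns, so element (i, j) sits at offset i + j*lda. Whether flomat follows exactly this convention should be checked against its source. The `offset` helper and the `SubMatrix` class are hypothetical.

```python
from array import array


def offset(i, j, lda):
    """Column-major position of element (i, j) for leading dimension lda."""
    return i + j * lda


ROWS, COLS = 4, 3
# Parent matrix stored column by column; entry (r, c) holds 10*c + r.
parent = array("d", [float(10 * c + r) for c in range(COLS) for r in range(ROWS)])


class SubMatrix:
    """An m x n window into a larger buffer; nothing is copied."""

    def __init__(self, buf, start, m, n, lda):
        self.buf, self.start, self.m, self.n, self.lda = buf, start, m, n, lda

    def get(self, i, j):
        return self.buf[self.start + offset(i, j, self.lda)]

    def set(self, i, j, value):
        self.buf[self.start + offset(i, j, self.lda)] = value


# 2x2 window covering rows 1..2 and columns 1..2 of the parent.
# It keeps the parent's lda, so each column step skips the rows outside it.
sub = SubMatrix(memoryview(parent), offset(1, 1, ROWS), 2, 2, ROWS)
sub.set(0, 1, -1.0)            # writes through to parent entry (1, 2)
```

The skipped elements between one window column and the next are the "gaps": they belong to the parent, not to the submatrix.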

Yeah, I was just (naively?) thinking that if you had a data frame like:
2021-10-24T12:00:00 12.34 32.54
2021-10-24T12:00:01 67.43 28.98
2021-10-24T12:00:02 42.97 31.21
you would be able to “convert” the 3x2 number portion (3 rows, 2 columns) to a flomat array w/o copying.

Because they would both be viewing memory as:
12.34
67.43
42.97
32.54
28.98
31.21

I suppose that would bring up memory ownership issues, but in a worst case, a memcpy of the contiguous data would probably be much quicker than a loop.
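A toy version of that layout, assuming the frame keeps its numeric columns back to back in one buffer while the timestamps live in a separate structure. Everything here (the `frame` dict, the column names "a" and "b", the `matrix_at` helper) is hypothetical and says nothing about how data-frame or flomat are actually implemented.

```python
from array import array

timestamps = ["2021-10-24T12:00:00", "2021-10-24T12:00:01", "2021-10-24T12:00:02"]

# The numeric block, stored column by column in one contiguous buffer.
numeric = array("d", [12.34, 67.43, 42.97,    # first numeric column
                      32.54, 28.98, 31.21])   # second numeric column
n_rows, n_cols = 3, 2

buf = memoryview(numeric)
frame = {
    "time": timestamps,                # non-numeric data kept separately
    "a": buf[0:n_rows],                # zero-copy view of column 0
    "b": buf[n_rows:2 * n_rows],       # zero-copy view of column 1
}


def matrix_at(i, j):
    """Read entry (i, j) of the 3x2 numeric block straight from the buffer."""
    return buf[i + j * n_rows]


snapshot = numeric[:]                  # independent copy made in one operation
numeric[4] = 0.0                       # change row 1 of column "b"
```

After the write, both the frame column and the matrix accessor see the new value, while the snapshot keeps the old one, which is the ownership question in miniature.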

Also, a memcpy in order to get set up for a matrix multiply may not be a big deal in practice. Multiplying matrices is an embarrassingly small percentage of where working data scientists actually spend their time. Maybe 90% of it is data cleaning, prep, EDA, stuff like that. And that’s a spot where the bottleneck is how fast humans can work, not CPUs or memory controllers.

Like, I think that’s where Breeze failed on the JVM. They focused too much on making the math efficient & powerful, but the ergonomics are weak, so, when I’m stuck using it, I end up feeling like it’s an uphill struggle to do my actual job.


You might also want to look at this: https://docs.racket-lang.org/colormaps/index.html

Is it possible to typeset something like
> ;;; some comment
with racketblock and (code:comment _content_)?
@racketblock[
(code:comment ";;; something")
gives me:
; ;;; something

Thanks for the notes. Much appreciated.

I don’t think racketblock offers a way to do what you want with code:comment. You’d have to escape at the point where you want a comment and generate the output including a leading ;.

I see, thank you

iowa?

Feed…corn… maybe Iowa was a bit too lateral

Got it. This is why I asked:

I ended up going in a different direction: https://docs.racket-lang.org/splitflap/index.html

Anyways I’m a Minnesota guy. Couldn’t bring myself to name it Iowa

how do i get drracket to log stuff? i tried set PLTSTDERR info; drracket but nothing is printing

DrRacket has a log viewer you can use