
I wrote some data intensive applications in Racket and I was happy with the overall experience. My data sets are up to a few hundred MB in size, so they are not that big… What sort of scientific computation do you have in mind?

Montanari made a survey last year: https://www.youtube.com/watch?v=KWFfqQLjL_w

Some ODE solvers, optimizer routines to start with. But eventually an ecosystem similar to Numpy/Scipy.

Hinsen has some blog posts on the subject: https://khinsen.wordpress.com/2014/05/10/exploring-racket/

For numerical linear algebra, look at flomat.
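Something like this is the shape of it (a minimal sketch; the names matrix, times, transpose and mldivide are from my memory of the flomat docs, so double-check them against the package before relying on this):

```racket
#lang racket
(require flomat)

;; NOTE: the names below (matrix, times, transpose, mldivide) are
;; assumptions based on my reading of the flomat docs.
(define A (matrix '[[2. 1.]
                    [1. 3.]]))
(define b (matrix '[[1.]
                    [2.]]))

(times A (transpose A))   ; matrix product A * A^T
(mldivide A b)            ; solve A x = b, MATLAB-style "left divide"
```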

There are some “table” packages (like in R) available.

Also, there is a science collection (I think it is on the package server too?): http://planet.racket-lang.org/display.ss?package=science.plt&owner=williams

The science collection has some solvers: http://planet.racket-lang.org/package-source/williams/science.plt/4/8/planet-docs/science/ode.html
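I haven't used the science collection's ODE interface myself, so here is just a generic fixed-step RK4 sketch in plain Racket to show the kind of thing meant by "some ODE solvers" (this is not the science collection's API):

```racket
#lang racket

;; Generic fixed-step Runge-Kutta 4 integrator, just for illustration.
;; f : time x state -> derivative; integrates from t0 to t1 in n steps.
(define (rk4 f t0 y0 t1 n)
  (define h (/ (- t1 t0) n))
  (for/fold ([t t0] [y y0] #:result y)
            ([_ (in-range n)])
    (define k1 (f t y))
    (define k2 (f (+ t (/ h 2)) (+ y (* (/ h 2) k1))))
    (define k3 (f (+ t (/ h 2)) (+ y (* (/ h 2) k2))))
    (define k4 (f (+ t h)       (+ y (* h k3))))
    (values (+ t h)
            (+ y (* (/ h 6) (+ k1 (* 2 k2) (* 2 k3) k4))))))

;; dy/dt = y with y(0) = 1, so y(1) should be close to e
(rk4 (lambda (t y) y) 0.0 1.0 1.0 100)
```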

Even though it’s not my favourite language, I have been almost exclusively using python for scientific computing (besides a bit of wolfram from time to time). The only reason is that biologists and mechanical engineers are hopeless at doing anything that might involve the terminal, so I can send them google colab links that they can play with without the frustration of teaching them how to install anything on whatever version of windows they are using.

also, because these projects are not my day job, I want to be as replaceable as possible

these maybe don’t apply to you but just throwing it out there

The science collection looks good but doesn’t seem to have been in active development for a long time.

I expect it to work though. Maybe there is a newer version somewhere?

A related question: why does it seem like racket has very little presence in the natural science community? Is there a good reason for that? For example, a newer language like Julia looks to be much more popular and a possible replacement for python for many.

I imagine it’s just familiarity and inertia — a lot of people in the sciences use Python because it’s what their classes/labs/colleagues use, there’s a big pre-existing ecosystem with a lot of tutorials and guidance and prewritten code out there, and there’s no perceived benefit from striking out on their own and trying a different language because Python meets their needs already

Julia’s had a swell in popularity over the past couple years, but (1) Julia’s syntax was purposefully designed to feel familiar to people coming from Python and Matlab, (2) there’s a big community effort to provide drop-in replacements for popular Python packages and (3) the tide is receding somewhat as people hit growing pains in Julia and return to Python

for the typical Python user in natural science, Racket has the additional hurdle that it’s functional-first when they’re most likely used to imperative programming; I know other languages like F# that have been trying to branch into data science uses have been facing that issue with adoption too

@thechairman mentioned it but in my experience similarity with Matlab takes a language a very long way. For many non-computer scientists (and even data science people) python is little more than a wrapper around numpy and co.

@thechairman can you elaborate on your point #3 above re: the tide receding somewhat? I’m interested in this thread because Racket is my primary language, but I’m beginning to dabble in some math/data science, and I had tentatively identified Julia as a possibility if I need to go beyond Racket.

just that I’ve noticed a trend of people trying Julia but eventually going back to Python or Matlab because it involves less reinventing of the wheel or convincing colleagues to switch

lots of “I love Julia but I have to use Python”

Gotcha. I can see that. I realize the ecosystem is super important, but I’m taking a long term view, so I’m more interested in fundamentals and the suitability for building upon. I think Julia beats Python in this regard, but that may not be enough of course :)

yeah, it’s just institutional inertia a lot of the time

it took forever for python to reach its apex in the same way, so many people who refused to move away from fortran or c++

I only have room for one language to be really passionate about, and interested in contributing to and improving, and that’s Racket. For the math/data science stuff, I’d be more of a consumer, so my attitude is “what can it do for me?”. Now, if I can do all the math/data science stuff I need in Racket, that would change somewhat.

yeah, it’s getting there, stuff like sawzall and flomat are pretty mature, just that the audience for them is “people who already use racket and want to do data science” because the majority of people whose first priority is data science will land on python or R or matlab or SPSS or whatever their advisor/department head/manager recommends

the whole chicken-and-egg problem

Is there a function or a raco command to find the reverse dependencies of a package from a catalog?



I do think there’s something to that observation about functional. I’m a data scientist for my day job, and I use a mix of Python and Scala. What I’ve found is that, even in Scala, the majority of my code is more comfortably written using a procedural/imperative idiom. I tend to only stick to functional programming when I’m implementing actual algorithms. Which is rare; most of that I prefer to get from libraries.

Like, if y’all are familiar with Spark, the framework itself is quite functional - it arguably needs to be. But, in my actual day to day, I’m just taking those functional components and bunging them together with imperative-style code.

It honestly looks a lot like the model for computing that John Backus proposed at the end of his Turing award lecture, now that I think of it.

yeah, pretty much — a lot of the things that attract people to lisp-like languages or functional programming in general just don’t really matter or don’t fit into the traditional workflow for use cases like that

I have this package and I have a bunch of data I want to do something with, so I don’t care about the elegance of immutability or hygienic macros or whatever else

I just spent a few hours on an elaborate procrastination exercise to re-visit whether Julia makes sense for me as a possible addition to Racket. As much as I want to like Julia, my current conclusion is that if I want to “get stuff done” right now, in the data science space, then Python is probably the right choice. I think the ecosystem (including instructional material) wins for me.

Regarding instructional material, here are some good examples:

I wanted this a couple weeks ago. I was trying to understand how many packages depend on gui-lib, and why (i.e. including transitively).

I started to write some ugly quadratic-slow hack query, before I had to set it aside to work on some other things for awhile.
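For what it’s worth, a sketch of that kind of query (assuming installed-pkg-names and pkg-directory from pkg/lib and get-info/full from setup/getinfo behave the way I remember; it only sees locally installed packages and direct, non-transitive deps):

```racket
#lang racket
(require pkg/lib setup/getinfo)

;; List installed packages whose info.rkt deps mention a given package.
(define (direct-deps pkg)
  (define dir (pkg-directory pkg))
  (define info (and dir (get-info/full dir)))
  (if info
      ;; deps entries can be plain strings or lists like '("gui-lib" #:version "1.0")
      (for/list ([d (in-list (info 'deps (lambda () '())))])
        (if (pair? d) (car d) d))
      '()))

(define (reverse-deps target)
  (for/list ([pkg (in-list (installed-pkg-names))]
             #:when (member target (direct-deps pkg)))
    pkg))

(reverse-deps "gui-lib")
```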

Thinking more on this, where Racket seems better positioned than any other lisp to excel in the data space is that you can easily #lang together a syntax that analysts would prefer.
The other thing that I would say is absolutely essential is to build a Racket equivalent of Numpy and get the whole community to agree to use it as a lingua franca that all the other libraries can use to interoperate with each other. I’m pretty sure that’s Python’s secret weapon. I’m pretty sure the primary reason why Java (including Scala) has so thoroughly lost its position of prominence in this space is that, without an agreed-upon standard for storing data in-memory that everyone could share, doing data and numerical computing on the Java platform is like being adrift in the middle of an endless sea of glue code.
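As a minimal sketch of the #lang point: a “language” module that just re-exports racket/base plus math/array, which an analyst-facing module can then use via #lang s-exp (a real analyst-friendly surface syntax would also need its own reader):

```racket
;; analyst-base.rkt -- a tiny module language: racket/base plus arrays.
#lang racket/base
(require math/array)
(provide (all-from-out racket/base)
         (all-from-out math/array))
```

```racket
;; analysis.rkt -- written in the language above
#lang s-exp "analyst-base.rkt"
(array-shape (array #[#[1 2 3] #[4 5 6]]))  ; => '#(2 3)
```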

Building it on top of Arrow would be a particularly nice touch.

Arrow?

Oh, probably Array

See the pkg-dep-draw pkg


I agree with this point that there should be an agreement on building an ecosystem like numpy and scipy. I was looking at the flomat package and it looks nice, but then one needs to convert a flomat column or row to a vector/list in order to use it in plot. What would be nice is if these packages could work with each other without needing extra steps to make them compatible. Let me know if I’m missing something, but this is my understanding of the current situation.
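To make that concrete, here is roughly what the extra conversion step looks like today; I’m using math/matrix instead of flomat so the names are ones I’m sure about, but the marshalling issue is the same idea:

```racket
#lang racket
(require math/matrix plot)

;; Matrix data has to be pulled out into plain lists/vectors
;; before plot will accept it.
(define M (matrix [[0 0] [1 1] [2 4] [3 9] [4 16]]))

;; Extract the two columns as plain lists...
(define xs (matrix->list (matrix-col M 0)))
(define ys (matrix->list (matrix-col M 1)))

;; ...then hand them to plot as vectors of points.
(plot (points (map vector xs ys)))
```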

@seanbunderwood do you think an Arrow library in Racket would require a change to Chez Scheme (assuming Racket CS)?

I wonder if Arrow plays nicely with BLAS and LAPACK.

That I don’t know. It would be a consideration though.
One big reason to go with Arrow is that it lets you more easily bring in support for an ecosystem of things that are sort of minimum requirements for a viable product nowadays (like working with Parquet files), without having to commit to doing it all yourself.

That “why turtl switched from CL to JS” article on the front page of Hacker News today seems like some good food for thought. The more that has to be built from scratch, the harder it’s going to be to build enough of a community to get the whole thing off the ground.

And yeah, the arrow columnar format is SIMD-friendly, though I think it doesn’t offer much in the way of math intrinsically in its C++ library, so there might be some work involved in getting it all working.

But, e.g., PyArrow has a to_numpy() function that converts an Arrow array to a Numpy ndarray without copying any of the underlying data. Which implies that the numpy intrinsics are applying BLAS/LAPACK functions directly to the data without having to marshal it around first.

@arifshaikh.astro, I think math/array and the rest of the Racket math library would be a good starting point for building scientific libraries in Racket… Although reading the other comments here, it seems that most people would want 1:1 compatibility with numpy, which is a big task…
To put it differently, if you want to use numpy, then the best numpy implementation I can recommend is numpy :grinning:
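For anyone who hasn’t looked at math/array, a quick taste of it (these are stock math-library bindings, so the names here should be stable):

```racket
#lang racket
(require math/array)

;; Immutable N-dimensional arrays with elementwise and mapped operations.
(define a (array #[#[1 2 3]
                   #[4 5 6]]))

(array-shape a)          ; => '#(2 3)
(array-ref a #(1 2))     ; => 6
(array+ a a)             ; elementwise addition
(array-map add1 a)       ; elementwise map
(array->list a)          ; flatten to a plain list: '(1 2 3 4 5 6)
```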

Just tried it and it’s a cool tool. However, unless I’m being dense (likely), it seems to show dependencies — other packages needed by a package. Which is great! But (although I’m not sure what @capfredf meant by “reverse”) I meant dependents instead of dependencies — other packages that need a package.

(I was curious about things that would cause gui-lib to need to be installed, and in some cases I was surprised. So I wanted to see/explore that space.)

You can ask it to show reverse dependencies

It does however only consider packages that you have installed

When I tried --reverse or the GUI reverse checkbox, it showed the same graph of dependencies (not dependents). The nodes were the same, it just drew the lines between them differently.

Did you mouse over things?

It always shows the packages that are installed

If I do it for, say, pict-lib: The graph nodes are always things needed by pict-lib (dependencies) — never things that need pict-lib (dependents). Hovering causes lines to appear. Toggling the “reverse” checkbox changes what hover lines are drawn between nodes — but it’s always the same nodes (dependencies).

I think you want to just not specify a package to start from and then use the GUI to look at the reverse deps