
@jryans has joined the channel

Maybe you could give both the actual and expected values to call-with-output-string?

So for the expected value, something like (call-with-output-string (curry display "expected output string"))? That’s just off the top of my head as a general idea; haven’t tried.

That relies on display having an optional 2nd argument for the output port.
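The idea above can be sketched roughly as follows. This is a minimal, untested-in-anger sketch: write-greeting is a hypothetical function standing in for whatever is being tested, and it is assumed to accept an optional output port as its last argument.

```racket
#lang racket
(require rackunit)

;; Hypothetical function under test: writes a greeting to the given port.
(define (write-greeting name [out (current-output-port)])
  (fprintf out "hello, ~a\n" name))

;; call-with-output-string hands a fresh string port to a one-argument
;; procedure and returns everything written to that port as a string.
(define actual
  (call-with-output-string (lambda (out) (write-greeting "racket" out))))

;; curry fixes display's first argument; call-with-output-string then
;; supplies the port as display's optional second argument.
(define expected
  (call-with-output-string (curry display "hello, racket\n")))

(check-equal? actual expected)
```

Both sides go through the same capture mechanism, so the comparison is between two plain strings.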

I’d like to formally announce the initial release of Tabular Asa:
https://pkgd.racket-lang.org/pkgn/package/tabular-asa
It’s a fast, efficient, immutable, column-oriented dataframe module. I’ve been building it for a while now, mostly because I do a lot of work with tabular data and I wanted to do a lot of that work in Racket instead of Python. I also wanted to simplify most operations; Pandas can be annoying for the most common things people want to do. Finally, I wanted something with simple code so those who wanted to understand how dataframes work could just look and realize it’s actually pretty simple.
It’s got support for tables, b-tree indexes (and scanning), generic sorting, joining (inner and outer), groupings, and aggregation. It can read and write CSV and JSON (columns, records, and lines).
I still have a few more features I want to add to it, but it’s in a good place for general use if others have a need for it.

It’s maybe a bit too clever. If I had many such tests I might try to write a little check-equal-output? macro to DRY and hide any cleverness.

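I ended up doing a complete loop, basically:
(define (test-r/w-loop x writer reader)
  ;; call-with-output-string passes the string port to its procedure,
  ;; so the writer is assumed to take the value and the port
  (let ([s (call-with-output-string (lambda (out) (writer x out)))])
    (check-equal? (call-with-input-string s reader) x)))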
That’s simplified, but it did the trick.
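For anyone wanting to try the round-trip idea, here is a self-contained sketch. It assumes the writer takes the value and an output port and the reader takes an input port; Racket's built-in write and read are used as stand-ins for the real table writer and reader, which are not shown here.

```racket
#lang racket
(require rackunit)

;; Write x to a string with writer, read it back with reader,
;; and check that the result equals the original value.
(define (test-r/w-loop x writer reader)
  (let ([s (call-with-output-string (lambda (out) (writer x out)))])
    (check-equal? (call-with-input-string s reader) x)))

;; Stand-in usage: write and read round-trip a simple datum.
(test-r/w-loop '(1 "two" (3 4)) write read)
```

The same helper should work for any writer/reader pair with those port-taking signatures.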

I want to note the one place it isn’t fast: reading off disk. The CSV and JSON reading isn’t nearly as quick as other languages. This is an area I hope to correct in the future. It’s using csv-reading and the default json implementation right now.

so I built something extremely similar to this on top of alex-hhh’s data-frames: https://docs.racket-lang.org/sawzall/index.html
initially, your implementation seems cleaner, and occasionally it felt like an uphill battle because I don’t control the API that I built it on top of
maybe there’s room to integrate the two?

On the speed of json, I haven’t noticed any perf issues with read-json in my lang jsond (https://pkgd.racket-lang.org/pkgn/package/jsond), but the only timings I took were at runtime, and read-json runs at compile time for the lang

I didn’t time JSON much, but I’ve been doing tests w/ CSV files in the 100’s of MB in size. Loading (on avg) is ~6x slower than Python, but once loaded into memory the operations are all nice and fast.

I have plenty of data in the GB that I won’t attempt to load as of yet using the current CSV parsing library that I’m using.

BTW, @ben.knoble - I saw your jsond package the other day on reddit and thought to myself “this would have been nice a week ago” when I was putting together some tests. :slightly_smiling_face:

haha, didn’t even know it made it to Reddit… power of the internet. I’m hoping to leverage it for the slack archive project, actually, since all the archives are JSON files.

@massung This looks very interesting. I’d love to read a blog post or two on it.

Is the name a pun on Tabula Rasa or does Asa stand for something?

I read it as Tabula Rasa in my head… Tabular “ay-ess-ay” is hard to say mentally

I see asa and automatically think of triangles…

love to see all the work in data processing libraries from all you folks lately :smile:


@massung skimming through the docs, I see that the table builder type is a class, which is an interesting choice. what led you to that?

Mutable state

I find (mutable) structs to be annoying in Racket to work with. If racket’s struct was more like CL’s defstruct, that’d be a different story.

@soegaard2 - I plan on trying to put together some docs (mini book?) on how to do DB-like things efficiently. There were many reasons I had for building this package, and serving as a source of example code for those docs was one.
And yes — it’s a play on “Tabula Rasa”. :slightly_smiling_face: