jryans
2021-8-8 07:07:34

@jryans has joined the channel


greg
2021-8-8 14:30:37

Maybe you could give both the actual and expected values to call-with-output-string?


greg
2021-8-8 14:31:21

So for the expected value, something like (call-with-output-string (curry display "expected output string"))? That’s just off the top of my head as a general idea; I haven’t tried it.


greg
2021-8-8 14:31:55

That relies on display having an optional 2nd argument for the output port.
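A minimal sketch of that suggestion. It assumes a hypothetical function `greet` that, like `display`, takes an optional output port. `call-with-output-string` hands a fresh string port to its procedure, and `(curry display "...")` is a one-argument procedure that displays a fixed string to whatever port it receives. That turns both sides into plain strings that `check-equal?` can compare:

```racket
#lang racket
(require rackunit)

;; Hypothetical function under test. Like display, it writes to an optional
;; port argument that defaults to the current output port.
(define (greet name [out (current-output-port)])
  (fprintf out "Hello, ~a!" name))

;; Actual value: call-with-output-string hands its procedure a fresh string
;; port and returns everything written to it as a string.
(define actual
  (call-with-output-string (lambda (out) (greet "Racket" out))))

;; Expected value, produced the same way: (curry display "...") is a
;; procedure that displays the fixed string to whatever port it receives.
(define expected
  (call-with-output-string (curry display "Hello, Racket!")))

(check-equal? actual expected)
```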


massung
2021-8-8 14:32:17

I’d like to formally announce the initial release of Tabular Asa:

https://pkgd.racket-lang.org/pkgn/package/tabular-asa

It’s a fast, efficient, immutable, column-oriented dataframe module. I’ve been building it for a while now, mostly because I do a lot of work with tabular data and wanted to do more of that work in Racket instead of Python. I also wanted to simplify most operations; Pandas can be annoying for the most common things people want to do. Finally, I wanted to keep the code simple, so anyone who wants to understand how dataframes work can read the source and realize it’s actually not that complicated.
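To illustrate the general column-oriented, immutable idea, here is a toy sketch. It is not Tabular Asa’s API or implementation; the `table` struct and the `table-*` functions are hypothetical. Each column is an immutable vector. Reading a row means indexing into every column, and adding a column returns a new table that shares the existing column vectors:

```racket
#lang racket

;; Toy column store: an association list of column name -> immutable vector.
;; All vectors have the same length (the number of rows).
(struct table (columns) #:transparent)

(define (table-length t)
  (match (table-columns t)
    ['() 0]
    [(cons (cons _ vec) _) (vector-length vec)]))

(define (table-column t name)
  (match (assq name (table-columns t))
    [(cons _ vec) vec]
    [#f (error 'table-column "no column named ~a" name)]))

;; A row is assembled by taking the i-th element of every column.
(define (table-row t i)
  (for/list ([col (in-list (table-columns t))])
    (vector-ref (cdr col) i)))

;; Functional update: returns a new table. The original table is untouched
;; and its column vectors are shared with the result, not copied.
(define (table-add-column t name vec)
  (unless (or (null? (table-columns t))
              (= (vector-length vec) (table-length t)))
    (error 'table-add-column "column ~a has the wrong length" name))
  (table (append (table-columns t)
                 (list (cons name (vector->immutable-vector vec))))))

(define people
  (table (list (cons 'name (vector-immutable "a" "b" "c"))
               (cons 'age  (vector-immutable 30 41 27)))))

(define people+
  (table-add-column people 'over-40?
                    (for/vector ([age (in-vector (table-column people 'age))])
                      (> age 40))))

(table-row people+ 1)           ; → '("b" 41 #t)
(length (table-columns people)) ; → 2: the original table is unchanged
```

Storing data by column keeps whole-column operations (sums, filters, sorts) working over a single contiguous vector, at the cost of a small gather step whenever a full row is needed.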

It’s got support for tables, B-tree indexes (and scanning), generic sorting, joining (inner and outer), groupings, and aggregation. It can read and write CSV and JSON (columns, records, and lines).
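Grouping and aggregation over columns can be sketched in the same spirit. The `group-aggregate` helper below is hypothetical, not Tabular Asa’s grouping API. It works on two parallel column vectors, buckets the value column by key (keeping keys in first-seen order), and applies an aggregate function to each bucket:

```racket
#lang racket

;; Hypothetical helper: group rows by the key column and fold the value
;; column of each group with agg. keys and vals are parallel vectors.
(define (group-aggregate keys vals agg)
  ;; One pass: bucket values per key (newest first) and remember the
  ;; order in which keys were first seen.
  (define-values (groups order)
    (for/fold ([groups (hash)] [order '()])
              ([k (in-vector keys)] [v (in-vector vals)])
      (values (hash-update groups k (lambda (vs) (cons v vs)) '())
              (if (hash-has-key? groups k) order (cons k order)))))
  ;; Restore row order inside each bucket and key order across buckets.
  (for/list ([k (in-list (reverse order))])
    (cons k (agg (reverse (hash-ref groups k))))))

(define (total xs) (apply + xs))

(group-aggregate (vector "a" "b" "a" "c" "b") (vector 1 2 3 4 5) total)
; → '(("a" . 4) ("b" . 7) ("c" . 4))
```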

I still have a few more features I want to add to it, but it’s in a good place for general use if others have a need for it.


greg
2021-8-8 14:32:42

It’s maybe a bit too clever. If I had many such tests, I might write a little check-equal-output? macro to DRY them up and hide the cleverness.
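One way such a macro could look. `check-equal-output?` is a hypothetical name, not part of rackunit. This sketch uses `with-output-to-string`, which runs a thunk with `current-output-port` redirected to a string port, so the code under test doesn’t need to take a port argument at all:

```racket
#lang racket
(require rackunit)

;; Hypothetical helper macro: run the body forms with output captured to a
;; string, then compare that string against the expected one.
(define-syntax-rule (check-equal-output? expected body0 body ...)
  (check-equal? (with-output-to-string (lambda () body0 body ...))
                expected))

;; Anything printed by display, printf, write, etc. is captured.
(check-equal-output? "Hello, Racket!" (printf "Hello, ~a!" "Racket"))
(check-equal-output? "12" (display 1) (display 2))
```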


massung
2021-8-8 14:34:53

I ended up doing a complete loop, basically: (define (test-r/w-loop x writer reader) (let ([s (call-with-output-string (lambda (out) (writer x out)))]) (check-equal? (call-with-input-string s reader) x))) That’s simplified, but it did the trick.
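Here is a self-contained version of that loop. It is exercised with the json library’s `write-json`/`read-json` and with plain `write`/`read`; those formats were chosen only for illustration. `call-with-output-string` passes the string port to its procedure, so the writer is called with that port explicitly:

```racket
#lang racket
(require rackunit json)

;; Write x to a string with writer, read it back with reader, and check
;; that the round trip reproduces x.
;;   writer : any/c output-port? -> any
;;   reader : input-port? -> any
(define (test-r/w-loop x writer reader)
  (let ([s (call-with-output-string (lambda (out) (writer x out)))])
    (check-equal? (call-with-input-string s reader) x)))

;; JSON round trip. read-json turns JSON objects into immutable hasheq
;; tables, so the input is built with hasheq to compare equal?.
(test-r/w-loop (hasheq 'name "asa" 'cols (list 1 2 3)) write-json read-json)

;; The same loop works with Racket's own printer and reader.
(test-r/w-loop '(1 "two" #(3 4)) write read)
```

Building the input with `hasheq` matters: an `equal?`-based `hash` uses a different key comparison and would not be `equal?` to what `read-json` returns.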


massung
2021-8-8 14:36:08

I want to note the one place it isn’t fast: reading off disk. CSV and JSON reading is nowhere near as quick as in other languages. This is an area I hope to improve in the future. Right now it uses csv-reading and the default json library.


hazel
2021-8-8 17:21:01

so I built something extremely similar to this on top of alex-hhh’s data-frames: https://docs.racket-lang.org/sawzall/index.html

at first glance, your implementation seems cleaner. occasionally building mine felt like an uphill battle, because I don’t control the API I built it on top of

maybe there’s room to integrate the two?


ben.knoble
2021-8-8 17:26:36

On the speed of json: I haven’t noticed any perf issues with read-json in my lang jsond (https://pkgd.racket-lang.org/pkgn/package/jsond), but the only timings I took were of the runtime, and read-json runs at compile time for the lang


massung
2021-8-8 17:42:53

I didn’t time JSON much, but I’ve been testing with CSV files hundreds of MB in size. Loading is ~6x slower than Python on average, but once the data is in memory, the operations are all nice and fast.


massung
2021-8-8 17:43:26

I have plenty of multi-GB data that I won’t attempt to load yet with the CSV parsing library I’m currently using.


massung
2021-8-8 17:47:55

BTW, @ben.knoble - I saw your jsond package the other day on reddit and thought to myself “this would have been nice a week ago” when I was putting together some tests. :slightly_smiling_face:


ben.knoble
2021-8-8 17:54:22

haha, didn’t even know it made it to Reddit… power of the internet. I’m hoping to leverage it for the slack archive project, actually, since all the archives are JSON files.


soegaard2
2021-8-8 20:39:28

@massung This looks very interesting. I’d love to read a blog post or two on it.


soegaard2
2021-8-8 20:45:22

Is the name a pun on Tabula Rasa or does Asa stand for something?


ben.knoble
2021-8-8 20:47:31

I read it as Tabula Rasa in my head… Tabular “ay-ess-ay” is hard to say mentally


soegaard2
2021-8-8 20:48:56

I see asa and automatically think of triangles…


notjack
2021-8-8 21:03:15

love to see all the work in data processing libraries from all you folks lately :smile:



notjack
2021-8-8 21:16:47

@massung skimming through the docs, I see that the table builder type is a class, which is an interesting choice. what led to that?


massung
2021-8-8 21:36:56

Mutable state


massung
2021-8-8 21:37:42

I find (mutable) structs annoying to work with in Racket. If Racket’s struct were more like CL’s defstruct, that’d be a different story.
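For comparison with a mutable struct, which would need `#:mutable` and the generated `set-...!` field setters, a builder written as a `racket/class` class can keep its mutable state private behind methods. This is a hypothetical sketch, not Tabular Asa’s actual builder class; `table-builder%`, `add-row!`, and `build` are illustrative names. Rows are pushed one at a time, and `build` turns them into immutable column vectors:

```racket
#lang racket

(define table-builder%
  (class object%
    (init-field column-names)   ; list of column-name symbols
    (define rows '())           ; private mutable state: rows, newest first
    (define n-rows 0)
    (super-new)

    ;; Append one row; it must supply a value for every column.
    (define/public (add-row! row)
      (unless (= (length row) (length column-names))
        (error 'add-row! "expected ~a values, got ~a"
               (length column-names) (length row)))
      (set! rows (cons row rows))
      (set! n-rows (add1 n-rows)))

    ;; Produce an association list of column name -> immutable vector.
    (define/public (build)
      (define ordered (reverse rows))
      (for/list ([name (in-list column-names)] [i (in-naturals)])
        (cons name
              (vector->immutable-vector
               (for/vector #:length n-rows ([row (in-list ordered)])
                 (list-ref row i))))))))

(define b (new table-builder% [column-names '(x y)]))
(send b add-row! '(1 "a"))
(send b add-row! '(2 "b"))
(send b build) ; → '((x . #(1 2)) (y . #("a" "b")))
```

The mutation is confined to the builder object, while the value it hands back is immutable.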


massung
2021-8-8 21:39:53

@soegaard2 - I plan on trying to put together some docs (a mini book?) on how to do DB-like things efficiently. I had many reasons for building this package, but serving as a source of example code for those docs was one of them.

And yes — it’s a play on “Tabula Rasa”. :slightly_smiling_face: