
People at work are doing a lot of Python data-sciency things with tables, data frames, parquet files, … I’ll have to see if this is useful in that context. The files we deal with are often in the 2–3GB range.

It’s definitely not ready for multi-GB files yet (and I’d love to support parquet files). I do TB-sized genetics processing at work, so if Racket can handle the load, I’m down for the challenge.
But a lot of things need to happen before that’s possible. Most of those will deal with things like:
• Better file parsing
• More conservative “reading” of CSV (e.g. when do you try to convert a cell value -> number, date, etc.?)
• Parallel processing of column data when appropriate (e.g. aggregating groups)
Consider the current version the entry point: with 90% of features implemented, it is only 2–6x slower than Pandas (except on data load, where it’s 10+x slower).
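One way to read the “more conservative reading” bullet, sketched in Python for illustration only (this is not the data-frame library's code, and `convert_column` and the patterns are hypothetical): convert a column only when every non-empty cell matches a strict pattern, and otherwise keep the original strings.

```python
import re
from datetime import date

# Strict patterns: no leading zeros, no whitespace, ISO dates only (assumptions).
INT_RE = re.compile(r"-?(0|[1-9][0-9]*)\Z")
FLOAT_RE = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?\Z")
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}\Z")


def strict(pattern, convert):
    """Build a parser that rejects any cell not matching the pattern."""
    def parser(cell):
        if not pattern.match(cell):
            raise ValueError(cell)
        return convert(cell)  # may still raise ValueError, e.g. month 13
    return parser


PARSERS = (
    strict(INT_RE, int),
    strict(FLOAT_RE, float),
    strict(DATE_RE, date.fromisoformat),
)


def convert_column(cells):
    """Convert the whole column or nothing; empty cells become None."""
    for parser in PARSERS:
        try:
            return [None if c == "" else parser(c) for c in cells]
        except ValueError:
            continue  # one bad cell rules this type out for the column
    return list(cells)  # fall back to the raw strings
```

With this policy a column such as `["007", "12"]` stays as strings, so identifiers with leading zeros are not silently turned into numbers.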

I have loaded and tested it with files in the 600 MB range (10M+ rows).

IIRC the Python csv library has an advantage in that most of the heavy lifting is done in C.

That sounds right to me; it’s the same with numpy (C and maybe Fortran?). Doing something similar here could make a difference.

csv-reading isn’t the most performant it could be for Racket. I have an experimental CSV reader (doesn’t handle all of the edge cases) that reads in 4 KB buffers and operates on bytes (not strings) that’s ~4x faster. Benchmarked with a 741 MB file:
$ raco run bench.rkt csv-reading < raw/systems_800.csv
cpu time: 64015 real time: 64017 gc time: 163
5848000
$ raco run bench.rkt csv-reading < raw/systems_800.csv
cpu time: 64422 real time: 64436 gc time: 166
5848000
$ raco run bench.rkt fast-csv < raw/systems_800.csv
cpu time: 16907 real time: 16908 gc time: 226
5848000
$ raco run bench.rkt fast-csv < raw/systems_800.csv
cpu time: 16850 real time: 16851 gc time: 227
5848000
I don’t think the csv-reading library does any number reading, but avoiding string->number can improve performance, because each call incurs unicode decoding and parameter lookups (the lookups can also be avoided by passing all of the optional arguments explicitly in the call).

According to my notes I wrote it around Nov 2020 https://gist.github.com/a032514dc0093f00875922bc7c2c8b00

Would it be worth implementing a bytes->number?
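To make the idea concrete, here is a minimal sketch of what a bytes-to-number routine does, written in Python for illustration (`bytes_to_int` is a hypothetical name, and it handles plain decimal integers only). It walks the raw bytes of a cell and never builds an intermediate string, which is the saving being discussed.

```python
def bytes_to_int(buf, start=0, end=None):
    """Parse an ASCII decimal integer from buf[start:end] without decoding.

    Returns None when the slice is not an optionally signed integer.
    """
    if end is None:
        end = len(buf)
    i = start
    negative = False
    if i < end and buf[i] in (0x2B, 0x2D):  # b"+" or b"-"
        negative = buf[i] == 0x2D
        i += 1
    if i == end:
        return None  # empty slice or a lone sign
    value = 0
    while i < end:
        digit = buf[i] - 0x30  # 0x30 is b"0"
        if not 0 <= digit <= 9:
            return None
        value = value * 10 + digit
        i += 1
    return -value if negative else value
```

Taking `start` and `end` offsets lets a CSV reader parse a cell in place inside its read buffer, for example `bytes_to_int(b"a,123,b", 2, 5)`.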

Yeah, CSV is tricky to do well and fast. Ideally, it wouldn’t even cons cells into a row either; it would just be a sequence that returns cells in order (as parsed), plus a special symbol for 'end-of-row.
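A rough sketch of that shape, in Python for illustration and under heavy simplifying assumptions (no quoting, no escapes, `\n` line endings only; `iter_cells`, `collect_rows` and `END_OF_ROW` are hypothetical names). It reads 4 KB chunks of bytes and yields each cell as it is found, with a sentinel marking the end of each row.

```python
import io

END_OF_ROW = object()  # stands in for the 'end-of-row symbol
COMMA, NEWLINE = 0x2C, 0x0A


def iter_cells(stream, chunk_size=4096):
    """Yield cells as bytes, then END_OF_ROW after the last cell of a row."""
    pending = b""   # part of a cell that straddles a chunk boundary
    last = NEWLINE  # pretend we start right after a finished row
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        start = 0
        for i, byte in enumerate(chunk):
            if byte == COMMA or byte == NEWLINE:
                yield pending + chunk[start:i]
                pending = b""
                start = i + 1
                if byte == NEWLINE:
                    yield END_OF_ROW
        pending += chunk[start:]
        last = chunk[-1]
    if last != NEWLINE:
        # final row had no trailing newline
        yield pending
        yield END_OF_ROW


def collect_rows(data, chunk_size=4096):
    """Demo helper only: rebuilds rows, which the streaming design avoids."""
    rows, row = [], []
    for item in iter_cells(io.BytesIO(data), chunk_size):
        if item is END_OF_ROW:
            rows.append(row)
            row = []
        else:
            row.append(item)
    return rows
```

A consumer that only aggregates one column can count cells up to the sentinel and never allocate a row at all.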

Maybe? Some of these formats need custom readers.

Like the JSON one is hand coded.

JSON can be significantly sped up as well by not being strict. For example, if you read t, just assume true and skip 3 bytes.
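A small sketch of that shortcut, in Python for illustration (`read_literal` is a hypothetical helper, not part of any JSON library). Only the `true` case comes from the message above; treating `false` and `null` the same way is an assumption by analogy.

```python
def read_literal(buf, pos):
    """Read true/false/null at buf[pos] by inspecting only the first byte.

    Returns (value, new_pos). Trusts the input: b"tXYZ" is read as True.
    """
    first = buf[pos]
    if first == 0x74:  # b"t": assume "true" and skip the remaining 3 bytes
        return True, pos + 4
    if first == 0x66:  # b"f": assume "false" (extension by analogy)
        return False, pos + 5
    if first == 0x6E:  # b"n": assume "null" (extension by analogy)
        return None, pos + 4
    raise ValueError(f"not a literal at offset {pos}")
```

The trade-off is that malformed input such as `tXYZ` is accepted silently, so this only makes sense when the data is known to be well formed.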

This spam again :(

Feeling like READMEs in various repos should really be more descriptive.
• The “description” field in every repo should be set.
• There should be at least a sentence describing what the repo is for.
• Either link to the Scribble docs, or have a lot more sentences to describe what it does.
I would love to help where possible, but my English sucks, so it might be easier for other people to do it.

> but my English sucks
Your English is better than my <insert any other language here> :smile:

I take it you mean repos linked from the packages in the official package catalog?

Oh, I meant repos under the racket organization. Sorry that was unclear.

I agree in the large, but (at least with the last two packages I’ve published) I haven’t wanted to spend a lot of effort duplicating docs. I should add links though

That’s tractable

For example, in racket/scribble, you have:
> This the source for the Racket packages: “scribble”, “scribble-doc”, “scribble-html-lib”, “scribble-lib”, “scribble-test”, “scribble-text-lib”.

The description can’t be set by a PR - only the repo owners can do the description

Someone who doesn’t know Scribble would have a hard time figuring out what Scribble is.

READMEs are in the individual package directories actually

oh wait I am confused by scribble

ah I see

My question originates from not knowing what https://github.com/racket/pkg-push does. After reading the code, I think it’s a system to convert packages in the previous package system to the new one, but I’m still not sure. It would save me a lot of time if this were clear from the README.

Is something like this ok? https://github.com/racket/scribble/pull/311

Mor -> more

Yes, something like that.

CIs are gonna fail until the next snapshot build is up..

welp

Fixed

I don’t know what pkg-push does either :sob: - there are no docs or readme but it is only ~300 lines - if I go hunting I can probably work it out from the pkg that requires pkg-push, but I’m cooking dinner now. Or meant to be

pkg-push is >= 7 years old. Probably some old code that is not in use

Small suggestion: this renders a little better on GitHub:
# Scribble: The Racket Documentation Tool
Matthew Flatt and Eli Barzilay
Scribble is a collection of tools for ....
(sorry I don’t have the merge permission)

I agree with @sorawee in general, but not for pkg-push - packages like that probably just need a generic pointer readme to the Racket homepage:
> Welcome! You probably don’t want this package :grinning:
> If you are interested in Racket check the homepage at https://racket-lang.org for downloads, documentation, libraries and community - where you are welcome to ask questions.



I’m pretty sure even with GFM (GitHub Flavored Markdown) you need a blank line after headers, or things get confused. At least this was true in the last couple of years

There’s something I’m looking for in the docs but can’t recall the name of. I seem to recall there’s a feature that lets errortrace-compiled modules coexist alongside other modules, but is general enough that other tools like errortrace could use it too. I think it takes a string as an argument so that it can be incorporated into the filename. Does someone know what I’m looking for?

Do you mean changing the default compiled directory, like DrRacket does?

Tweaked it a bit more.

all my Racket repos have the README:
NAME
A Racket library for X. This software is under rapid development (if necessary)
INSTALLATION
raco pkg install the-package
DOCUMENTATION
<links to Scribble docs>

which is pretty much the bare minimum IMO

That sounds like it could be exactly what I’m looking for. :D Do you know of a link to that?


I was about to paste the same URL :D

I think this is exactly it, thank you so much

Does Syntax Parse Bee accept illustration-only examples? That is,
• Non-practical examples that illustrate a specific syntax of syntax-parse
• Non-examples that show possible error messages of a specific syntax
• Examples that partly overlap with the existing ones in the documentation of syntax-parse
