Racket Slack Archive

gknauth

2021-8-9 12:43:01

People at work are doing a lot of Python data-sciency things with tables, data frames, parquet files, … I’ll have to see if this is useful in that context. The files we deal with are often in the 2–3GB range.

massung

2021-8-9 13:04:24

It’s definitely not ready for multi-GB files yet (and I’d love to support parquet files). I do TB-sized genetics processing at work, so if Racket can handle the load, I’m down for the challenge.

But a lot of things need to happen before that’s possible. Most of them involve:

• Better file parsing
• More conservative “reading” of CSV (e.g. when should a cell value be converted to a number, date, etc.?)
• Parallel processing of column data when appropriate (e.g. aggregating groups)

Consider the current version a starting point: with 90% of the features implemented, it’s only 2–6x slower than Pandas (except on data load, where it’s 10+x slower).

massung

2021-8-9 13:05:23

I have loaded and tested it with files in the 600 MB range (10M+ rows).

samdphillips

2021-8-9 14:43:02

IIRC the Python csv library has an advantage in that most of the heavy lifting is done in C.

ben.knoble

2021-8-9 14:49:47

That sounds right to me; it’s the same with numpy (C and maybe Fortran?). Doing something similar here could make a difference.

samdphillips

2021-8-9 16:16:12

csv-reading isn’t as performant as it could be in Racket. I have an experimental CSV reader (it doesn’t handle all of the edge cases) that reads in 4 KB buffers and operates on bytes (not strings), and it’s ~4x faster. Benchmarked with a 741 MB file:

    $ raco run bench.rkt csv-reading < raw/systems_800.csv
    cpu time: 64015 real time: 64017 gc time: 163
    5848000
    $ raco run bench.rkt csv-reading < raw/systems_800.csv
    cpu time: 64422 real time: 64436 gc time: 166
    5848000
    $ raco run bench.rkt fast-csv < raw/systems_800.csv
    cpu time: 16907 real time: 16908 gc time: 226
    5848000
    $ raco run bench.rkt fast-csv < raw/systems_800.csv
    cpu time: 16850 real time: 16851 gc time: 227
    5848000

I don’t think the csv-reading library does any number conversion, but avoiding string->number can improve performance because it skips the Unicode decoding and the parameter lookups that string->number does on each call (those lookups can also be avoided by passing all of the optional arguments explicitly).

samdphillips

2021-8-9 16:20:37

According to my notes I wrote it around Nov 2020 https://gist.github.com/a032514dc0093f00875922bc7c2c8b00
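A minimal Racket sketch of the bytes-not-strings approach described above, assuming unquoted fields and \n line endings only. It is not the code from the gist; read-csv-rows/bytes is a hypothetical name. It reads the port in 4096-byte chunks with read-bytes! and never decodes bytes into strings:

```racket
#lang racket/base
;; Hypothetical sketch, not the gist's implementation.
;; Splits unquoted CSV into rows of byte-string fields, reading the port in
;; 4096-byte chunks and never decoding bytes into strings.
;; Assumptions: no quoted fields, no escaped quotes, \n line endings only.

(define COMMA (char->integer #\,))
(define NEWLINE (char->integer #\newline))

(define (read-csv-rows/bytes in)
  (define buf (make-bytes 4096))
  (define field (open-output-bytes)) ; bytes of the field being scanned
  (define rows '())                  ; finished rows, newest first
  (define row '())                   ; fields of the current row, newest first
  (define (end-field!)
    ;; #t resets the accumulator so the next field starts empty
    (set! row (cons (get-output-bytes field #t) row)))
  (define (end-row!)
    (end-field!)
    (set! rows (cons (reverse row) rows))
    (set! row '()))
  (let loop ()
    (define n (read-bytes! buf in))  ; number of bytes read, or eof
    (unless (eof-object? n)
      ;; a field may span two chunks; `field` keeps its bytes across reads
      (for ([b (in-bytes buf 0 n)])
        (cond [(= b COMMA) (end-field!)]
              [(= b NEWLINE) (end-row!)]
              [else (write-byte b field)]))
      (loop)))
  ;; flush a last row that has no trailing newline
  (when (or (pair? row)
            (positive? (bytes-length (get-output-bytes field))))
    (end-row!))
  (reverse rows))
```

For example, (read-csv-rows/bytes (open-input-bytes #"a,b\n1,2\n")) returns (list (list #"a" #"b") (list #"1" #"2")). Converting a field to a number or string is left to the caller.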

soegaard2

2021-8-9 16:23:02

Would it be worth implementing a bytes->number?

massung

2021-8-9 16:24:07

Yeah, CSV is tricky to do well and fast. Ideally, it wouldn’t even cons up the cells of a row; instead it would just be a sequence that returns cells in order (as parsed), plus a special symbol such as 'end-of-row.
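A rough Racket sketch of that idea, assuming unquoted fields and \n line endings: instead of building a list per row, make-cell-reader (a hypothetical name) returns a thunk that yields one cell per call as a byte string, the symbol 'end-of-row after each row’s last cell, and eof when the input is exhausted.

```racket
#lang racket/base
;; Hypothetical sketch: a row-free CSV cell stream.
;; Each call to the returned thunk yields a cell (bytes), 'end-of-row, or eof.
;; Assumptions: no quoted fields, \n line endings only.

(define COMMA (char->integer #\,))
(define NEWLINE (char->integer #\newline))

(define (make-cell-reader in)
  (define end-of-row-pending? #f) ; emit 'end-of-row on the next call
  (define cell-expected? #f)      ; the previous cell ended with a comma
  (define (finish-row!)
    (set! end-of-row-pending? #t)
    (set! cell-expected? #f))
  (lambda ()
    (cond
      [end-of-row-pending?
       (set! end-of-row-pending? #f)
       'end-of-row]
      [(eof-object? (peek-byte in))
       (cond
         [cell-expected? (finish-row!) #""] ; input ended right after a comma
         [else eof])]
      [else
       (let ([out (open-output-bytes)])
         (let loop ()
           (let ([b (read-byte in)])
             (cond
               [(eof-object? b) (finish-row!)]        ; last row has no newline
               [(= b COMMA) (set! cell-expected? #t)] ; comma ends the cell
               [(= b NEWLINE) (finish-row!)]          ; newline ends cell and row
               [else (write-byte b out) (loop)])))
         (get-output-bytes out))])))
```

A consumer can aggregate column values as cells arrive without ever allocating a row list; whether this beats consing in practice would need benchmarking.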

samdphillips

2021-8-9 16:24:10

Maybe? Some of these formats need custom readers.

samdphillips

2021-8-9 16:24:18

For example, the JSON one is hand-coded.
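One way a bytes->number could look, as a hedged sketch: bytes->natural is a hypothetical helper (racket/base has no such function) that handles only unsigned decimal integers. It reads ASCII digits straight from a byte string, so there is no Unicode decoding and no parameter lookup; signs, decimals, and exponents are deliberately out of scope.

```racket
#lang racket/base
;; Hypothetical helper, not part of racket/base.
;; Parses an unsigned decimal integer from bs[start, end).
;; Returns #f if the range is empty or contains a non-digit byte.

(define ZERO (char->integer #\0))
(define NINE (char->integer #\9))

(define (bytes->natural bs [start 0] [end (bytes-length bs)])
  (and (< start end)
       (let loop ([i start] [acc 0])
         (cond
           [(= i end) acc]
           [else
            (let ([b (bytes-ref bs i)])
              ;; accumulate digit by digit: acc*10 + digit value
              (and (<= ZERO b NINE)
                   (loop (add1 i) (+ (* acc 10) (- b ZERO)))))]))))
```

The optional start and end arguments let a reader parse a field in place inside its buffer without copying it out first.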

massung

2021-8-9 16:25:01

JSON parsing can be sped up significantly by not being strict. For example, if you read a t, just assume true and skip the next 3 bytes.
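A small Racket sketch of that non-strict trick, under the assumption that the input is well-formed JSON: read-literal/lax (a hypothetical name) looks only at the first byte of a literal and jumps past the rest without checking it. It returns the value and the index just past the literal, using 'null for JSON null (the default value of the json library’s json-null parameter).

```racket
#lang racket/base
;; Hypothetical sketch: lax JSON literal parsing.
;; Trusts the first byte and skips the remaining bytes of the keyword
;; without validating them; a strict parser would check every byte.

(define (read-literal/lax bs pos)
  (case (integer->char (bytes-ref bs pos))
    [(#\t) (values #t (+ pos 4))]    ; "true": skip t plus 3 more bytes
    [(#\f) (values #f (+ pos 5))]    ; "false": skip f plus 4 more bytes
    [(#\n) (values 'null (+ pos 4))] ; "null": skip n plus 3 more bytes
    [else (error 'read-literal/lax "not a JSON literal at position ~a" pos)]))
```

The trade-off is that malformed input such as #"tXYZ" is silently read as true, so this only makes sense for trusted data.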

sorawee

2021-8-9 16:32:01

This spam again :(

sorawee

2021-8-9 17:16:00

I feel like READMEs in various repos should be more descriptive:

• The “description” field in every repo should be set.
• There should be at least one sentence describing what the repo is for.
• Either link to the Scribble docs, or add a lot more sentences describing what it does.

I would love to help where possible, but my English sucks, so it might be easier for other people to do it.

massung

2021-8-9 17:16:55

> but my English sucks

Your English is better than my <insert any other language here> :smile:

spdegabrielle

2021-8-9 17:18:27

I take it you mean repos linked from the packages in the official package catalog?

sorawee

2021-8-9 17:18:59

Oh, I meant repos under the racket organization. Sorry, that was unclear.

ben.knoble

2021-8-9 17:19:08

I agree in the large, but (at least with the last two packages I’ve published) I haven’t wanted to spend a lot of effort duplicating docs. I should add links though

spdegabrielle

2021-8-9 17:19:26

That’s tractable

sorawee

2021-8-9 17:19:30

For example, in racket/scribble, you have:

> This the source for the Racket packages: “scribble”, “scribble-doc”, “scribble-html-lib”, “scribble-lib”, “scribble-test”, “scribble-text-lib”.

spdegabrielle

2021-8-9 17:20:32

The description can’t be set by a PR; only the repo owners can set it.

sorawee

2021-8-9 17:20:54

Someone who doesn’t know Scribble would have a hard time figuring out what Scribble is.

shu--hung

2021-8-9 17:21:26

READMEs are in the individual package directories actually

shu--hung

2021-8-9 17:22:12

oh wait I am confused by scribble

shu--hung

2021-8-9 17:23:09

ah I see

sorawee

2021-8-9 17:24:15

My question comes from not knowing what https://github.com/racket/pkg-push does. After reading the code, I think it’s a system for converting packages in the previous package system to the new one, but I’m still not sure. It would save me a lot of time if this were clear from the README.

spdegabrielle

2021-8-9 17:26:33

Is something like this OK? https://github.com/racket/scribble/pull/311

soegaard2

2021-8-9 17:29:04

Mor -> more

sorawee

2021-8-9 17:29:37

Yes, something like that.

shu--hung

2021-8-9 17:29:47

CIs are gonna fail until the next snapshot build is up.

sorawee

2021-8-9 17:31:00

welp

spdegabrielle

2021-8-9 17:31:09

Fixed

spdegabrielle

2021-8-9 17:33:28

I don’t know what pkg-push does either :sob: There are no docs or README, but it’s only ~300 lines. If I go hunting, I can probably work it out from the package that requires pkg-push, but I’m cooking dinner now. Or I’m meant to be.

shu--hung

2021-8-9 17:34:33

pkg-push is at least 7 years old. It’s probably old code that is no longer in use.

shu--hung

2021-8-9 17:37:06

Small suggestion: this renders a little better on GitHub:

    # Scribble: The Racket Documentation Tool
    Matthew Flatt and Eli Barzilay
    Scribble is a collection of tools for ....

(sorry, I don’t have merge permission)

spdegabrielle

2021-8-9 17:44:52

I agree with @sorawee in general, but not for pkg-push: packages like that probably just need a generic README pointing to the Racket homepage:

> Welcome! You probably don’t want this package :grinning:
> If you are interested in Racket, check the homepage at https://racket-lang.org for downloads, documentation, libraries and community, where you are welcome to ask questions.

spdegabrielle

2021-8-9 17:57:25

https://github.com/racket/pkg-push/pull/1

ayushhh.sh

2021-8-9 18:38:39

@ayushhh.sh has joined the channel

ben.knoble

2021-8-9 18:51:00

I’m pretty sure even with GHFM you need a blank line after headers, or things get confused. At least this was true in the last couple years

rokitna

2021-8-9 19:15:32

There’s something I’m looking for in the docs but can’t recall the name of. I seem to recall there’s a feature that lets errortrace-compiled modules coexist alongside other modules, but is general enough that other tools like errortrace could use it too. I think it takes a string as an argument so that it can be incorporated into the filename. Does someone know what I’m looking for?

laurent.orseau

2021-8-9 19:24:40

Do you mean changing the default compiled directory, like DrRacket does?

spdegabrielle

2021-8-9 20:20:25

Tweaked it a bit more.

hazel

2021-8-9 20:53:16

all my Racket repos have this README:

NAME
A Racket library for X. This software is under rapid development (if necessary)

INSTALLATION
raco pkg install the-package

DOCUMENTATION
<links to Scribble docs>

hazel

2021-8-9 20:54:12

which is pretty much the bare minimum IMO

rokitna

2021-8-9 21:29:56

That sounds like it could be exactly what I’m looking for. :D Do you know of a link to that?

laurent.orseau

2021-8-9 21:32:48

maybe this: https://docs.racket-lang.org/reference/eval.html#%28def._%28%28quote._~23~25kernel%29._use-compiled-file-paths%29%29

rokitna

2021-8-9 21:33:14

I was about to paste the same URL :D

rokitna

2021-8-9 21:33:26

I think this is exactly it, thank you so much
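For reference, a small Racket sketch of using that parameter: code run inside the parameterize sees compiled/my-tool as the compiled-file subdirectory, so loaders and compilers that consult use-compiled-file-paths would keep a tool’s .zo files apart from the normal ones. The directory name my-tool and the wrapper call-with-my-tool-compiled-dir are hypothetical.

```racket
#lang racket/base
;; Sketch: run a thunk with a tool-specific compiled-file subdirectory.
;; "my-tool" is a made-up name; use-compiled-file-paths requires a list
;; of relative paths, which build-path produces here.

(define my-tool-compiled-paths
  (list (build-path "compiled" "my-tool")))

(define (call-with-my-tool-compiled-dir thunk)
  (parameterize ([use-compiled-file-paths my-tool-compiled-paths])
    (thunk)))
```

Because the tool name becomes part of the path, several instrumenting tools could each use their own subdirectory without clobbering each other’s compiled files.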

shu--hung

2021-8-10 02:37:34

Does Syntax Parse Bee accept illustration-only examples? That is:

• Non-practical examples that illustrate a specific piece of syntax-parse syntax
• Non-examples that show the possible error messages of a specific syntax
• Examples that partly overlap with the existing ones in the syntax-parse documentation

zachmclark

2021-8-10 03:15:47

@zachmclark has joined the channel