Racket Slack Archive

mflatt

2017-10-9 14:45:19

@greg For the town-hall meeting, it was status reports and discussion – no decisions. The Racket-on-Chez report was that DrRacket now sort of runs, but it will probably take another year for things to run well. The pkgs report (Jay) was about plans to merge the doc and main-package views. The Typed Racket report (Sam) was improved refinement types. The typesetting report (MB) was that a pure Racket solution to generating PDFs is on the way, such as direct Scribble-to-PDF without LaTeX. I think I’ve missed one or two, so I hope others fill in. After that, we had tutorials on how to contribute, as listed on the schedule. Finally, people broke up into groups and worked. I had to leave mid-afternoon to catch my flight, but it looked to me like a lot of good discussion was taking place around busy keyboards.

apg

2017-10-9 14:52:02

Pkgs/docs: does that mean it wouldn’t link out to docs.racket-lang.org? If so, would docs hosting on docs. go away?

apg

2017-10-9 14:52:33

(I’m sure this was answered, so apologies for even asking)

samth

2017-10-9 14:53:37

@apg The idea is to continue hosting docs, as now, but have the front page of docs and/or pkgs be a general front-end to “finding Racket code” rather than just a documentation page or just a “find a package” page

samth

2017-10-9 14:56:26

Exactly what that page would look like remains to be seen, but it might be something like a cross between the current docs.r-l.org, the package search bar, and the “most common packages” that you see on the NPM front page

apg

2017-10-9 15:13:32

Ohh!! Ok. That’s great!

apg

2017-10-9 15:13:52

Thanks for the details @samth

kathryngray

2017-10-9 16:19:01

@kathryngray has joined the channel

apg

2017-10-9 16:30:20

the #irc channel is out of date again.

kathryngray

2017-10-9 16:30:30

Hi, is there a racket function or library that’s faster at splitting a string into a list of strings than string-split? (I’m processing a 2.5Mb text file, takes 2 seconds to read into a list of lines and takes 19 seconds to map string-split over those lines… it’s even worse with my 150+Mb files)

apg

2017-10-9 16:31:23

@kathryngray can you do it without reading the full list of lines first?

kathryngray

2017-10-9 16:31:59

Each line is independent except order matters

apg

2017-10-9 16:32:40

so, use something like for/list with in-lines

apg

2017-10-9 16:32:55

and then string-split on the generated line

apg

2017-10-9 16:33:08

you’ll at least save the 2 seconds.

apg

2017-10-9 16:33:24

(in-lines will lazily read during iteration instead of all up front)
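
A minimal sketch of the suggestion above, assuming the input arrives as a port. `split-lines` is a hypothetical helper name that does not appear in the discussion, and a string port stands in for the real file:

```racket
#lang racket/base
(require racket/string)

;; split-lines is a hypothetical helper name.
;; in-lines reads one line at a time from the port while for/list iterates,
;; so no separate list of unsplit lines is built first.
(define (split-lines in)
  (for/list ([line (in-lines in)])
    (string-split line)))

;; A string port stands in for a file here; for a real file one could use
;; (call-with-input-file path split-lines).
(split-lines (open-input-string "in the beginning\nwas  the word\n"))
;; → '(("in" "the" "beginning") ("was" "the" "word"))
```

Order is preserved because for/list collects results in iteration order.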

apg

2017-10-9 16:33:37

the string-split then is the problem.

kathryngray

2017-10-9 16:34:14

The 2 seconds is trivial :slightly_smiling_face:

apg

2017-10-9 16:34:24

(but in the 150Mb case, it’s longer than 2seconds I’m guessing?)

kathryngray

2017-10-9 16:34:34

Yeah but it’s still less than 10

apg

2017-10-9 16:34:51

how big are the lines?

kathryngray

2017-10-9 16:34:51

Whereas the string splitting is still quite large

kathryngray

2017-10-9 16:35:22

They vary but on average 8 words

apg

2017-10-9 16:36:00

are you splitting with regexp? or just a string?

kathryngray

2017-10-9 16:36:22

I’m splitting with the default value to string-split

kathryngray

2017-10-9 16:37:06

(which matches ’ ’ but I don’t know if that’s done with a string or a regexp internally)

jaz

2017-10-9 16:37:24

default is #px"\\s+"

jaz

2017-10-9 16:39:52

you can probably get better performance by reading and splitting bytes instead of strings
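
One way the bytes idea could look, as a sketch only. `split-bytes-words` is a hypothetical name, the input is assumed to contain only ASCII whitespace, and whether this beats plain string-split should be measured rather than assumed:

```racket
#lang racket/base

;; split-bytes-words is a hypothetical helper; it assumes ASCII whitespace only.
(define (split-bytes-words bstr)
  ;; A byte regexp (#px#"...") matches against the byte string directly.
  ;; regexp-split leaves empty pieces at the ends when the input starts or
  ;; ends with whitespace, so those are skipped.
  (for/list ([piece (in-list (regexp-split #px#"\\s+" bstr))]
             #:unless (zero? (bytes-length piece)))
    (bytes->string/utf-8 piece)))

(split-bytes-words #"  alpha beta\tgamma ")
;; → '("alpha" "beta" "gamma")
```

Converting each piece with bytes->string/utf-8 is only needed if the rest of the program wants strings; the pieces could also be kept as bytes.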

apg

2017-10-9 16:40:22

@kathryngray I’m trying to get a big enough file to test if I suffer the same problem.

apg

2017-10-9 16:40:23

one second

kathryngray

2017-10-9 16:41:40

Cool :slightly_smiling_face: Sadly I have to leave the computer but I’ll check back

apg

2017-10-9 16:42:43

so, i’m doing a 5M file (the bible from project gutenberg) in 2.235s of real time

apg

2017-10-9 16:42:57

114567 lines of varying sizes

apg

2017-10-9 16:43:11

and that includes the printing of the result.

apg

2017-10-9 16:46:16

(and less than a second if i suppress output)

apg

2017-10-9 16:47:54

anyway, @kathryngray I don’t doubt that you’re seeing this, but I’m guessing there’s another explanation. Can you run this (http://pasterack.org/pastes/55271) with your file and see what you get?

jaz

2017-10-9 17:05:49

@kathryngray assuming you don’t need to worry about non-ASCII whitespace, the following runs in about 2.5 seconds on a 230MB file on my machine (not printing the result, of course):

jaz

2017-10-9 17:09:51

oh, I didn’t break it up into lines first; not sure exactly what output you’re looking for — a list of lists of strings?

jaz

2017-10-9 17:43:55

This one splits strings per line. There are more optimizations possible, but this is reasonably fast already, despite all the reversing of lists going on.

leif

2017-10-9 17:52:40

@mflatt Is the text% editor implemented in Racket or does it use native text boxes (with a lot of Racket code on top)?

mflatt

2017-10-9 17:54:42

The text% class, along with snips, implements all drawing and manipulation at the level of draw-text, on-char, and on-event.

mflatt

2017-10-9 17:54:53

So, no native text boxes.

kathryngray

2017-10-9 18:02:06

@jaz Thanks for the example, sadly it turns out my text input is not UTF-8 (was news to me, I thought it was)

jaz

2017-10-9 18:02:58

you can use other encodings

jaz

2017-10-9 18:03:15

there’s a bytes->string/latin-1

jaz

2017-10-9 18:04:00

and if you need, say, Windows-1252, you do that with bytes-open-converter, etc.
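
A sketch of both routes mentioned above. `decode-bytes` is a hypothetical helper name. Encodings other than UTF-8 depend on the platform's iconv, so bytes-open-converter may return #f for a given name and the sketch checks for that:

```racket
#lang racket/base

;; Latin-1 needs no converter: each byte maps to the character with the
;; same code point.
(bytes->string/latin-1 #"caf\351")   ; byte 233 (octal 351) is é

;; decode-bytes is a hypothetical helper for other encodings.
(define (decode-bytes bstr encoding)
  (define conv (bytes-open-converter encoding "UTF-8"))
  (unless conv
    (error 'decode-bytes "no converter available for ~a" encoding))
  ;; bytes-convert returns the converted bytes, the number of source bytes
  ;; consumed, and a status symbol.
  (define-values (out consumed status) (bytes-convert conv bstr))
  (bytes-close-converter conv)
  (unless (eq? status 'complete)
    (error 'decode-bytes "conversion stopped with status ~a" status))
  (bytes->string/utf-8 out))

;; Assumed usage; the encoding name must be one the local iconv recognizes:
;; (decode-bytes some-bytes "WINDOWS-1252")
```

For stateful encodings a call to bytes-convert-end would also be needed before closing the converter; single-byte encodings such as Windows-1252 do not require it.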

leif

2017-10-9 18:05:19

@mflatt Okay cool, that’s what I thought.

leif

2017-10-9 18:05:20

thanks. :slightly_smiling_face:

kathryngray

2017-10-9 18:08:14

Cut it to 6s on my 2.5Mb, still running the 150Mb

apg

2017-10-9 18:08:47

@kathryngray could you maybe share the relevant part of your program?

jaz

2017-10-9 18:08:50

@kathryngray are you running in DrRacket?

kathryngray

2017-10-9 18:09:38

Well the relevant bit of program is now the bit @jaz shared, and yes I’m running in DrRacket as I want to explore the data after I process it

apg

2017-10-9 18:10:07

@kathryngray ok.

jaz

2017-10-9 18:10:10

Did you turn off debugging/profiling?

jaz

2017-10-9 18:10:21

Because that imposes a heavy performance penalty.

jaz

2017-10-9 18:12:02

The options are in Language -> Choose Language, and you may need to “show details” if you haven’t in the past.

apg

2017-10-9 18:14:09

hmm. maybe this is a hardware thing? @kathryngray are you running this on a relatively recent machine? Or running in virtualization or something?

apg

2017-10-9 18:14:50

(even with debugging, @jaz, it still runs < 3 seconds for me)

jaz

2017-10-9 18:15:04

it could also be printing results, which is far more expensive than computing them

apg

2017-10-9 18:16:14

true. printing in drracket is extremely slow.

notjack

2017-10-9 19:10:56

knew I’d forget something on the trip to RacketCon this year; I forgot to get a shirt!

notjack

2017-10-9 19:11:18

Oops

jaz

2017-10-9 19:14:30

@apg I didn’t actually try the dead-simple version first; I had just assumed it was slower. Turns out it’s faster.

apg

2017-10-9 19:22:46

@jaz yay for doing the simple thing. :slightly_smiling_face:

apg

2017-10-9 19:25:55

I will say, one thing that I do happen to like about go is that they have equivalent packages that are very similar for functions to work on strings and []byte. Might be worth copying this if we can provide faster bytes-split for cases where that’s needed.

kathryngray

2017-10-9 20:31:13

@jaz and @apg This is running on a this-year model macbook pro, so hardware shouldn’t be much of an issue, and the only thing printing out during this phase is the time measurement (which means it is measuring time as well as working but doesn’t print out any of the file contents), but I haven’t changed any DrRacket settings other than to beef up the RAM to 4Gb (I’m processing the files one after another, so my memory footprint does get big by the second 150+Mb file) so I’ll try that tomorrow. Still 20s -> 6s helps me get to the more interesting exploration faster :slightly_smiling_face: So Thanks a bunch for the help

apg

2017-10-9 20:32:27

ooh! It’s multiple files in a row?

kathryngray

2017-10-9 20:32:39

Also printing in DrRacket is very slow and doesn’t seem to be interruptible if the processing has finished (I accidentally let the file contents enter the interactions window early on, and it seemed the only action to take to stop it was to force quit DrRacket)

kathryngray

2017-10-9 20:32:45

Four files in a row, shortest to biggest

apg

2017-10-9 20:33:16

ah. ok

kathryngray

2017-10-9 20:33:21

In that window, in another two files in a row. The initial reading time doesn’t seem to lag for multiple files although memory builds up

zenspider

2017-10-9 20:42:17

@kathryngray (time (void (do-the-thing))) can help a lot… but hit Cmd-L, “Show Details” and switch from “Debugging” to “No debugging or profiling”. It makes a huge difference
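
A small sketch of the timing idiom from the message above. `do-the-thing` is the placeholder name used in that message; its body here is an arbitrary stand-in, not the actual file-processing code:

```racket
#lang racket/base

;; do-the-thing is a placeholder; this body just builds a large list.
(define (do-the-thing)
  (for/list ([i (in-range 1000000)])
    (number->string i)))

;; void discards the result so the REPL has nothing to print, and time
;; reports cpu, real, and gc milliseconds for the computation alone.
(time (void (do-the-thing)))
```

Without the void wrapper, DrRacket would print the whole result in the interactions window, which can take far longer than computing it.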

kathryngray

2017-10-9 20:43:14

Oddly I just ran it without debugging and it was faster… there’s a lot of GC time. Trying a fresh run of DrRacket

kathryngray

2017-10-9 20:46:54

I mean it was faster With debugging turned on

zenspider

2017-10-9 20:49:39

maybe it has a compile artifact w/ the debugging and it needs to recompile for non? I don’t know how that side works

apg

2017-10-9 20:50:23

i’d imagine the run button basically reloads the entire code window into the repl, no?

zenspider

2017-10-9 21:43:40

OK. I have a basic #lang lexer working… but one thing that confuses me is what functions are available… if my main.rkt implementing my hashlang is racket then things work fine… my lexer can use things like string-trim… but if I make it use racket/base then string-trim isn’t available EVEN IF I put in a (require racket/string) inside my #%module-begin.

What am I not getting?

zenspider

2017-10-9 21:46:07

#lang oedipuslex

digits = (:+ numeric)

(eof)                                                 : (return-without-srcloc eof)
"\n"                                                  : (token 'NEWLINE lexeme)
whitespace                                            : (token lexeme #:skip? #t)
(:or "print" "goto" "end" "+" ":" ";")                : (token lexeme lexeme)
digits                                                : (token 'INTEGER (string->number lexeme))
(:or (:seq (:? digits) "." digits) (:seq digits ".")) : (token 'DECIMAL (string->number lexeme))
(:or (from/to "\"" "\"") (from/to "'"  "'"))          : (token 'STRING (string-trim lexeme #px"."))

(so far…. I’d like to add more constructs)

zenspider

2017-10-9 21:47:27

possible phase 3:

;; TODO: phase 3: maybe?
digits = (:+ numeric)

(eof)                                      : (return-without-srcloc eof)
"\n"                                       : (token 'NEWLINE lexeme)
whitespace                                 : (token lexeme #:skip? #t)
"rem" ... "\n"                             : (token 'REM lexeme)
"print" | "goto" | "end" | "+" | ":" | ";" : (token lexeme lexeme)
digits                                     : (token 'INTEGER (string->number lexeme))
digits? "." digits | digits "."            : (token 'DECIMAL (string->number lexeme))
 "\"" .. "\"" | "'" .. "'"                 : (token 'STRING (string-trim lexeme #px"."))

mflatt

2017-10-9 22:13:25

@zenspider If your #%module-begin introduces the (require racket/string), then the introduced bindings are not visible to references that are supplied to the macro. In other words, imports are hygienic the same as definitions. It’s usually easiest to have the oedipuslex reader inject a require that has empty lexical context, so that it behaves as if it were present with the references.
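
A rough sketch of a reader that injects such a require. It is not the oedipuslex reader: it assumes s-expression source, `lexer-read-syntax` and `read-all` are hypothetical names, and racket/base stands in for the real language module so the sketch is self-contained:

```racket
#lang racket/base
;; Hypothetical reader module, e.g. a lang/reader.rkt file.
(provide (rename-out [lexer-read-syntax read-syntax]))

;; Read every form from the port. Syntax objects produced by read-syntax
;; carry source locations but no lexical context.
(define (read-all src in)
  (let loop ([acc '()])
    (define form (read-syntax src in))
    (if (eof-object? form)
        (reverse acc)
        (loop (cons form acc)))))

(define (lexer-read-syntax src in)
  ;; datum->syntax with #f gives the wrapper, including the require, empty
  ;; lexical context, the same as the forms read from the source file.
  (datum->syntax
   #f
   `(module anonymous racket/base   ; assumption: stand-in for the language module
      (require racket/string)
      ,@(read-all src in))))
```

Because the require and the body forms start with the same (empty) context, they pick up the same module scope during expansion, so names such as string-trim resolve as if the user had written the require by hand.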

lexi.lambda

2017-10-9 22:14:43

To add to what Matthew said, require introduces bindings with the same lexical context as the piece of syntax for the module name itself.
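
A small illustration of that point, as a sketch. `require-strings/here` is a hypothetical macro name; it builds the module name with the lexical context of the use site, so the imported names become visible there:

```racket
#lang racket/base
(require (for-syntax racket/base))

;; If the macro template contained racket/string literally, the module name
;; would carry the macro's context and the imports would be hidden from code
;; at the use site:
;; (define-syntax-rule (require-strings) (require racket/string))

;; Here the module name is given the context of the use-site form instead.
(define-syntax (require-strings/here stx)
  (syntax-case stx ()
    [(_)
     (with-syntax ([lib (datum->syntax stx 'racket/string)])
       #'(require lib))]))

(require-strings/here)
(string-trim "  hello  ")
```

The same idea applies inside a #%module-begin macro: taking the context for the module name from a piece of the user's syntax makes the imported bindings reachable from the user's code.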

notjack

2017-10-9 22:15:59

I think it would be easier if you had your #lang lexer expand into (module anonymous lexer body …)

notjack

2017-10-9 22:16:40

That way whatever (require lexer) would export is what will be visible in a #lang lexer module

notjack

2017-10-9 22:16:51

But I haven’t done this in a while and my memory is fuzzy

apg

2017-10-9 22:17:02

aside: is there a reason modules can’t actually be anonymous?

apg

2017-10-9 22:17:16

slideshow/simple creates a module my-module which seems silly

notjack

2017-10-9 22:17:26

that part I’m fuzzy on

lexi.lambda

2017-10-9 22:17:31

modules are not first-class, so they can’t be anonymous in that sense

apg

2017-10-9 22:17:59

that’s explanation enough.

apg

2017-10-9 22:18:01

:slightly_smiling_face:

notjack

2017-10-9 22:18:05

can a lang expand to multiple modules? Without submodules I mean

lexi.lambda

2017-10-9 22:18:11

apg

2017-10-9 22:19:12

that’d be a limitation of read-syntax?

notjack

2017-10-9 22:19:32

I wonder if there would be any reasonable use for it

notjack

2017-10-9 22:20:00

can’t think of anything that wouldn’t be much more easily done with submodules

apg

2017-10-9 22:20:21

hmm. read-syntax could return #'(begin …)

lexi.lambda

2017-10-9 22:20:24

it’s not a limitation of the reader, no

lexi.lambda

2017-10-9 22:20:40

there’s just the question of what it would mean and what it would do.

notjack

2017-10-9 22:20:51

¯_(ツ)_/¯

apg

2017-10-9 22:21:14

i think the first question is: can the current implementation somehow be tricked into it

lexi.lambda

2017-10-9 22:21:22

modules in racket are uniquely identified by a module path.

apg

2017-10-9 22:21:22

then the second is: is that a good idea or not?

apg

2017-10-9 22:21:42

it seems like it’s a good idea to enforce 1 module. but does the current system do so?

apg

2017-10-9 22:21:57

(i assume yes, based on this discussion)

lexi.lambda

2017-10-9 22:22:09

yes. it won’t work if you expand to something other than a module form.

notjack

2017-10-9 22:22:15

I think the reason readers have to make a module whose name doesn’t matter is related to backwards compatibility with load and the top level

apg

2017-10-9 22:23:36

but it does matter, no? it’d mean I couldn’t technically have two of my slideshows be required by another file?

apg

2017-10-9 22:23:44

(or, another module)

apg

2017-10-9 22:23:57

assuming the things defined in that module are provided, of course?

apg

2017-10-9 22:24:44

maybe doesn’t understand modules enough…

notjack

2017-10-9 22:25:01

The module name doesn’t come up when you require it as a file

notjack

2017-10-9 22:25:29

I’m not sure if the name is important if you load the file into the top level

apg

2017-10-9 22:26:01

“After evaluation is triggered once, later requires do not re-evaluate the module body.” — https://docs.racket-lang.org/guide/Module_Syntax.html#%28part._module-syntax%29

apg

2017-10-9 22:26:14

presumably it uses the name-id to know to not reevaluate the body?

apg

2017-10-9 22:26:37

e.g. if I have foo.rkt -> resulting in (module foo …) and foo2.rkt -> resulting in (module foo …) — will that work?

notjack

2017-10-9 22:27:03

Yes because the path is part of the key used to id modules

notjack

2017-10-9 22:27:16

File path, I mean

apg

2017-10-9 22:27:26

ah. ok

notjack

2017-10-9 22:28:58

the module-path? and resolved-module-path? predicates are good places to go in the docs for more info
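
A short sketch of those predicates, using the foo.rkt and foo2.rkt names from the question above. The files do not need to exist for this to run, since only the paths themselves are compared:

```racket
#lang racket/base

;; module-path? checks whether a value is a well-formed module reference.
(module-path? "foo.rkt")        ; a relative file path
(module-path? 'racket/string)   ; a collection-based path

;; A resolved module path for a file is keyed by the complete file path, not
;; by the name written in the (module name ...) form.
(define foo  (make-resolved-module-path (path->complete-path "foo.rkt")))
(define foo2 (make-resolved-module-path (path->complete-path "foo2.rkt")))

(resolved-module-path? foo)     ; → #t
(equal? foo foo2)               ; → #f
```

Two files that both expand to (module foo …) therefore still get distinct resolved names, which is consistent with the experiment reported in the thread.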

apg

2017-10-9 22:29:17

confirmed that this works via a quick experiment. :slightly_smiling_face:

apg

2017-10-9 22:30:01

(and also confirmed that slideshow simple slides can just be included in your #lang slideshow without issues, which is fun)

apg

2017-10-9 22:30:12

(well, required that is)

notjack

2017-10-9 22:30:17

Sweeeet

notjack

2017-10-9 22:30:26

I liked your talk a lot by the way

apg

2017-10-9 22:30:30

thanks!

apg

2017-10-9 22:30:55

i wish i was able to be there yesterday

apg

2017-10-9 22:30:56

:disappointed:

apg

2017-10-9 22:31:11

(sounded like a lot of fun was had by all)

notjack

2017-10-9 22:32:09

Designing tools with the goal to make something you could use at the last minute to whip something up in an hour leads tools in interesting directions

apg

2017-10-9 22:32:42

ha!

notjack

2017-10-9 22:32:45

Hopefully there’s another office hours day next year :p it was great

apg

2017-10-9 22:33:59

if @stamourv and the con team send out a survey, I’m sure “office hours next year” will be part of the feedback.

apg

2017-10-9 22:35:28

I’m just worried about the 11th con. There’s not an eleventh in racket/list — how will that work?

notjack

2017-10-9 22:39:25

PRs welcome ;)

isabellating

2017-10-10 04:34:10

@isabellating has joined the channel