
@greg For the town-hall meeting, it was status reports and discussion – no decisions. The Racket-on-Chez report was that DrRacket now sort of runs, but it will probably take another year for things to run well. The pkgs report (Jay) was about plans to merge the doc and main-package views. The Typed Racket report (Sam) was improved refinement types. The typesetting report (MB) was that a pure Racket solution to generating PDFs is on the way, such as direct Scribble-to-PDF without LaTeX. I think I’ve missed one or two, so I hope others fill in. After that, we had tutorials on how to contribute, as listed on the schedule. Finally, people broke up into groups and worked. I had to leave mid-afternoon to catch my flight, but it looked to me like a lot of good discussion was taking place around busy keyboards.

Pkgs/docs: does that mean it wouldn’t link out to docs.racket-lang.org? If so, would docs hosting on docs. go away?

(I’m sure this was answered, so apologies for even asking)

@apg The idea is to continue hosting docs, as now, but have the front page of docs and/or pkgs be a general front-end to “finding Racket code” rather than just a documentation page or just a “find a package” page

Exactly what that page would look like remains to be seen, but it might be something like a cross between the current docs.r-l.org, the package search bar, and the “most common packages” that you see on the NPM front page

Ohh!! Ok. That’s great!

Thanks for the details @samth

@kathryngray has joined the channel


Hi, is there a racket function or library that’s faster at splitting a string into a list of strings than string-split? (I’m processing a 2.5Mb text file, takes 2 seconds to read into a list of lines and takes 19 seconds to map string-split over those lines… it’s even worse with my 150+Mb files)

@kathryngray can you do it without reading the full list of lines first?

Each line is independent except order matters

so, use something like for/list with in-lines

and then string-split on the generated line

you’ll at least save the 2 seconds.

(in-lines will lazily read during iteration instead of all up front)
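A minimal sketch of that suggestion, assuming each line should become a list of whitespace-separated words. The names split-lines and split-file are made up for illustration and are not from the discussion.

```racket
#lang racket/base
(require racket/string)

;; Hypothetical helper: reads lines lazily from the port `in` and splits
;; each one as it arrives, so the intermediate list of raw lines is never built.
(define (split-lines in)
  (for/list ([line (in-lines in)])
    (string-split line)))

;; Hypothetical convenience wrapper that opens a file and splits its lines.
(define (split-file path)
  (call-with-input-file path split-lines))
```

For example, (split-lines (open-input-string "a b\nc  d e\n")) produces '(("a" "b") ("c" "d" "e")). The order of lines is preserved because for/list collects results in iteration order.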

the string-split then is the problem.

The 2 seconds is trivial :slightly_smiling_face:

(but in the 150Mb case, it’s longer than 2seconds I’m guessing?)

Yeah but it’s still less than 10

how big are the lines?

Whereas the string splitting is still quite large

They vary but on average 8 words

are you splitting with regexp? or just a string?

I’m splitting with the default value to string-split

(which matches ’ ’ but I don’t know if that’s done with a string or a regexp internally)

default is #px"\\s+"

you can probably get better performance by reading and splitting bytes instead of strings
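One way the bytes idea could look, as a sketch only. It assumes that only ASCII space, tab, CR and LF count as whitespace, and the name split-lines/bytes is hypothetical. Whether it actually beats string-split depends on the data, so it is worth timing both.

```racket
#lang racket/base

;; Hypothetical helper: splits each line of `in` into byte-string words.
;; Assumption: only ASCII space, tab, CR and LF separate words.
(define (split-lines/bytes in)
  (for/list ([line (in-bytes-lines in)])
    ;; Collect every maximal run of non-whitespace bytes on the line.
    (regexp-match* #rx#"[^ \t\r\n]+" line)))
```

The result is a list of lists of byte strings; individual words can be decoded later, and only if needed, with a function such as bytes->string/utf-8.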

@kathryngray I’m trying to get a big enough file to test if I suffer the same problem.

one second

Cool :slightly_smiling_face: Sadly I have to leave the computer but I’ll check back

so, i’m doing a 5M file (the bible from project gutenberg) in 2.235s of real time

114567 lines of varying sizes

and that includes the printing of the result.

(and less than a second if i suppress output)

anyway, @kathryngray I don’t doubt that you’re seeing this, but I’m guessing there’s another explanation. Can you run this http://pasterack.org/pastes/55271 with your file and see what you get?

@kathryngray assuming you don’t need to worry about non-ASCII whitespace, the following runs in about 2.5 seconds on a 230MB file on my machine (not printing the result, of course):

oh, I didn’t break it up into lines first; not sure exactly what output you’re looking for — a list of lists of strings?

This one splits strings per line. There are more optimizations possible, but this is reasonably fast already, despite all the reversing of lists going on.

@mflatt Is the text% editor implemented in Racket or does it use native text boxes (with a lot of Racket code on top)?

The text% class, along with snips, implements all drawing and manipulation at the level of draw-text, on-char, and on-event.

So, no native text boxes.

@jaz Thanks for the example, sadly it turns out my text input is not UTF-8 (was news to me, I thought it was)

you can use other encodings

there’s a bytes->string/latin-1

and if you need, say, WINDOWS-1252, you do that with bytes-open-converter, etc.
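A rough sketch of both options. The helper names latin-1->string and decode-bytes are hypothetical. Which encoding names bytes-open-converter accepts (for example "WINDOWS-1252") depends on the platform's iconv support, so the sketch checks for a #f result.

```racket
#lang racket/base

;; Latin-1 maps each byte directly to the code point with the same number,
;; so this conversion cannot fail.
(define (latin-1->string bstr)
  (bytes->string/latin-1 bstr))

;; Hypothetical helper: decodes `bstr` from `encoding` by converting it to
;; UTF-8 first. Assumption: the whole input is available as one byte string.
(define (decode-bytes bstr encoding)
  (define conv (bytes-open-converter encoding "UTF-8"))
  (unless conv
    (error 'decode-bytes "no converter available for ~a" encoding))
  (define-values (utf-8-bytes consumed status) (bytes-convert conv bstr))
  (bytes-close-converter conv)
  (unless (eq? status 'complete)
    (error 'decode-bytes "conversion stopped with status ~a" status))
  (bytes->string/utf-8 utf-8-bytes))
```

A call might look like (decode-bytes some-bytes "WINDOWS-1252"), where some-bytes is whatever was read from the file. For large files, converting at the port level is probably a better fit than converting one big byte string.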

@mflatt Okay cool, that’s what I thought.

thanks. :slightly_smiling_face:

Cut it to 6s on my 2.5Mb, still running the 150Mb

@kathryngray could you maybe share the relevant part of your program?

@kathryngray are you running in DrRacket?

Well the relevant bit of program is now the bit @jaz shared, and yes I’m running in DrRacket as I want to explore the data after I process it

@kathryngray ok.

Did you turn off debugging/profiling?

Because that imposes a heavy performance penalty.

The options are in Language -> Choose Language, and you may need to “show details” if you haven’t in the past.

hmm. maybe this is a hardware thing? @kathryngray are you running this on a relatively recent machine? Or running in virtualization or something?

(even with debugging, @jaz, it still runs < 3 seconds for me)

it could also be printing results, which is far more expensive than computing them

true. printing in drracket is extremely slow.

knew I’d forget something on the trip to RacketCon this year; I forgot to get a shirt!

Oops

@apg I didn’t actually try the dead-simple version first; I had just assumed it was slower. Turns out it’s faster.

@jaz yay for doing the simple thing. :slightly_smiling_face:

I will say, one thing that I do happen to like about go is that they have equivalent packages that are very similar for functions to work on strings and []byte. Might be worth copying this if we can provide faster bytes-split for cases where that’s needed.

@jaz and @apg This is running on a this-year model macbook pro, so hardware shouldn’t be much of an issue, and the only thing printing out during this phase is the time measurement (which means it is measuring time as well as working but doesn’t print out any of the file contents), but I haven’t changed any DrRacket settings other than to beef up the RAM to 4Gb (I’m processing the files one after another, so my memory footprint does get big by the second 150+Mb file) so I’ll try that tomorrow. Still 20s -> 6s helps me get to the more interesting exploration faster :slightly_smiling_face: So Thanks a bunch for the help

ooh! It’s multiple files in a row?

Also printing in DrRacket is very slow and doesn’t seem to be interruptible if the processing has finished (I accidentally let the file contents enter the interactions window early on, and it seemed the only action to take to stop it was to force quit DrRacket)

Four files in a row, shortest to biggest

ah. ok

In that window, in another two files in a row. The initial reading time doesn’t seem to lag for multiple files although memory builds up

@kathryngray (time (void (do-the-thing))) can help a lot… but hit Cmd-L, “Show Details” and switch from “Debugging” to “No debugging or profiling”. It makes a huge difference
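To illustrate the timing idiom, here is a small sketch in which do-the-thing is a stand-in for the real processing step, not the actual program from this thread.

```racket
#lang racket/base

;; Stand-in for the real processing step (hypothetical workload).
(define (do-the-thing)
  (for/list ([i (in-range 100000)])
    (number->string i)))

;; `void` discards the result, so `time` reports cpu, real and gc times
;; without the large list being printed afterwards.
(time (void (do-the-thing)))

;; Binding the result with `define` also keeps it from being printed,
;; while leaving it available for later exploration.
(define result (do-the-thing))
```

Without the void wrapper, the value returned by time would be printed in the interactions window, and that printing can easily cost more than the computation being measured.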

Oddly I just ran it without debugging and it was faster… there’s a lot of GC time. Trying a fresh run of DrRacket

I mean it was faster With debugging turned on

maybe it has a compile artifact w/ the debugging and it needs to recompile for non? I don’t know how that side works

i’d imagine the run button basically reloads the entire code window into the repl, no?

OK. I have a basic #lang lexer working… but one thing that confuses me is what functions are available… if my main.rkt implementing my hashlang is racket then things work fine… my lexer can use things like string-trim… but if I make it use racket/base then string-trim isn’t available EVEN IF I put in a (require racket/string) inside my #%module-begin.
What am I not getting?

#lang oedipuslex
digits = (:+ numeric)
(eof) : (return-without-srcloc eof)
"\n" : (token 'NEWLINE lexeme)
whitespace : (token lexeme #:skip? #t)
(:or "print" "goto" "end" "+" ":" ";") : (token lexeme lexeme)
digits : (token 'INTEGER (string->number lexeme))
(:or (:seq (:? digits) "." digits) (:seq digits ".")) : (token 'DECIMAL (string->number lexeme))
(:or (from/to "\"" "\"") (from/to "'" "'")) : (token 'STRING (string-trim lexeme #px"."))
(so far…. I’d like to add more constructs)

possible phase 3:
;; TODO: phase 3: maybe?
digits = (:+ numeric)
(eof) : (return-without-srcloc eof)
"\n" : (token 'NEWLINE lexeme)
whitespace : (token lexeme #:skip? #t)
"rem" ... "\n" : (token 'REM lexeme)
"print" | "goto" | "end" | "+" | ":" | ";" : (token lexeme lexeme)
digits : (token 'INTEGER (string->number lexeme))
digits? "." digits | digits "." : (token 'DECIMAL (string->number lexeme))
"\"" .. "\"" | "'" .. "'" : (token 'STRING (string-trim lexeme #px"."))

@zenspider If your #%module-begin introduces the (require racket/string), then the introduced bindings are not visible to references that are supplied to the macro. In other words, imports are hygienic the same as definitions. It’s usually easiest to have the oedipuslex reader inject a require that has empty lexical context, so that it behaves as if it were present with the references.

To add to what Matthew said, require introduces bindings with the same lexical context as the piece of syntax for the module name itself.
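A sketch of what injecting the require from the reader might look like. The helper wrap-in-module is hypothetical, and racket/base stands in for the language's real expander module. It assumes the body forms are plain S-expressions (or syntax with empty context), so the require and the references end up with the same lexical context.

```racket
#lang racket/base

;; Hypothetical helper for a reader module: wraps already-parsed body forms
;; in a module form. Passing #f as the context argument gives the new syntax,
;; including the injected require, empty lexical context.
;; Assumption: racket/base is a placeholder for the language's own expander.
(define (wrap-in-module body-forms)
  (datum->syntax
   #f
   `(module anonymous racket/base
      (require racket/string)
      ,@body-forms)))
```

A read-syntax function could return the result of (wrap-in-module parsed-forms), where parsed-forms is whatever the reader produced from the source port. Because the require is part of what the reader returns rather than something a macro introduces, a reference such as string-trim in the body can see its bindings.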

I think it would be easier if you had your #lang lexer expand into (module anonymous lexer body …)

That way whatever (require lexer) would export is what will be visible in a #lang lexer module

But I haven’t done this in a while and my memory is fuzzy

aside: is there a reason modules can’t actually be anonymous?

slideshow/simple creates a module my-module which seems silly

that part I’m fuzzy on

modules are not first-class, so they can’t be anonymous in that sense

that’s explanation enough.

:slightly_smiling_face:

can a lang expand to multiple modules? Without submodules I mean

no

that’d be a limitation of read-syntax?

I wonder if there would be any reasonable use for it

can’t think of anything that wouldn’t be much more easily done with submodules

hmm. read-syntax could return #'(begin …)

it’s not a limitation of the reader, no

there’s just the question of what it would mean and what it would do.

¯_(ツ)_/¯

i think the first question is: can the current implementation somehow be tricked into it

modules in racket are uniquely identified by a module path.

then the second is: is that a good idea or not?

it seems like it’s a good idea to enforce 1 module. but does the current system do so?

(i assume yes, based on this discussion)

yes. it won’t work if you expand to something other than a module form.

I think the reason readers have to make a module whose name doesn’t matter is related to backwards compatibility with load and the top level

but it does matter, no? it’d mean I couldn’t technically have two of my slideshows be required by another file?

(or, another module)

assuming the things defined in that module are provided, of course?

maybe doesn’t understand modules enough…

The module name doesn’t come up when you require it as a file

I’m not sure if the name is important if you load the file into the top level

“After evaluation is triggered once, later requires do not re-evaluate the module body.” — https://docs.racket-lang.org/guide/Module_Syntax.html#%28part._module-syntax%29

presumably it uses the name-id to know to not reevaluate the body?

e.g. if I have foo.rkt -> resulting in (module foo …) and foo2.rkt -> resulting in (module foo …) — will that work?

Yes because the path is part of the key used to id modules

File path, I mean

ah. ok

the module-path? and resolved-module-path? predicates are good places to go in the docs for more info
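A small sketch of those predicates, using made-up file names foo.rkt and foo2.rkt under the current directory. The files do not need to exist for this to run; the point is only that the resolved names differ because the file paths differ, whatever name appears inside each module form.

```racket
#lang racket/base

;; Both a relative file string and a collection-based symbol are module paths.
(define string-ok? (module-path? "foo.rkt"))     ; #t
(define symbol-ok? (module-path? 'racket/list))  ; #t

;; Hypothetical files: a resolved module path wraps a complete file path.
(define foo-name
  (make-resolved-module-path (build-path (current-directory) "foo.rkt")))
(define foo2-name
  (make-resolved-module-path (build-path (current-directory) "foo2.rkt")))

;; Different file paths give different resolved module paths.
(define same-module? (equal? foo-name foo2-name))
```

Here (resolved-module-path? foo-name) holds, and same-module? is #f, which matches the observation that two files can both expand to (module foo …) without clashing.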

confirmed that this works via a quick experiment. :slightly_smiling_face:

(and also confirmed that slideshow simple slides can just be included in your #lang slideshow without issues, which is fun)

(well, required that is)

Sweeeet

I liked your talk a lot by the way

thanks!

i wish i was able to be there yesterday

:disappointed:

(sounded like a lot of fun was had by all)

Designing tools with the goal to make something you could use at the last minute to whip something up in an hour leads tools in interesting directions

ha!

Hopefully there’s another office hours day next year :p it was great

if @stamourv and the con team send out a survey, I’m sure “office hours next year” will be part of the feedback.

I’m just worried about the 11th con. There’s not an eleventh in racket/list. How will that work?

PRs welcome ;)

@isabellating has joined the channel