mflatt
2017-10-9 14:45:19

@greg For the town-hall meeting, it was status reports and discussion – no decisions. The Racket-on-Chez report was that DrRacket now sort of runs, but it will probably take another year for things to run well. The pkgs report (Jay) was about plans to merge the doc and main-package views. The Typed Racket report (Sam) was improved refinement types. The typesetting report (MB) was that a pure Racket solution to generating PDFs is on the way, such as direct Scribble-to-PDF without LaTeX. I think I’ve missed one or two, so I hope others fill in. After that, we had tutorials on how to contribute, as listed on the schedule. Finally, people broke up into groups and worked. I had to leave mid-afternoon to catch my flight, but it looked to me like a lot of good discussion was taking place around busy keyboards.


apg
2017-10-9 14:52:02

Pkgs/docs: does that mean it wouldn’t link out to docs.racket-lang.org? If so, would docs hosting on docs. go away?


apg
2017-10-9 14:52:33

(I’m sure this was answered, so apologies for even asking)


samth
2017-10-9 14:53:37

@apg The idea is to continue hosting docs, as now, but have the front page of docs and/or pkgs be a general front-end to “finding Racket code” rather than just a documentation page or just a “find a package” page


samth
2017-10-9 14:56:26

Exactly what that page would look like remains to be seen, but it might be something like a cross between the current docs.r-l.org, the package search bar, and the “most common packages” that you see on the NPM front page


apg
2017-10-9 15:13:32

Ohh!! Ok. That’s great!


apg
2017-10-9 15:13:52

Thanks for the details @samth


kathryngray
2017-10-9 16:19:01

@kathryngray has joined the channel


apg
2017-10-9 16:30:20

the #irc channel is out of date again.


kathryngray
2017-10-9 16:30:30

Hi, is there a racket function or library that’s faster at splitting a string into a list of strings than string-split? (I’m processing a 2.5Mb text file, takes 2 seconds to read into a list of lines and takes 19 seconds to map string-split over those lines… it’s even worse with my 150+Mb files)


apg
2017-10-9 16:31:23

@kathryngray can you do it without reading the full list of lines first?


kathryngray
2017-10-9 16:31:59

Each line is independent except order matters


apg
2017-10-9 16:32:40

so, use something like for/list with in-lines


apg
2017-10-9 16:32:55

and then string-split on the generated line


apg
2017-10-9 16:33:08

you’ll at least save the 2 seconds.


apg
2017-10-9 16:33:24

(in-lines will lazily read during iteration instead of all up front)


apg
2017-10-9 16:33:37

the string-split then is the problem.
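
A minimal sketch of the suggestion above: iterate with for/list over in-lines and call string-split on each line as it is read. The function name split-lines and the in-memory sample input are made up for illustration.

```racket
#lang racket/base
(require racket/string)

;; Split every line of an input port into whitespace-separated words.
;; in-lines reads one line at a time during iteration, so no separate
;; list of raw lines is built before the splitting starts.
(define (split-lines in)
  (for/list ([line (in-lines in)])
    (string-split line)))

;; Small in-memory demonstration; order of lines is preserved.
(define sample (open-input-string "the quick  brown fox\njumps over\n"))
(define result (split-lines sample))
```

For a real file, something like (call-with-input-file "data.txt" split-lines) would apply the same function, with "data.txt" standing in for the actual path.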


kathryngray
2017-10-9 16:34:14

The 2 seconds is trivial :slightly_smiling_face:


apg
2017-10-9 16:34:24

(but in the 150Mb case, it’s longer than 2 seconds I’m guessing?)


kathryngray
2017-10-9 16:34:34

Yeah but it’s still less than 10


apg
2017-10-9 16:34:51

how big are the lines?


kathryngray
2017-10-9 16:34:51

Whereas the string splitting is still quite large


kathryngray
2017-10-9 16:35:22

They vary but on average 8 words


apg
2017-10-9 16:36:00

are you splitting with regexp? or just a string?


kathryngray
2017-10-9 16:36:22

I’m splitting with the default value to string-split


kathryngray
2017-10-9 16:37:06

(which matches ’ ’ but I don’t know if that’s done with a string or a regexp internally)


jaz
2017-10-9 16:37:24

default is #px"\\s+"


jaz
2017-10-9 16:39:52

you can probably get better performance by reading and splitting bytes instead of strings


apg
2017-10-9 16:40:22

@kathryngray I’m trying to get a big enough file to test if I suffer the same problem.


apg
2017-10-9 16:40:23

one second


kathryngray
2017-10-9 16:41:40

Cool :slightly_smiling_face: Sadly I have to leave the computer but I’ll check back


apg
2017-10-9 16:42:43

so, i’m doing a 5M file (the bible from project gutenberg) in 2.235s of real time


apg
2017-10-9 16:42:57

114567 lines of varying sizes


apg
2017-10-9 16:43:11

and that includes the printing of the result.


apg
2017-10-9 16:46:16

(and less than a second if i suppress output)


apg
2017-10-9 16:47:54

anyway, @kathryngray I don’t doubt that you’re seeing this, but I’m guessing there’s another explanation. Can you run this (http://pasterack.org/pastes/55271) with your file and see what you get?


jaz
2017-10-9 17:05:49

@kathryngray assuming you don’t need to worry about non-ASCII whitespace, the following runs in about 2.5 seconds on a 230MB file on my machine (not printing the result, of course):


jaz
2017-10-9 17:09:51

oh, I didn’t break it up into lines first; not sure exactly what output you’re looking for — a list of lists of strings?


jaz
2017-10-9 17:43:55

This one splits strings per line. There are more optimizations possible, but this is reasonably fast already, despite all the reversing of lists going on.
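
The snippets referred to here are not preserved in this log. As a rough sketch of the general idea only (not the code that was shared), one way to split per line on ASCII whitespace is to read lines as bytes, split with a byte regexp, and convert each field to a string at the end. The names split-line/bytes and split-lines/bytes are hypothetical, and decoding as UTF-8 with #\? as the replacement character is an assumption.

```racket
#lang racket/base

;; Split one line (as bytes) on runs of ASCII space, tab, or CR,
;; dropping the empty fields produced by leading/trailing whitespace.
(define (split-line/bytes line)
  (for/list ([field (in-list (regexp-split #rx#"[ \t\r]+" line))]
             #:unless (zero? (bytes-length field)))
    ;; Assumption: fields are UTF-8; invalid sequences decode to #\?.
    (bytes->string/utf-8 field #\?)))

;; Produce a list of lists of strings, one inner list per input line.
(define (split-lines/bytes in)
  (for/list ([line (in-bytes-lines in)])
    (split-line/bytes line)))

(define result
  (split-lines/bytes (open-input-bytes #"alpha  beta\tgamma\n delta\n")))
```

Whether this beats plain string-split depends on the data, so it is worth timing both on the real files.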


leif
2017-10-9 17:52:40

@mflatt Is the text% editor implemented in Racket or does it use native text boxes (with a lot of Racket code on top)?


mflatt
2017-10-9 17:54:42

The text% class, along with snips, implements all drawing and manipulation at the level of draw-text, on-char, and on-event.


mflatt
2017-10-9 17:54:53

So, no native text boxes.


kathryngray
2017-10-9 18:02:06

@jaz Thanks for the example, sadly it turns out my text input is not UTF-8 (was news to me, I thought it was)


jaz
2017-10-9 18:02:58

you can use other encodings


jaz
2017-10-9 18:03:15

there’s a bytes->string/latin-1


jaz
2017-10-9 18:04:00

and if you need, say, WINDOWS-1252, you do that with bytes-open-converter, etc.
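
A small sketch of both options mentioned above. The helper names decode-latin-1 and decode-with are made up for illustration. Which encoding names bytes-open-converter accepts beyond the built-in UTF-8 conversions depends on the platform's iconv support, so the helper returns #f when no converter is available.

```racket
#lang racket/base

;; Latin-1: each byte maps directly to the code point with the same value.
(define (decode-latin-1 bs)
  (bytes->string/latin-1 bs))

;; Decode bytes from a named encoding by converting them to UTF-8 first.
;; Returns #f if the platform has no converter for `encoding`, or if the
;; input could not be converted completely.
(define (decode-with encoding bs)
  (define conv (bytes-open-converter encoding "UTF-8"))
  (and conv
       (let-values ([(out consumed status) (bytes-convert conv bs)])
         (bytes-close-converter conv)
         (and (eq? status 'complete)
              (bytes->string/utf-8 out)))))
```

With these, (decode-with "WINDOWS-1252" some-bytes) would be the call for the WINDOWS-1252 case, assuming the platform provides that converter. Stateful encodings may also need bytes-convert-end, which this sketch leaves out.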


leif
2017-10-9 18:05:19

@mflatt Okay cool, that’s what I thought.


leif
2017-10-9 18:05:20

thanks. :slightly_smiling_face:


kathryngray
2017-10-9 18:08:14

Cut it to 6s on my 2.5Mb, still running the 150Mb


apg
2017-10-9 18:08:47

@kathryngray could you maybe share the relevant part of your program?


jaz
2017-10-9 18:08:50

@kathryngray are you running in DrRacket?


kathryngray
2017-10-9 18:09:38

Well the relevant bit of program is now the bit @jaz shared, and yes I’m running in DrRacket as I want to explore the data after I process it


apg
2017-10-9 18:10:07

@kathryngray ok.


jaz
2017-10-9 18:10:10

Did you turn off debugging/profiling?


jaz
2017-10-9 18:10:21

Because that imposes a heavy performance penalty.


jaz
2017-10-9 18:12:02

The options are in Language -> Choose Language, and you may need to “show details” if you haven’t in the past.


apg
2017-10-9 18:14:09

hmm. maybe this is a hardware thing? @kathryngray are you running this on a relatively recent machine? Or running in virtualization or something?


apg
2017-10-9 18:14:50

(even with debugging, @jaz, it still runs < 3 seconds for me)


jaz
2017-10-9 18:15:04

it could also be printing results, which is far more expensive than computing them


apg
2017-10-9 18:16:14

true. printing in drracket is extremely slow.


notjack
2017-10-9 19:10:56

knew I’d forget something on the trip to RacketCon this year; I forgot to get a shirt!


notjack
2017-10-9 19:11:18

Oops


jaz
2017-10-9 19:14:30

@apg I didn’t actually try the dead-simple version first; I had just assumed it was slower. Turns out it’s faster.


apg
2017-10-9 19:22:46

@jaz yay for doing the simple thing. :slightly_smiling_face:


apg
2017-10-9 19:25:55

I will say, one thing that I do happen to like about go is that they have equivalent packages that are very similar for functions to work on strings and []byte. Might be worth copying this if we can provide faster bytes-split for cases where that’s needed.


kathryngray
2017-10-9 20:31:13

@jaz and @apg This is running on a this-year model macbook pro, so hardware shouldn’t be much of an issue, and the only thing printing out during this phase is the time measurement (which means it is measuring time as well as working but doesn’t print out any of the file contents), but I haven’t changed any DrRacket settings other than to beef up the RAM to 4Gb (I’m processing the files one after another, so my memory footprint does get big by the second 150+Mb file) so I’ll try that tomorrow. Still 20s -> 6s helps me get to the more interesting exploration faster :slightly_smiling_face: So Thanks a bunch for the help


apg
2017-10-9 20:32:27

ooh! It’s multiple files in a row?


kathryngray
2017-10-9 20:32:39

Also printing in DrRacket is very slow and doesn’t seem to be interruptible if the processing has finished (I accidentally let the file contents enter the interactions window early on, and it seemed the only action to take to stop it was to force quit DrRacket)


kathryngray
2017-10-9 20:32:45

Four files in a row, shortest to biggest


apg
2017-10-9 20:33:16

ah. ok


kathryngray
2017-10-9 20:33:21

In that window, in another two files in a row. The initial reading time doesn’t seem to lag for multiple files although memory builds up


zenspider
2017-10-9 20:42:17

@kathryngray (time (void (do-the-thing))) can help a lot… but hit Cmd-L, “Show Details” and switch from “Debugging” to “No debugging or profiling”. It makes a huge difference
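
To spell out the timing idiom: wrapping the call in void throws away the (possibly huge) result so nothing gets printed, while time still reports cpu, real, and gc milliseconds. Here do-the-thing is a hypothetical stand-in for the real processing step.

```racket
#lang racket/base

;; Hypothetical stand-in for the real file-processing step:
;; builds a large list so there is something worth timing.
(define (do-the-thing)
  (for/list ([i (in-range 100000)])
    (number->string i)))

;; `void` discards the result, so the REPL has nothing to print;
;; `time` prints the cpu, real, and gc times for the whole call.
(time (void (do-the-thing)))
```

To keep the data around for exploration, (define data (time (do-the-thing))) also avoids printing, since a definition does not echo its value.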


kathryngray
2017-10-9 20:43:14

Oddly I just ran it without debugging and it was faster… there’s a lot of GC time. Trying a fresh run of DrRacket


kathryngray
2017-10-9 20:46:54

I mean it was faster With debugging turned on


zenspider
2017-10-9 20:49:39

maybe it has a compile artifact w/ the debugging and it needs to recompile for non? I don’t know how that side works


apg
2017-10-9 20:50:23

i’d imagine the run button basically reloads the entire code window into the repl, no?


zenspider
2017-10-9 21:43:40

OK. I have a basic #lang lexer working… but one thing that confuses me is what functions are available… if my main.rkt implementing my hashlang is racket then things work fine… my lexer can use things like string-trim… but if I make it use racket/base then string-trim isn’t available EVEN IF I put in a (require racket/string) inside my #%module-begin.

What am I not getting?


zenspider
2017-10-9 21:46:07
#lang oedipuslex

digits = (:+ numeric)

(eof)                                                 : (return-without-srcloc eof)
"\n"                                                  : (token 'NEWLINE lexeme)
whitespace                                            : (token lexeme #:skip? #t)
(:or "print" "goto" "end" "+" ":" ";")                : (token lexeme lexeme)
digits                                                : (token 'INTEGER (string->number lexeme))
(:or (:seq (:? digits) "." digits) (:seq digits ".")) : (token 'DECIMAL (string->number lexeme))
(:or (from/to "\"" "\"") (from/to "'"  "'"))          : (token 'STRING (string-trim lexeme #px"."))

(so far…. I’d like to add more constructs)


zenspider
2017-10-9 21:47:27

possible phase 3:

;; TODO: phase 3: maybe?
digits = (:+ numeric)

(eof)                                      : (return-without-srcloc eof)
"\n"                                       : (token 'NEWLINE lexeme)
whitespace                                 : (token lexeme #:skip? #t)
"rem" ... "\n"                             : (token 'REM lexeme)
"print" | "goto" | "end" | "+" | ":" | ";" : (token lexeme lexeme)
digits                                     : (token 'INTEGER (string->number lexeme))
digits? "." digits | digits "."            : (token 'DECIMAL (string->number lexeme))
 "\"" .. "\"" | "'" .. "'"                 : (token 'STRING (string-trim lexeme #px"."))

mflatt
2017-10-9 22:13:25

@zenspider If your #%module-begin introduces the (require racket/string), then the introduced bindings are not visible to references that are supplied to the macro. In other words, imports are hygienic the same as definitions. It’s usually easiest to have the oedipuslex reader inject a require that has empty lexical context, so that it behaves as if it were present with the references.


lexi.lambda
2017-10-9 22:14:43

To add to what Matthew said, require introduces bindings with the same lexical context as the piece of syntax for the module name itself.
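
A small sketch of that point, using an ordinary macro rather than a full #lang. The macro name require-string-lib is made up. Writing the template as #'(require racket/string) would give the module name the macro's own lexical context, so string-trim would not be visible to code at the use site. Building the module name with datum->syntax from the use-site syntax makes the imported names visible there.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Expands to (require racket/string), but the module name carries the
;; lexical context of the macro *use*, so the imported bindings are
;; visible to code written next to that use.
(define-syntax (require-string-lib stx)
  (syntax-case stx ()
    [(_)
     (with-syntax ([mod-path (datum->syntax stx 'racket/string)])
       #'(require mod-path))]))

(require-string-lib)

;; string-trim resolves here because of the context given to mod-path.
(define trimmed (string-trim "  hi  "))
```

The reader-based approach described above gets the same effect from the other direction: syntax produced by the reader has no lexical context yet, so an injected require picks up the same context as the rest of the module body.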


notjack
2017-10-9 22:15:59

I think it would be easier if you had your #lang lexer expand into (module anonymous lexer body …)


notjack
2017-10-9 22:16:40

That way whatever (require lexer) would export is what will be visible in a #lang lexer module


notjack
2017-10-9 22:16:51

But I haven’t done this in a while and my memory is fuzzy


apg
2017-10-9 22:17:02

aside: is there a reason modules can’t actually be anonymous?


apg
2017-10-9 22:17:16

slideshow/simple creates a module my-module which seems silly


notjack
2017-10-9 22:17:26

that part I’m fuzzy on


lexi.lambda
2017-10-9 22:17:31

modules are not first-class, so they can’t be anonymous in that sense


apg
2017-10-9 22:17:59

that’s explanation enough.


apg
2017-10-9 22:18:01

:slightly_smiling_face:


notjack
2017-10-9 22:18:05

can a lang expand to multiple modules? Without submodules I mean


lexi.lambda
2017-10-9 22:18:11

no


apg
2017-10-9 22:19:12

that’d be a limitation of read-syntax?


notjack
2017-10-9 22:19:32

I wonder if there would be any reasonable use for it


notjack
2017-10-9 22:20:00

can’t think of anything that wouldn’t be much more easily done with submodules


apg
2017-10-9 22:20:21

hmm. read-syntax could return #’(begin … )


lexi.lambda
2017-10-9 22:20:24

it’s not a limitation of the reader, no


lexi.lambda
2017-10-9 22:20:40

there’s just the question of what it would mean and what it would do.


notjack
2017-10-9 22:20:51

¯_(ツ)_/¯


apg
2017-10-9 22:21:14

i think the first question is: can the current implementation somehow be tricked into it


lexi.lambda
2017-10-9 22:21:22

modules in racket are uniquely identified by a module path.


apg
2017-10-9 22:21:22

then the second is: is that a good idea or not?


apg
2017-10-9 22:21:42

it seems like it’s a good idea to enforce 1 module. but does the current system do so?


apg
2017-10-9 22:21:57

(i assume yes, based on this discussion)


lexi.lambda
2017-10-9 22:22:09

yes. it won’t work if you expand to something other than a module form.


notjack
2017-10-9 22:22:15

I think the reason readers have to make a module whose name doesn’t matter is related to backwards compatibility with load and the top level


apg
2017-10-9 22:23:36

but it does matter, no? it’d mean I couldn’t technically have two of my slideshows be required by another file?


apg
2017-10-9 22:23:44

(or, another module)


apg
2017-10-9 22:23:57

assuming the things defined in that module are provided, of course?


apg
2017-10-9 22:24:44

maybe I don’t understand modules well enough…


notjack
2017-10-9 22:25:01

The module name doesn’t come up when you require it as a file


notjack
2017-10-9 22:25:29

I’m not sure if the name is important if you load the file into the top level


apg
2017-10-9 22:26:01

“After evaluation is triggered once, later requires do not re-evaluate the module body.” — https://docs.racket-lang.org/guide/Module_Syntax.html#%28part._module-syntax%29


apg
2017-10-9 22:26:14

presumably it uses the name-id to know to not reevaluate the body?


apg
2017-10-9 22:26:37

e.g. if I have foo.rkt -> resulting in (module foo …) and foo2.rkt -> resulting in (module foo …) — will that work?


notjack
2017-10-9 22:27:03

Yes because the path is part of the key used to id modules


notjack
2017-10-9 22:27:16

File path, I mean


apg
2017-10-9 22:27:26

ah. ok


notjack
2017-10-9 22:28:58

the module-path? and resolved-module-path? predicates are good places to go in the docs for more info


apg
2017-10-9 22:29:17

confirmed that this works via a quick experiment. :slightly_smiling_face:


apg
2017-10-9 22:30:01

(and also confirmed that slideshow simple slides can just be included in your #lang slideshow without issues, which is fun)


apg
2017-10-9 22:30:12

(well, required that is)


notjack
2017-10-9 22:30:17

Sweeeet


notjack
2017-10-9 22:30:26

I liked your talk a lot by the way


apg
2017-10-9 22:30:30

thanks!


apg
2017-10-9 22:30:55

i wish i was able to be there yesterday


apg
2017-10-9 22:30:56

:disappointed:


apg
2017-10-9 22:31:11

(sounded like a lot of fun was had by all)


notjack
2017-10-9 22:32:09

Designing tools with the goal to make something you could use at the last minute to whip something up in an hour leads tools in interesting directions


apg
2017-10-9 22:32:42

ha!


notjack
2017-10-9 22:32:45

Hopefully there’s another office hours day next year :p it was great


apg
2017-10-9 22:33:59

if @stamourv and the con team send out a survey, I’m sure “office hours next year” will be part of the feedback.


apg
2017-10-9 22:35:28

I’m just worried about the 11th con. There’s not an eleventh in racket/list — how will that work?


notjack
2017-10-9 22:39:25

PRs welcome ;)


isabellating
2017-10-10 04:34:10

@isabellating has joined the channel