
Hi all, I’m cross-posting from the Racket channel on freenode. I’m afraid this is somewhat off-topic, so please tell me and excuse me if I’m out of place. I’m following along with @popa.bogdanp’s great screencast tutorial (https://www.youtube.com/watch?v=DS_0-lqiSVs) and was wondering if someone could help me set up an environment with Docker Compose or Podman (I’m actually using Podman’s compatibility layer), using koyo-shorty as an example app: https://github.com/Bogdanp/koyo-shorty. So, first question: 1) Should I build my own (multi-stage?) container images for Postgres and Racket/koyo, or should I pull ready-made available container images? I’m quite new to containerization and still have to learn this DevOpsy stuff. The goal of my exercise is to build a fictional staging-like environment that will be subjected to testing. I’ve read https://defn.io/2020/06/28/racket-deployment/ and found it interesting, and I might try that in the future, but this time I’m required to at least use Docker/Podman, a CM tool like Ansible or Chef, and a testing framework like Serverspec or similar. I’m finding it too much for my brain to tackle at once. So, a cursory look at hub.docker.com brought racket/racket and racket/racket-ci to my attention; which one should I use, or neither of those?

I have a custom language (magic, https://github.com/jjsimpso/magic) that compiles very slowly. Short files compile/expand fairly quickly, but a 1000-line file takes 30 seconds or so and a 3000-line file takes upwards of 30 minutes. Any tips on profiling this? DrRacket’s macro expander doesn’t seem to like the larger files. When I click “Macro Stepper” on the 1000-line file it doesn’t do anything.

I’m using brag to implement the parser and reader. One thought is that the parser is slow, but it could also be my macros for expanding magic.

It looks like the macro stepper did eventually throw an error:

file-position: setting position allowed for file-stream and string ports only
port: #<input-port:/home/jonathan/git/magic/tests/images.rkt>
position: 449
#<void>: 51:22
/home/jonathan/.racket/snapshot/pkgs/brag-lib/brag/codegen/runtime.rkt: 69:2

The code runs correctly outside of the macro stepper though, so I’m not sure what the problem is.

Well, that sounds like a good point of attack. Time only the parsing part, on files of different sizes.
It sounds as if something makes the time grow quadratically in the number of lines. One possibility is that there is a loop (or matching process) over all lines which contains a sub-loop or sub-match over the remaining lines. If the problem is in the macro layer, then look out for … patterns.

Any hints or sources of info on how I can actually profile this? Will I need to instrument the code? Profiling in racket-mode when I run the file doesn’t appear to profile the macro expansion or parser.

If the problem is in the parser, look out for any right recursive rules.

For profiling: you can temporarily make your module-begin expand to #’42. Then use time with raco make file.rkt to see how long it takes to parse the file.
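For example, a sketch of that stub (the names here are hypothetical; adapt them to the magic language’s actual expander module):

```racket
#lang racket/base
;; Hypothetical sketch: temporarily swap the language's #%module-begin
;; for one that throws the parsed body away, so that timing
;; `raco make file.rkt` measures only reading/parsing, not expansion.
(define-syntax-rule (stub-module-begin body ...)
  (#%module-begin 42))
(provide (rename-out [stub-module-begin #%module-begin]))
```

With this in place, the shell timing isolates the reader/parser cost from the macro-expansion cost.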

I like the module-begin idea. I’ll try that.

Is the macro profiler of any help? This one: https://docs.racket-lang.org/macro-debugger/index.html#%28part._.Macro_.Profiler%29

With the simple module-begin, it does appear that almost all the time is spent in the parser.

That means we need to look at this file?

Btw - I don’t know if it is useful in this situation, but the parsers have a debug clause that can be used to save extra information.


Parsing isn’t my area of expertise, but I don’t think any of my rules are recursive. The only rules that expand to anything complicated are at the top of the grammar:
magic : EOL* (query | named-query)+
query : line (level+ (line | clear-line))* /EOL*
level : /">"
line : offset /HWS type /HWS test (/HWS message?)? /EOL*
clear-line : offset /HWS "clear" (/HWS test)? /EOL*
named-query : name-line (level+ (line | clear-line))* /EOL*
name-line : offset /HWS name-type /HWS MAGIC-NAME (/HWS message?)? /EOL*

Basically, a magic file is a sequence of lines and lines expand to their component parts and aren’t recursive.

@hoshom good to know about the macro profiler even if it seems the macro expansion may not be the problem here.

I mostly have experience with using parser-tools/parser directly. I have used ragg once, but never brag.
Although your brag grammar doesn’t contain any recursive rules (as far as I can tell from the snippet above), it is worth looking at the grammar that brag produces for parser-tools. Maybe the translation is not one-to-one (for example, how are cuts handled)?

I was curious about possible issues with the EOL*s colliding, but I imagine your long files don’t have lots of blank lines? And a short file with many blank lines is still fast?

I’ll do some tests with blank lines. I’ll also see if I can figure out how to look at the grammar brag is producing.

Wait - where is HWS defined? It looks like it is commented out?

Ah - it’s a token.

It is a token.

Yep :slightly_smiling_face:

I have very little hands-on mileage with parsing, and none with brag and that grammar. But from my experience making a markdown parser that at times was horribly slow: :smile: I might look first at choices? Like the topmost choice between query | named-query? Could it be exploring query more than it needs to before realizing that fails and backtracking to try named-query? Stuff like that. Not necessarily quadratic, but I might start there? idk

Also I might have guessed that a named-query would be a special case of a query and it would make sense to try that first? Not sure, just thinking out loud.

Like I believe other people already said, I think one obvious “hand-wavy first theory of the bug” is that it’s scanning to the end of the file before failing and backtracking.

Planet Scheme is moving, so I have a redirect in place. Anyone care to test if it actually works?

@soegaard2 This is where the link takes me.

Oh! Wait. I should have given you the old link … http://scheme.dk/planet

Takes me to the same place.

Great.

Same here.

Regarding 1): the technique described in the article you linked is useful even when you use Docker to deploy things. What I generally do is use multi-stage builds to create minimal containers: build a Racket distribution inside an image built on top of racket/racket, then copy that distribution into a Debian image. You can find an example of this process here: https://github.com/MarcKaufmann/congame/blob/e31e5adf8379de3fd2caac9a94fababf469b5d03/Dockerfile
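A rough sketch of that multi-stage pattern (base image tags, file paths, and the entry point are illustrative, not copied from the linked Dockerfile):

```dockerfile
# Stage 1: build a standalone distribution of the app.
# (Hypothetical names and tags; adapt to the actual project layout.)
FROM racket/racket:8.3-full AS build
WORKDIR /opt/app
COPY . .
RUN raco pkg install --auto --batch && \
    raco exe -o app main.rkt && \
    raco distribute dist app

# Stage 2: copy only the built distribution into a slim base image,
# so the final image does not carry the Racket toolchain.
FROM debian:bullseye-slim
COPY --from=build /opt/app/dist /opt/app
CMD ["/opt/app/bin/app"]
```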

Re. racket/racket vs racket/racket-ci, you want the former; the latter is used for Racket’s own CI.

Re. pulling images vs making your own, it depends on what you’re comfortable with. When I use Postgres with Docker, I tend to just use the official image from Docker Hub.

Thank you Bogdan, I’ll start with the official Postgres image as you suggest and copy (steal) from your Dockerfile in order to multi-stage build a new koyo-shorty Docker image. Once I get shorty running, I can then think about writing a docker-compose YAML file to bring up and connect both containers.
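For reference, a hypothetical docker-compose.yml along those lines (service names, credentials, and ports are made up, not taken from koyo-shorty):

```yaml
version: "3.8"
services:
  db:
    image: postgres:13
    environment:
      POSTGRES_USER: shorty
      POSTGRES_PASSWORD: shorty   # placeholder; use secrets in practice
      POSTGRES_DB: shorty
  app:
    build: .          # the multi-stage Dockerfile discussed above
    depends_on:
      - db
    ports:
      - "8000:8000"
```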

@jjsimpso This looks suspicious: https://github.com/jjsimpso/magic/blob/master/reader.rkt#L113 If all strings are small, it might be OK, but otherwise the idiom is to accumulate a list of characters (in reverse order) and then use (string-append* (reverse accumulated)) when the end is reached.
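Sketched out, the difference looks like this (function names are illustrative, not from reader.rkt):

```racket
#lang racket/base

;; Quadratic: each string-append copies everything read so far,
;; so total work grows with the square of the token length.
(define (read-word-slow in)
  (let loop ([acc ""])
    (define c (read-char in))
    (if (or (eof-object? c) (char-whitespace? c))
        acc
        (loop (string-append acc (string c))))))

;; Linear: cons characters in reverse order, join once at the end.
(define (read-word-fast in)
  (let loop ([acc '()])
    (define c (read-char in))
    (if (or (eof-object? c) (char-whitespace? c))
        (list->string (reverse acc))
        (loop (cons c acc)))))
```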

Earlier, I asked why a complex number with 0.0 for the imaginary part is not considered real?. @sanchom gave <https://racket.slack.com/archives/C06V96CKX/p1610335952466600|the explanation> that the imaginary part must be an exact zero for a complex number to be real?. That makes sense, but I do find it unintuitive, in the context of (= 0 0.0) returning #true.
After reading the docs, I find it a little confusing that numerical comparisons coerce arguments into exact numbers, which runs counter to the usual rule of numerical procedures propagating inexactness. I’d like to understand why comparisons work that way; what are the reasons behind it?
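For concreteness, the behaviors in question, as I understand them at a Racket REPL:

```racket
#lang racket/base
;; A complex number with an inexact zero imaginary part is not real?,
;; but an exact zero imaginary part collapses to a real number.
(real? 0.0+0.0i)  ; => #f
(real? 0+0i)      ; => #t (reads as the exact integer 0)

;; Yet numeric comparison treats exact and inexact zero as equal,
;; while eqv? distinguishes them.
(= 0 0.0)         ; => #t
(eqv? 0 0.0)      ; => #f
```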

I’m glad you noticed that. There was one other place where I used string-append thoughtlessly and generated way too many allocations. The strings should be short, but I’ll rewrite it so that it doesn’t call string-append on every loop iteration.

I hope it is this simple, otherwise I’ll need to look further into the grammar.

= is used for semantic comparison. This includes IEEE 754 floating-point equality rules (such as: all zeros are equal, and no NaN is equal to anything).

If you want to know if numbers are actually the same you should use eq?

The string-append change did not make a noticeable difference, but nevertheless I think it is a good change. So I appreciate that.

I don’t think I’d describe it as coercing to exact in equality

@greg would the general idea be to put most likely choices first? I’ve very little experience with parsers myself. named-queries in my language are relatively rare but I wonder if there is a large cost whenever one is encountered.

Floating point numbers denote particular rationals

And = compares rationals

actually, maybe they aren’t as rare as I think. the queries themselves can also be quite a few lines, so I agree that this is a good place to start.

This is from the docs on =:
> An inexact number is numerically equal to an exact number when the exact coercion of the inexact number is the exact number.

Sure, but for all numbers which are =, the same is true in the reverse direction.

Should I use cs-full as well?

Coercion to inexact is lossy though, so you can’t specify the function that way

ah

Sorry, floats are confusing sometimes.

that they are

I wrote a library for parsing floats a while ago, so I learned a lot about them.

Probably more than I will ever need.

As @ben.knoble suggested, there does appear to be a problem with parsing the EOL token. A file with a small query followed by 500 blank lines takes 4 minutes to compile. But 500 blank lines before the first query are parsed quickly; the magic rule can probably throw them away cheaply.

Removing the redundant EOL* rule at the end of my query rule speeds up my blank line test file but doesn’t totally fix the problem. This does appear to be the correct path. I can probably tweak the rules to fix this.

I’ve made a significant improvement by tweaking the EOL parsing. I’m not sure that I’m all the way there since ideally I’d like to parse files with tens of thousands of lines quickly, but this is a huge improvement. Thanks everyone for your help! I’ll probably post my commit here later in case anyone is curious.

I wonder whether there are online tools that can analyze such parser rules, and give some advice on what to avoid?

That would be very helpful. I’m just happy that I don’t have to abandon brag. I was afraid I’d need to rewrite the parser using just parser-tools. By adjusting the rules I’ve reduced the large file’s compile time from 30 minutes to 1 minute.
Next, I think I will modify the lexer to collapse consecutive EOL tokens into one. That will enable me to simplify the grammar further and hopefully get things to a reasonable speed.
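One way to sketch that collapsing step, as a hypothetical wrapper around a next-token thunk (brag tokenizers are typically structured as such thunks; token representation is simplified to symbols here, whereas a real tokenizer yields token structs):

```racket
#lang racket/base
;; Hypothetical sketch: wrap a next-token thunk so that runs of
;; consecutive 'EOL tokens collapse into a single 'EOL.
(define (collapse-eols next-token)
  (define pending #f)  ; first non-EOL token seen while skipping
  (lambda ()
    (define tok (or pending (next-token)))
    (set! pending #f)
    (cond
      [(eq? tok 'EOL)
       ;; Consume the rest of the EOL run, remembering what follows.
       (let loop ()
         (define next (next-token))
         (if (eq? next 'EOL)
             (loop)
             (set! pending next)))
       'EOL]
      [else tok])))
```

With the lexer guaranteeing at most one EOL in a row, the grammar’s EOL* repetitions can likely be simplified to a single optional EOL.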

Yes, otherwise your build will spend a lot of time downloading dependencies.

I’ve made a lot of progress on my refactoring tool, which I’m calling resyntax. I’m looking for ideas of refactoring rules to implement so I can further test the tool. Anyone interested (especially @sorawee, @laurent.orseau, @rokitna, @soegaard2, and @kellysmith12.21) is invited to share their ideas in https://github.com/jackfirth/resyntax/issues/8

How about one that rewrites define-struct into struct?

already implemented :)

Where are the rules?

Could look for all mentions of deprecated in the docs?


I am blind, they are in … yes exactly.

(cons a (cons b xs)) -> (list* a b xs) Might be controversial?

(cons a (list b c …)) -> (list a b c …)

(or a) -> a

@notjack may be hard to do but this would be awesome: make it possible to apply refactoring rules to rhs of refactoring rules. Then you wouldn’t have to worry about proposing rules that rely on other rules, since you can force them (in some cases) to be self contained.

(cond [test expr] [else expr]) -> (if test expr expr), but that may also be a matter of taste?

If the first clause is short, it’s possible to write it on two lines. Otherwise I prefer the if.

Nested ifs to cond

let () to block (needs to add a require though)

If+begin to cond

How about rules converting a program in a teaching language into standard Racket? I’m thinking of removal of local.

define foo lambda to define (foo …)

Apropos, cond and begin. Maybe rules for converting an old-style Scheme program into Racket. For example (cond [expr (let () . body)]) -> (cond [expr . body])

Port rnrs scheme to racket :)

Let loop to for (see also ryanc ideas in the package mentioned by samth)

(+ (+ … )) to (+ …)

That example works, but, say (+ a (+ b c) d) -> (+ a b c d) doesn’t if there are floats involved.

I want to avoid adding rules without evidence they actually occur “in the wild”, so to speak

A reasonable rule.

There is a vulkan api as well (https://docs.racket-lang.org/vulkan/index.html) but I’m not sure how complete it is.

I can write a package with only bad style if you want?

Let’s look at some old code as a concrete example: https://github.com/soegaard/little-helper/blob/master/lexer.rkt

Another use case: conventions. Define your own conventions and refactor inconsistent code to follow them

I notice that case-lambda is used to handle default arguments. Presumably the code was written before define supported default arguments.

(if token
    (begin
      (f token)
      (for-each-token f count-lines?))
    (error "internal error: token expected after skipping")))))]))

I’d like to avoid defining conventions that are idiosyncratic to a codebase. However, I do want to give #lang implementations and library modules the ability to define conventions specific to that language or library. Then if someone really wants to make conventions for their codebase, they can define #lang mycodebase and use that instead of #lang racket.

Anything with (if e (begin ...) ...) would be better as a cond.
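Applied to the snippet quoted above, that rewrite would look roughly like this (a hand-done sketch, not actual resyntax output):

```racket
;; Before: if with a begin in the true branch.
(if token
    (begin
      (f token)
      (for-each-token f count-lines?))
    (error "internal error: token expected after skipping"))

;; After: cond clauses allow multiple body expressions directly,
;; so the begin disappears.
(cond
  [token
   (f token)
   (for-each-token f count-lines?)]
  [else
   (error "internal error: token expected after skipping")])
```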

(if e (begin (define a 1) a) …) actually doesn’t compile, because begin in an expression context can’t include definitions.

(if e (let () …) …) would be a good candidate though.

That case-lambda one is good.

Ah, the example was: (if e (begin e1 e2 e3) (begin e4 e5 e6)) where e1, e2, e4, e5 are expressions with side effects.

Oh yes that’s definitely a good one

A shame (eqv? (peek-char) #\#) can’t be rewritten to (char=? (peek-char) #\#), since peek-char can return a non-char.

How about (lambda . more) to (λ . more)?

if it were up to me I’d just rewrite all usages of the type-specific equality procedures to equal? to discourage their use, and leave it up to the optimizer to figure out when that’s appropriate

but, that definitely doesn’t satisfy the non-controversial requirement

Rewriting lambda to λ would be good in my opinion, though that one might be controversial to some.

Yeah, you are right.

> may be hard to do but this would be awesome: make it possible to apply refactoring rules to rhs of refactoring rules. Then you wouldn’t have to worry about proposing rules that rely on other rules, since you can force them (in some cases) to be self contained.

@laurent.orseau this is absolutely possible :) refactoring rules are roughly functions from syntax to syntax, so they’re composable

an early version of resyntax just repeatedly applied all rules until the code reached a fixpoint
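That fixpoint loop might look roughly like this (apply-rules is a hypothetical single-pass rewriter, not resyntax’s actual API):

```racket
#lang racket/base
;; Hypothetical sketch: repeatedly apply a one-pass rewriter
;; (syntax -> syntax) until the program stops changing.
(define (refactor-to-fixpoint apply-rules stx)
  (define next (apply-rules stx))
  (if (equal? (syntax->datum next) (syntax->datum stx))
      stx                                    ; no change: fixpoint reached
      (refactor-to-fixpoint apply-rules next)))
```

Comparing syntax->datum is a blunt change-detector; it ignores source locations and syntax properties, which is presumably part of why this approach gets tricky in practice.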

I got rid of that both for performance reasons and because it didn’t handle broken rule-generated code very well

those problems are both fixable but they’re more work and I wanted to just iterate on the basic functions of the tool first

Anyway I have to go do my actual job now. I cordially invite you all to leave comments and/or open issues with the ideas you’ve come up with.

Is it documented somewhere which hashing algorithm Racket uses for its hash tables? We have an external tool we’re trying to achieve parity with Racket’s results. Thanks!

No, it’s not documented, and it sometimes changes. When you say “parity”, you mean that you want to produce the same hash code? Or just something with similar characteristics?

Ideally the same hash code so that for the same data sets we write out our tables in the same order.

Is the implementation on the github repo? That would be enough for our needs.

@notjack two questions: can your library work with *SL languages instead of Racket? And, can it work with snips in the source and not just plain-text? I wonder if this could be part of a linter for HtDP programs… :thinking_face: if this is out-of-scope for you, nvm…

The implementation for BC starts around https://github.com/racket/racket/blob/master/racket/src/bc/src/hash.c#L1407

tyvm!

For CS, it’s partly https://github.com/racket/racket/blob/master/racket/src/cs/rumble/hash-code.ss


and one more for symbols: https://github.com/racket/racket/blob/master/racket/src/ChezScheme/c/intern.c#L107

@blerner it definitely only works for textual programs. As for the student languages, I’m not sure. My intention is that it eventually works for any #lang.

Here is my initial fix for this, but I still need to tweak the reader some more: https://github.com/jjsimpso/magic/commit/ea49f62b71a2c105f3723367a7ed39c0bba1a815

@notjack What is this? https://github.com/jackfirth/resyntax/blob/master/source-code.rkt#L71

Magic :slightly_smiling_face:


That explains why my search of Racket docs didn’t find anything. :smile:

This is potentially unsafe. Consider:
(cond
  [#t (begin (define x 1) x)]
  [else 2])

through the magic of GitHub search I was starting to get an inkling, but thanks for the direct link! :slightly_smiling_face:

It’s very magic

Somehow that stretch of Slack discussion slipped by me, it was only a few days ago. Derp.


I feel like I need to start adding undocumented modules like this to my own packages. #%do-not-use, #%magic, #%sekret-modyule, and so on.

Scribble’s defstruct* form automagically indexes the predicate and accessor procedures for a struct type, linking them to the blue box where the type is described. Is there a way to do something similar using defform?

I thought the convention is to use a private directory.

@ryanc’s paper on the macro stepper is probably useful for understanding it

Good point. So (submod "foo.rkt" private private private #%sekret-modyule)
.

@greg you’re on to something — actual snippet from my project: (module+ secret-provide (provide (for-syntax lookup-contract)))

Mainly it’s just that I was amused. I was reading @notjack’s nicely written code using all these clean rebellion abstractions, and my eye snagged on (dynamic-require ''#%expobs 'current-expand-observe) and I’m like, what even am I ….

Trust your instincts. I’m still deeply suspicious of it.

@notjack Some possible ideas, if you haven’t seen this: https://github.com/rmculpepper/sexp-rewrite/blob/master/racket-rewrites.el

@soegaard2 generated this diff in your little-helper project :grin:

link to rule implementation here: https://github.com/jackfirth/resyntax/blob/92b0d6cd161be7ef57f1112f0ca5c4c47bc34dd6/refactoring-rule.rkt#L126

Here’s an idea: change error to raise-argument-error when there’s enough information.

I like that. Might be tricky to do in a semantics-preserving way, but maybe small error message format changes are okay for refactoring rules.

Are there any rules for replacing legacy contracts?

Yeah, those were some of the first ones I added

false/c -> #f, etc.

What about ->d -> ->i?

Haven’t done that one because I’m not actually sure there is an automated migration from the former to the latter. I only skimmed the docs, but it looked like preconditions have different semantics between the two systems.

From what I can tell, automated migration should be safe. The main difference between ->d and ->i is that the former allows the dependent parts to violate the argument contracts within the ->d form, which is never good.

I am impressed. Works pretty well.