notjack
2020-8-1 07:30:34

As an aside, @sorawee has just taught me how to write tests for whether macros evaluate expressions in tail position, which is awesome


sorawee
2020-8-1 07:45:47

:slightly_smiling_face:


sorawee
2020-8-1 07:46:29

See also “Test for tail position” section in https://srfi.schemers.org/srfi-157/srfi-157.html
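
For readers following along, here is a minimal sketch of one way such a test can be written with continuation marks. It is not necessarily what was shown in the conversation above. The names `key`, `current-marks`, `my-or2`, `marks-when-second`, and `marks-when-first` are made up for this illustration. The idea: install a mark, run the macro, and install a second mark under the same key inside the expression being tested. If that expression is in tail position, the inner mark replaces the outer one; if not, both marks are visible.

```racket
#lang racket/base

;; Hypothetical mark key, used only by this sketch.
(define key (make-continuation-mark-key 'tail-test))

;; All marks currently stored under `key`, innermost first.
(define (current-marks)
  (continuation-mark-set->list (current-continuation-marks) key))

;; Toy macro under test: `a` is NOT in tail position, `b` is.
(define-syntax-rule (my-or2 a b)
  (let ([t a]) (if t t b)))

;; Probe placed in the second (tail) sub-expression.
(define (marks-when-second)
  (with-continuation-mark key 'outer
    (my-or2 #f
            (with-continuation-mark key 'inner (current-marks)))))

;; Probe placed in the first (non-tail) sub-expression.
(define (marks-when-first)
  (with-continuation-mark key 'outer
    (my-or2 (with-continuation-mark key 'inner (current-marks))
            'fallback)))

(marks-when-second) ; only the inner mark survives: tail position
(marks-when-first)  ; both marks are present: not tail position
```

A test can then compare the returned list against `'(inner)` to assert that a sub-expression is evaluated in tail position.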


soegaard2
2020-8-1 10:21:14

@yilin.wei10 The default lexer runs before the parser/expander, so it has no extra information. However the syntax colorer runs later, so in principle you can make a plug-in that saves type information for later use.


jjsimpso
2020-8-1 18:23:01

@jjsimpso has joined the channel


greg
2020-8-1 20:52:09

@yilin.wei10 Basically saying what @soegaard2 did, using more words: :simple_smile: Sometimes in programming editors people talk about "syntax highlighting" (which mostly corresponds to the kind of tokenization done by a lexer) and/or "semantic highlighting" (which probably corresponds more to what you could get from running drracket/check-syntax and/or doing other analysis of fully-expanded code).


greg
2020-8-1 20:53:31

Thinking of them as two layers or passes is probably good. That is, a lexer can probably work fast enough to keep up with a user typing quickly, especially if re-lexing after changes is handled smartly.


greg
2020-8-1 20:54:21

Whereas there is no way in heck that an analysis of fully-expanded code is going to work that fast. It needs to be something that runs “lazily”, comes in later and updates highlighting to be “richer” or “better”. AFAICT.


greg
2020-8-1 20:57:23

In most programming editors the “lexer” is really a pile of regular expressions, which works “good-enough” but fails a lot of corner cases.


greg
2020-8-1 20:57:50

I think DrRacket is fairly unusual in using a “real” lexer, as well as one that each #lang can provide.
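
As a rough illustration of what a “real” lexer gives you, here is a small sketch that runs the `racket-lexer` function from `syntax-color/racket-lexer` over a string and collects the tokens. The helper name `tokenize` is made up for this example, and this covers only the lexing layer, not anything a semantic pass would add.

```racket
#lang racket/base
(require syntax-color/racket-lexer)

;; Hypothetical helper: returns a list of (lexeme type start end)
;; entries, one per token in `str`.
(define (tokenize str)
  (define in (open-input-string str))
  (let loop ([acc '()])
    ;; racket-lexer returns five values per token; the third one
    ;; (the paren type) is ignored here.
    (define-values (lexeme type paren start end) (racket-lexer in))
    (if (eq? type 'eof)
        (reverse acc)
        (loop (cons (list lexeme type start end) acc)))))

(for ([tok (in-list (tokenize "(define x 1)"))])
  (printf "~s\n" tok))
```

Each entry carries a token type (such as a symbol, constant, or parenthesis category) plus start and end positions, which is the information a colorer needs to highlight a span.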


greg
2020-8-1 20:59:36

The other thing about lexing (even using regexps) is that you need that amount of information to do indentation (but not the full semantic analysis). AFAIK.


greg
2020-8-1 21:01:15

In most cases indentation needs to know about “open” and “close” tokens for “expressions” or “blocks” or whatever the lang uses.
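
To make the “open” and “close” point concrete, here is a minimal sketch that uses the paren-type value returned by `racket-lexer` to count how many delimiters are still open at the end of some text. The function name `open-depth` is invented for this example, and a real indenter would need much more than a depth count.

```racket
#lang racket/base
(require syntax-color/racket-lexer)

;; Hypothetical helper: number of unclosed delimiters at the end of `str`.
(define (open-depth str)
  (define in (open-input-string str))
  (let loop ([depth 0])
    (define-values (lexeme type paren start end) (racket-lexer in))
    (cond
      [(eq? type 'eof) depth]
      ;; The lexer reports delimiters as symbols such as |(| and |)|.
      [(memq paren '(|(| |[| |{|)) (loop (add1 depth))]
      [(memq paren '(|)| |]| |}|)) (loop (max 0 (sub1 depth)))]
      [else (loop depth)])))

(open-depth "(define (f x)\n  (let ([y 1])")
(open-depth "(display \"(((\")")
```

Because the string literal in the second call is a single token, the parentheses inside it are not counted, which is one of the corner cases where a regexp-based approach tends to go wrong.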


greg
2020-8-1 21:02:18

Offside-rule langs like Python or Haskell either have “indent” and “outdent” tokens (the former), or the indentation is really a shorthand for curly braces (the latter). I think.


greg
2020-8-1 21:02:52

(Of course for those langs auto-indentation needs to work a little differently.)


greg
2020-8-1 21:03:38

I’m blabbing on about this b/c I’ve been looking at it lately for Racket Mode on Emacs, and trying to also think about non-s-expression langs.


yilin.wei10
2020-8-1 21:04:35

Yes; I’m an emacs guy so I’m familiar with syntax highlighting (font-lock) and got ridiculously excited because I wondered whether racket had a way to “transform” syntactic objects.


yilin.wei10
2020-8-1 21:06:35

Does racket-mode on emacs use racket to get that information?


greg
2020-8-1 21:06:58

It’s tricky because in Emacs “syntax” means classifying single characters, so that’s really more like “lexing” and you have to hope the important tokens are single chars. :smile:


greg
2020-8-1 21:07:44

Racket Mode uses a pile o’ regexps, for which the number of failing edge cases is smaller over time but definitely still non-zero. :slightly_smiling_face:


greg
2020-8-1 21:08:15

However I have a branch where I’ve been working on a racket-hash-lang-mode that instead uses the “real” lang lexer.


yilin.wei10
2020-8-1 21:08:17

Does it translate the regex from the color-lexer or is it built in?


yilin.wei10
2020-8-1 21:08:28

Ah that would be pretty cool!


greg
2020-8-1 21:08:35

No the regexps are hand crafted.


yilin.wei10
2020-8-1 21:08:58

I’ve just rolled an extra major mode per language at the moment (extremely new to racket, so that’s only pollen which is any different…)


greg
2020-8-1 21:08:59

It also wants to use drracket:indentation, but that’s currently designed to assume racket/gui framework and way too heavy.


greg
2020-8-1 21:09:24

So I’ve also been sketching out a simpler “token-map” interface, that wouldn’t presume DrRacket and racket/gui.


yilin.wei10
2020-8-1 21:09:43

Oh no way? Racket has indentation for langs? That’s super cool.


yilin.wei10
2020-8-1 21:10:13

I’d be happy to hack on that if you’ve got a branch. Not familiar with racket, but pretty familiar with elisp and emacs.


greg
2020-8-1 21:10:43

Yes! A #lang can supply that. However I’ve found relatively few examples of langs that use it so far. The Scribble one is the main example, plus some 3rd party langs that simply re-provide that.


greg
2020-8-1 21:11:16

So I’m hoping if I propose a new protocol, it wouldn’t be too disruptive *coughs*.


yilin.wei10
2020-8-1 21:11:49

Haha, racket is a research language right :wink:?


greg
2020-8-1 21:13:10

yilin.wei10
2020-8-1 21:19:02

Thank you very much!


sorawee
2020-8-2 03:21:28

I’m surprised that this function doesn’t already exist:

> (adjacent-group-by char-whitespace? (string->list "abc def  ghi"))
(list (list #\a #\b #\c) (list #\space) (list #\d #\e #\f) (list #\space #\space) (list #\g #\h #\i))
> (adjacent-group-by abs '(1 -1 2 1 3 -3))
(list (list 1 -1) (list 2) (list 1) (list 3 -3))
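
For reference, a minimal sketch of how such a function could be written, assuming the behavior implied by the examples: consecutive elements whose keys are equal end up in the same group. The optional `same?` argument is an addition made for this sketch and is not part of the original message.

```racket
#lang racket/base
(require racket/list)

;; Groups consecutive elements of `lst` whose keys (as computed by `key`)
;; are equal according to `same?`. Order is preserved.
(define (adjacent-group-by key lst [same? equal?])
  (let loop ([lst lst] [acc '()])
    (cond
      [(null? lst) (reverse acc)]
      [else
       (define k (key (car lst)))
       ;; splitf-at returns the longest matching prefix and the rest.
       (define-values (group rest)
         (splitf-at lst (λ (x) (same? (key x) k))))
       (loop rest (cons group acc))])))

(adjacent-group-by abs '(1 -1 2 1 3 -3))
```

This version calls `key` more than once per element, which is fine for cheap keys; caching the key would be a straightforward refinement.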