As an aside, @sorawee has just taught me how to write tests for whether macros evaluate expressions in tail position, which is awesome
:slightly_smiling_face:
See also “Test for tail position” section in https://srfi.schemers.org/srfi-157/srfi-157.html
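The gist of the trick (a hedged sketch in the SRFI's spirit, not its exact code; `my-when` is a hypothetical macro under test): if the macro keeps its body in tail position, a deep self-call runs in constant space, otherwise it piles up stack. Capping memory with a custodian turns that into a pass/fail test.

```racket
#lang racket

;; Hypothetical macro under test: does my-when keep its body in tail position?
(define-syntax-rule (my-when test body)
  (if test body (void)))

;; If the body is in tail position, this loop runs in constant space.
;; If the expansion wrapped the body in a non-tail context, each call
;; would consume stack and a memory-limited custodian would kill it.
(define (loop n)
  (my-when #t
           (if (zero? n)
               'done
               (loop (sub1 n)))))

(loop 10000000) ; finishes only if tail calls are preserved
```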
@yilin.wei10 The default lexer runs before the parser/expander, so it has no extra information. However the syntax colorer runs later, so in principle you can make a plug-in that saves type information for later use.
@jjsimpso has joined the channel
@yilin.wei10 Basically saying what @soegaard2 did, using more words :simple_smile: In programming editors people talk about “syntax highlighting” (which mostly corresponds to the kind of tokenization done by a lexer) and/or “semantic highlighting” (which corresponds more to what you could get from running drracket/check-syntax and/or doing other analysis of fully-expanded code).
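For the semantic layer, drracket/check-syntax can be driven programmatically; a minimal sketch (assuming an `example.rkt` file on disk; `show-content` is the documented entry point, though the exact annotations you get back vary by program):

```racket
#lang racket
(require drracket/check-syntax)

;; Expand "example.rkt" and collect the annotations DrRacket would use
;; for semantic highlighting: binding arrows, documentation links,
;; mouse-over text, etc. Each annotation is a vector whose first
;; element names the syncheck callback it corresponds to.
(for ([annotation (show-content "example.rkt")])
  (println annotation))
```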
Thinking of them as two layers or passes is probably good. That is, a lexer can probably work fast enough to keep up with a user typing quickly, especially if re-lexing after changes is handled smartly.
Whereas there is no way in heck that an analysis of fully-expanded code is going to work that fast. It needs to be something that runs “lazily”, comes in later and updates highlighting to be “richer” or “better”. AFAICT.
In most programming editors the “lexer” is really a pile of regular expressions, which works well enough but fails on a lot of corner cases.
I think DrRacket is fairly unusual in using a “real” lexer, as well as one that each #lang can provide.
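That lang-supplied lexer can also be driven outside DrRacket via syntax-color/module-lexer, which reads the #lang line and dispatches to whatever color lexer that language names. A rough sketch (the exact token types depend on the lang's lexer):

```racket
#lang racket
(require syntax-color/module-lexer)

;; Tokenize a program string using the lexer its #lang line selects.
;; module-lexer consumes the "#lang racket" line itself, then
;; delegates to that language's color lexer for the rest, threading
;; the lexer mode through each call.
(define in (open-input-string "#lang racket\n(define x 42)"))
(let loop ([mode #f])
  (define-values (lexeme type paren start end backup new-mode)
    (module-lexer in 0 mode))
  (unless (eq? type 'eof)
    (printf "~a ~s [~a,~a)\n" type lexeme start end)
    (loop new-mode)))
```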
The other thing about lexing (even using regexps) is that you need that amount of information to do indentation (but not the full semantic analysis). AFAIK.
In most cases indentation needs to know about “open” and “close” tokens for “expressions” or “blocks” or whatever the lang uses.
Offside-rule langs like Python or Haskell either have explicit “indent”/“outdent” tokens (Python) or treat the indentation as a shorthand for curly braces (Haskell). I think.
(Of course for those langs auto-indentation needs to work a little differently.)
I’m blabbing on about this b/c I’ve been looking at it lately for Racket Mode on Emacs, and trying also to think about non-s-expression langs.
Yes; I’m an emacs guy so I’m familiar with syntax highlighting (font-lock) and got ridiculously excited because I wondered whether racket had a way to “transform” syntactic objects.
Does racket-mode on emacs use racket to get that information?
It’s tricky because in Emacs “syntax” means classifying single characters, so that’s really more like “lexing” and you have to hope the important tokens are single chars. :smile:
Racket Mode uses a pile o’ regexps, for which the number of failing edge cases is smaller over time but definitely still non-zero. :slightly_smiling_face:
However I have a branch where I’ve been working on a racket-hash-lang-mode that instead uses the “real” lang lexer.
Does it translate the regexps from the color-lexer or are they built in?
Ah that would be pretty cool!
No the regexps are hand crafted.
I’ve just rolled an extra major mode per language at the moment (extremely new to Racket, so Pollen is the only one that’s any different…)
It also wants to use drracket:indentation, but that’s currently designed to assume the racket/gui framework and is way too heavy.
So I’ve also been sketching out a simpler “token-map” interface, that wouldn’t presume DrRacket and racket/gui.
Oh no way? Racket has indentation for langs? That’s super cool.
I’d be happy to hack on that if you’ve got a branch. Not familiar with racket, but pretty familiar with elisp and emacs.
Yes! A #lang can supply that. However I’ve found relatively few examples of langs that use it so far. The Scribble one is the main example, plus some 3rd-party langs that simply re-provide that.
So I’m hoping if I propose a new protocol, it wouldn’t be too disruptive coughs.
Haha, racket is a research language right :wink:?
The overview page you might want to see is https://docs.racket-lang.org/tools/lang-languages-customization.html
Thank you very much!
I’m surprised that this function doesn’t already exist:
> (adjacent-group-by char-whitespace? (string->list "abc def  ghi"))
(list (list #\a #\b #\c) (list #\space) (list #\d #\e #\f) (list #\space #\space) (list #\g #\h #\i))
> (adjacent-group-by abs '(1 -1 2 1 3 -3))
(list (list 1 -1) (list 2) (list 1) (list 3 -3))
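It doesn’t seem to be in racket/list (group-by exists there, but it isn’t adjacency-preserving), so here’s one possible definition matching the examples above (a sketch; the name and argument order are just what the examples imply):

```racket
#lang racket

;; Group consecutive elements whose keys (under `key`) are equal?.
;; Unlike group-by, non-adjacent elements with the same key stay in
;; separate groups.
(define (adjacent-group-by key lst)
  (if (null? lst)
      '()
      (let loop ([xs (cdr lst)]
                 [cur (list (car lst))]      ; current group, reversed
                 [k (key (car lst))]         ; key of the current group
                 [acc '()])                  ; finished groups, reversed
        (cond
          [(null? xs)
           (reverse (cons (reverse cur) acc))]
          [(equal? (key (car xs)) k)
           (loop (cdr xs) (cons (car xs) cur) k acc)]
          [else
           (loop (cdr xs) (list (car xs)) (key (car xs))
                 (cons (reverse cur) acc))]))))
```

With this definition, `(adjacent-group-by abs '(1 -1 2 1 3 -3))` produces `'((1 -1) (2) (1) (3 -3))`, matching the REPL transcript above.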