
@robby The culprit seems to be classify-position
in racket/share/pkgs/gui-lib/framework/private/racket.rkt
. If the cursor is at the end of a single-line comment, right before the #\newline
, it is classified as 'whitespace
and not as 'comment
.
I have a simple semi-fix, which consists in checking if • the current character is #\newline
• classify-position
returns 'comment
at the current cursor position –1 • the previous two characters are not \|#
It’s only a semi-fix because it will fail on the following example when the cursor is at the end of the line: ; not a nested comment \|#
However, I expect this case to be rather rare, and overall I think this solution is strictly better than the current state.
How does this sound?

classify-position
returns 'comment
for both line-comments and nested-comments. Ideally, it would a different symbol for each, and would also tell whether the cursor is in a sexp-comment, but it doesn’t.
Also, the rest of the code often relies on a eq? 'comment ...
test, so differentiating between the two comments would require several changes and may be a little risky to change.

On a more positive note, it was nicely simple to test all this using Quickscript! :slightly_smiling_face: https://gist.github.com/Metaxal/6a25ca345e829a65a0bcfa3db3d5a82f

Instead of modifying classify-position
, it may be safer to modify in-position?
in framework/private/racket.rkt
instead, since it already attempts to fix such cases (but misses this one)

in-position?
will also fail on "abc"\|"def"
where | is the cursor: This is classified as in the string, but is actually in-between two strings.

Changing the API to have two comment characters is backwards incompatible and probably best decoupled from the bug fix

But classify position should be using the results of the colorer — does it really itself know what is a comment and what is whitespace?

In other words, is racket-lexer returning the right thing on this example?

- I’m not clear about what you mean re. backward-incompatibility. I wasn’t proposing to change the API, but just performing additional tests to return the correct value. However, this test relies on racket-specific characters such as
#\|
, so may not work for non-racket languages. (Is that what you mean?) The racket lexer doesn’t return enough information. For example, it doesn’t know the difference between line comment, nested-comments and sexp-comments, which is annoying. It only returns
'comment
. However, if you add text at the end of a line-comment, it’s still inside the comment, but at the end of a nested comment it’s outsideI’m currently working on a better bug fix, that may be used for fixing other bugs in other places (while being backward-compatible). This involves an additional
classify-position*
(or whatever name) that returns: • the result ofclassify-position
when the cursor is not of the start position of the current token •(list (classify-position (- pos 1)) (classify-position pos))
when pos is the start of the current token. Then fixin-comment/string?
to return true also for(line-comment whitespace)
.

Though currently this is still relying on racket-specific conditions.

Maybe the lexers should know not only the type (comment, string), but also the subtype (line-comment, here-string, etc.)?

The lexers should probably have returned struct objects, or dictionaries instead of values. It would have been much easier to extend them while keeping backward compatibility

Given how the lexers are implemented, I don’t see a good backward compatible solution. Now attempting a bad one (still bwd compat): Is there a way to know if the current lexer is a racket one? I could then add a test on the lexer type that would be specific to racket only and thus not break other lexers

@laurent.orseau @robby I don’t think PLT-Scheme had structs or dicts in 1995 :smiley:
;; originally by Dan Grossman
;; 6/30/95

We had structs; maybe substructs. I forget. But we could certainly have encoded the idea if had had the foresight to do so! :)