
@rokitna raised an interesting point in https://github.com/racket/racket2-rfcs/issues/3#issuecomment-521127693
It’s about the process programmers use to determine tree structure, and to tell at a glance how many closing parens to write on a line. For most programming languages this involves indentation, but this comment shows there’s more to it than that.
So does it make sense for one of the considerations when designing a syntax to be this:
* a documented process for programmers to determine tree structure
Certain syntaxes can make this easier or harder. S-Expressions have a process that looks something like this:
1. Keep the code indented: Everything within parens should be to the right of the open-paren. Mentally draw a rectangle around it with the top-left corner at the open-paren (IDE automates this).
2. Read the indentation:
   a. Up: To find the parent open-paren, find the left-edge of visible text on your current expression, then follow that edge-line up until you find the closest open-paren left of it.
   b. Down: To find the children of an open-paren (after the children on the same line), go down from there and take each leftmost-so-far expression, for as long as they’re still to the right of the open-paren.
There should be a documented process for this for any alternative syntax we propose for Racket2. Note that this is not the same as defining the parser rules… this is about how programmers think visually
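The “Up” step of the process above can be sketched in code. This is only a rough approximation of what a reader does by eye, assuming well-indented s-expression text with no parens inside strings or comments; the function names `unclosed_opens` and `find_parent_open` are made up for illustration.

```python
def unclosed_opens(line):
    """Columns of open parens on this line that are not closed on the same line."""
    stack = []
    for col, ch in enumerate(line):
        if ch in "([{":
            stack.append(col)
        elif ch in ")]}" and stack:
            stack.pop()
    return stack


def find_parent_open(lines, row):
    """Find the enclosing open paren for the expression starting on `row`.

    Uses indentation only: take the left edge of the text on `row`, walk
    upward, and return the (row, col) of the closest still-open paren that
    sits strictly to the left of that edge. Returns None at top level.
    """
    edge = len(lines[row]) - len(lines[row].lstrip(" "))
    for r in range(row - 1, -1, -1):
        candidates = [c for c in unclosed_opens(lines[r]) if c < edge]
        if candidates:
            return (r, max(candidates))
    return None


code = [
    "(define (f x)",
    "  (if (zero? x)",
    "      1",
    "      (* x (f (- x 1)))))",
]
print(find_parent_open(code, 3))  # the open paren of the `if` form
print(find_parent_open(code, 1))  # the open paren of the `define` form
```

Note that this never counts closing parens across lines, which is the point: if the indentation is trustworthy, the parent can be found without matching anything.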

There’s a related principle that Google uses for designing autoformatters. They wrote an entire internal design document about it; it’s called the Rectangle Rule.

Hmm, Jay also responded to me in terms of how to determine “structure.” He talked about using indentation for conveying structure to humans and thinking of parens as a tax paid to the parser.
The way I was thinking of it, I was talking about techniques human maintainers use to read and deal with the parentheses themselves. Indentation helps tremendously in this maintenance because for C curly brackets it redundantly encodes the nesting depth, and for Lisp parens it pushes each line of code past the nearest open paren surrounding it. In each case, indentation is a means to an end, paving the way for a fast way to match parens by eye.
One thing I’m chewing on about Jay’s response is that it may be the case that some forms of “structure” matter enough to influence indentation style but don’t particularly have to do with paren-matching.
I think I have some preferences like those myself, largely because I have an opinion about what the parens “would” be if I were using a built-in syntax. For instance, I like to do this:
setTimeout(function () {
    console.log("it's been a second now");
}, 1000);
because I figure it really oughta be this:
setTimeoutMillis (1000) {
    console.log("it's been a second now");
}
If the Rectangle Rule @notjack is talking about is accurately described here… https://github.com/google/google-java-format/wiki/The-Rectangle-Rule
…then my indentation preferences for setTimeout don’t conform to it.

@rokitna Yup that’s the rule. And yes, that setTimeout example doesn’t conform to it. Which agrees with me personally since I don’t think that example is readable.

That page’s “Right Parens” section suggests this Rectangle Rule really isn’t in primary service to paren-matching either; some parens are treated as an afterthought.

Right. Paren-matching is something I do in service of trying to understand the tree structure of the code, not something I do for its own sake.

Valuing the parens in and of themselves is, IMO, a purely Lisp thing that other language programmers just don’t do at all, nor should they.

I find that rainbow delimiters are useful in pretty much every language

it just ends up being more useful in lisps

Do you have a reason more tangible than personal opinion for that?

I think they’re useful any time you have lots of delimiters stacked up next to each other like this: ));};) but I’d rather just avoid that situation in the first place

hrm, I almost never look at the right hand delimiter stack unless something has gone horribly wrong, and then I just jump to the bright red angry one

If the milliseconds argument is an expression, I can’t figure out where it starts easily. But just applying an autoformatter isn’t the right fix - the argument order for setTimeout should be swapped and maybe it ought to be switched around so that instead of a two-argument function you’re calling a one-argument method on a timeout object

On the contrary, other language designers pay lots of manual attention to parens, making use of several different varieties of paren and several precedence rules so that they get the parens just the way they want them. And as a result, the programmers have to pay close attention to them too.

Formatting style isn’t something that can be designed independently of the APIs of the code being formatted

and in the event that my delimiters aren’t paired from the start, being able to hit a single “close delimiter” key (or something like that) is very powerful, especially if the computer can always figure out what it should be (trivial with sexps)
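As a rough illustration of why a single “close delimiter” key is easy to support for s-expressions: the editor only needs a stack of pending closers. This is a sketch under simplifying assumptions (it ignores strings, comments, and character literals), and `next_closer` is a hypothetical name, not any editor’s actual command.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def next_closer(text):
    """Return the delimiter a "close delimiter" key should insert, or None.

    Walks the text once, keeping a stack of the closers still owed.
    Raises ValueError if the text already contains a mismatched closer.
    """
    owed = []
    for i, ch in enumerate(text):
        if ch in PAIRS:
            owed.append(PAIRS[ch])
        elif ch in CLOSERS:
            if not owed or owed.pop() != ch:
                raise ValueError(f"unexpected {ch!r} at offset {i}")
    return owed[-1] if owed else None


print(next_closer("(let ([x 1"))        # the bracket comes first
print(next_closer("(define (f [x 1]"))  # then the parens
```

Pressing the key repeatedly and re-running the function would close the whole stack in order.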

I think real world example code would help here, because I’m not sure I understand what each of you is talking about

here is a block of sxml schema specification that I wrote that is way, way easier for me to read with rainbow parens


Oh this is an interesting example

Critically, at the closing bunch of parens, I have a comment in between, which could easily cause that last closing paren to be missed

but, the far right paren on L480 is blue, so I know it is not closed, and the one on 482 is green, so I know that I have closed

if something becomes misaligned because I was moving code around

Does every language with Algol-style function calls and lambdas need to have method calls as well, so that you can apply that solution?

the color acts as a backup indicator of the nesting depth

which is also nice when you have many short s-exps that you want to pack on a line

you can immediately tell that they are all at the same depth
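For anyone unfamiliar with how rainbow parens encode depth, here is a small sketch of the idea: each delimiter gets a color chosen by its nesting depth, so siblings packed on one line share a color. The palette and the names `paren_depths` and `color_for` are invented for illustration, and the sketch ignores strings and comments.

```python
COLORS = ["red", "orange", "green", "blue"]  # hypothetical palette


def paren_depths(text):
    """Return (offset, delimiter, depth) for every delimiter in the text.

    An opener and its matching closer always receive the same depth.
    """
    depth = 0
    result = []
    for i, ch in enumerate(text):
        if ch in "([{":
            result.append((i, ch, depth))
            depth += 1
        elif ch in ")]}":
            depth -= 1
            result.append((i, ch, depth))
    return result


def color_for(depth):
    """Cycle through the palette so any depth maps to some color."""
    return COLORS[depth % len(COLORS)]


for offset, ch, depth in paren_depths("(a (b) (c))"):
    print(offset, ch, color_for(depth))
```

Here `(b)` and `(c)` both come out at depth 1, which is the “same depth at a glance” effect described above.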

I dunno ¯_(ツ)_/¯

My original point is just that there is a visual tree structure to code, and there’s the code’s paren notation, and these don’t always match up. And when they don’t, the tree structure is considered “correct” and usually people try and adjust the formatting or change up which places require parens, until the parens reflect the intended structure. So the structure is the semantic source of truth, not the parens themselves.

in another context, if you can just color your parens, then you basically get “named ends” for free

I really can’t effectively read s-exp code without it in fact, and it might be a good question for the survey …

I have to go Do Things (TM) but I want to talk about this more later. Like, let’s try and take code snippets like that and come up with a few alternative notations / formatting rules and see what happens.

Is this still about the readability of the setTimeout example? Because my preference to squash the setTimeout lambda into the same lines as the call itself is because the lambda is something I think shouldn’t exist in the code. I make an anti-paren-matching indentation choice here specifically because I’m treating the intended structure (rather than the actual parens) as the source of truth.

If you’re now saying that it’s good to do that, then I’m not sure you’re giving me any reason to think of the setTimeout example as being unreadable

I’m really not putting that much thought into it, I’m doing other things IRL. It wasn’t a carefully constructed criticism or anything.

One of the things I want to point out is that for power users, having rainbow parens and all the fancy keybindings is nice. But beginners shouldn’t need to learn keybindings and visual cues to effectively edit code. Reading code can also occur outside of your favorite editor (e.g., GitHub). Features that you think would make reading code easier might not exist in every environment.

@sorawee you are /so right/!

sure, but if I’m seriously going to work on a piece of code I get it in my own environment

designing a syntax that works well on a teletype because it is the lowest common denominator seems misguided at this point in time

re designing for the average

especially when we have completely configurable environments


day 1 of class, have the students toggle rainbow parens, and try them out to see if it helps make it easier for them to read

the information is already there embedded in the code

heck, somewhere I have custom css that I wrote to enable reasonable syntax highlighting on github, of course they don’t number their parens in the markup

I do agree about the keybinding issue, it is unreasonable to expect students to learn an editor while also focusing on their coursework

though I will say it is an utter travesty that anyone reads code on github, in the sense that, the darned raw file is right there as a link and it shouldn’t be that hard to load it into a buffer with the click of a button (yet it is :confused: )

pair programming is another more realistic case where only one coloring is possible

also consider code snippets in chat apps and anywhere that supports markdown

I’m not sure what kind of code example would help convey what I mean when I say non-s-expression language designers and programmers pay lots of attention to the parens themselves. They give different semantic meanings to (), [], and {}. They specify grammars where the number of different tokens used for determining structure (a passable definition for “parens”) can get into the dozens or hundreds. And for all these variations of parens they meticulously design, as soon as one set of parens like {} ends up being nested in ways that spread over multiple lines of code, programmers can be tempted to use indentation to keep better track of it. It’s hard to throw a rock without hitting an example of this in action.
For instance, even grade school arithmetic is a language that uses various tokens like + and × to determine the nesting structure of an expression, and as a result, individual users of this notation have to pay careful attention to PEMDAS. Maybe they learn PEMDAS just to “satisfy the parser” (the parser being their teacher), but these particular rules eliminate most of the grouping paren clutter when dealing with polynomials in particular, so they’re actually a rather compelling solution for a particular domain.
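To make the PEMDAS point concrete, here is a small sketch that restores the grouping parens the precedence rules let people omit. It uses precedence climbing, which is one common technique and not something named in this discussion. The assumptions: only binary `+ - * / ^` (with `*` standing in for × and `^` for exponents), no unary minus, and well-formed input.

```python
import re

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOC = {"^"}


def parenthesize(src):
    """Rewrite an infix expression with every grouping made explicit."""
    tokens = re.findall(r"\d+|[A-Za-z]+|[-+*/^()]", src)
    pos = 0

    def atom():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = expr(1)
            pos += 1  # skip the matching ")"
            return inner
        return tok

    def expr(min_prec):
        nonlocal pos
        left = atom()
        while (pos < len(tokens)
               and tokens[pos] in PRECEDENCE
               and PRECEDENCE[tokens[pos]] >= min_prec):
            op = tokens[pos]
            pos += 1
            # Left-associative operators need a strictly tighter right side.
            next_min = PRECEDENCE[op] + (0 if op in RIGHT_ASSOC else 1)
            right = expr(next_min)
            left = f"({left} {op} {right})"
        return left

    return expr(1)


print(parenthesize("3*x^2 + 2*x + 1"))
# (((3 * (x ^ 2)) + (2 * x)) + 1)
```

The polynomial needs no parens as written, but five pairs once the structure is spelled out, which is the clutter the precedence rules are saving.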
Because of that ubiquity of examples, it may be more informative to ponder counterexamples. I’m not sure this will very clearly support my point, because I consider the counterexamples to be legitimate too, but it might put it in context a little better.
As for those counterexamples… I’m not sure how people lay out their code structure in languages like assembly and spreadsheets, where structure is more of a purely stylistic concern, or in COBOL, where even the amount of whitespace used has various semantic implications, but these could be interesting sources of counterexamples. Grade school math notation is itself a counterexample when it spans multiple lines, and that’s one I can unpack a bit.
Math notation faces different typographical constraints than plain text code does. Multi-line expressions can be clarified with long horizontal bars and very tall parens rather than just using whitespace. Even the whitespace techniques are more freeform than what plain text indentation can do, making it easy for multi-argument functions to use 2D layout (e.g. for combinations “n choose r”) and subscripts (e.g. for logarithms with specific bases) rather than ever resorting to comma-separated argument lists. As a result, it seems like most of the fiddly parts of nested layout are about grouping, and an operator is usually associated with each of its arguments by proximity in a certain direction, rather than by counting out a certain number of commas or expressions into the text.
In contrast, plain text code does face those extra constraints to its layout. Plain text code is copied and pasted in string-shaped pieces, which are naturally delimited by parens, which I think leads to the emergence of parens. In turn, I believe the maintenance of parens is a sufficiently specific concern to guide the emergence of an indentation style. I think this explains how so many plain text languages have come to use indentation to convey structure, even if text selections and parens aren’t the first things on the mind of many programmers.

I absolutely wanted to bring up the tragedy that paren-matching tooling like rainbow parens isn’t easy to use when reading code on GitHub or Slack. I’m glad I’m not the only one. XD I do think having more ubiquitous access to this tooling is a good vision for the future even if it’s not something a single programming language design should necessarily count on.

What I mean is that we should take lots of actual real world code and compare-contrast how it would look given different autoformatting rules

I don’t think any discussion without that context would be productive

Too easy to get lost in endless abstract hypotheses

Or well, “any discussion” is perhaps too strong. “Most discussions” is what I mean.

I find the constraints of plain text and the needs of modular code-sharing are sufficient to determine the design of most of the rest of a language.

But if this language didn’t resemble the design of Racket, I wouldn’t bother to mention it.

That’s like the same core constraints that almost all major languages were designed with and they reached wildly different conclusions so I don’t think that’s close to true at all

Most mainstream languages have reached pretty much the same conclusions. I think spreadsheets are an outlier there but that’s because they’re not plain text.

By “conclusions” I mean “choice of surface syntax and formatting conventions”

yes, most languages use roughly context-free surface syntax and use formatting conventions that reflect the tree structure

Sure. But there are different ways to do that. Braces, significant indentation, spacing conventions, notation for specific constructs, etc. What I’m saying is that in order to make decisions about those surface syntax choices, we need to actually see what a decision’s impact would be on a large body of existing code.

since I don’t think those choices or their tradeoffs are obvious

I don’t think it can be boiled down to applying different pretty-printing algorithms to the same codebase. Languages where it’s intolerable to maintain nested parens have styles where nesting is discouraged, sometimes even opting for completely different semantics like method chaining or coroutines in certain twilight areas where nesting is intolerable but these other options are not.

To port code from a language where nesting is tolerable to a language where it isn’t can involve choosing meaningful names for parts of the program that were just considered control flow blocks before. An automatic translation… might be able to pick fairly good names using AI, but it’s a task that benefits from awareness of cultural context.

I don’t mean that that’s all it can be boiled down to. I mean that real comparisons on existing code are a minimum for understanding notation choices.

you can’t just talk about it hypothetically

Well, the specific alternative I was seeking to better understand was the style Jay’s been using in examples like these: https://github.com/racket/racket2-rfcs/issues/3#issuecomment-515618581

where both opening parens and closing parens tend to be at the end of a line. This doesn’t fit my explanation for why certain indentation styles have caught on, so I was thinking there might be some more considerations I should take into account

That’s a good step, but I explicitly mean that real code in the wild needs to be considered, not just examples of real code. Like I firmly believe that any non-s-exp syntax for racket would need to come with a tool for converting from s-exps just so that we can run the tool over everything in http://pkgs.racket-lang.org to understand how the notation would look. We’d need that just to design the notation, let alone dealing with the process of actually converting code.

The things I’m talking about have been on my mind a lot as I’ve settled on indentation styles for using Parendown in my code: https://github.com/lathe/parendown-for-racket/blob/master/README.md

that’s the “in the wild” experience I’m drawing from

so I’m imagining a process like this then:
1. pick a random open source racket package
2. find all of its source code (or at least a dozen or so files)
3. convert the code to your proposed notation, without making any unrelated changes (ideally this part is automated)
4. compare and contrast

oh and step 5: make a repository with both the original and the modified code side by side so the data is viewable by people you want to discuss this with

or some other way of making sure all the diffs are viewable

Oh, a pull request would be a good way to do that
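Steps 3 and 4 of that process could be sketched roughly like this, using Python’s standard `difflib` to produce the kind of diff a pull request would show. The converter here, `brackets_to_parens`, is a deliberately trivial stand-in and purely hypothetical; a real notation converter would be the hard part. The function name `notation_diff` and the file name are also made up.

```python
import difflib


def notation_diff(source, convert, name="example.rkt"):
    """Return a unified diff between the source and its converted form."""
    converted = convert(source)
    return "".join(difflib.unified_diff(
        source.splitlines(keepends=True),
        converted.splitlines(keepends=True),
        fromfile=f"original/{name}",
        tofile=f"converted/{name}",
    ))


def brackets_to_parens(source):
    """Hypothetical stand-in converter: swap square brackets for parens."""
    return source.replace("[", "(").replace("]", ")")


sample = "(let ([x 1])\n  (+ x 1))\n"
print(notation_diff(sample, brackets_to_parens))
```

Running the same function over every file in a package, and committing the converted files on a branch, would give reviewers the side-by-side view being asked for.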

I’m proposing a more standard indentation style for C-style code as opposed to the style Jay’s proposing. I think most code is already in “my” notation.

but I don’t think Jay needs to justify his preference using such elaborate means either

it’s enough that it works best for him

No I mean take like one of my github repos and convert the code in it to use Parendown, then open a pull request so I can actually see how it would look

For personal preferences that’s totally unnecessary, but if you’re designing a notation for other people then I think it’s appropriate

Not like, as a first step. But definitely at some point along the way.

Hmm… That’s an interesting point. Certain paren-related considerations have been tangible to me partly because I’ve been working on alternatives to the notation of parens themselves in Parendown and Punctaffy. Until other people are also weighing alternatives to the notation of parens, the considerations might not be so tangible to them.
On the other hand, I was discussing paren-matching-by-eye and indentation in pretty much the same terms long before thinking about Parendown or Punctaffy. I would explain the zig-zagging technique to help new Arc programmers get a better handle on how to spot paren-matching errors in their s-expression syntax. (It seems like the corresponding advice for Racket newcomers is mostly “let DrRacket handle it,” which doesn’t actually explain why DrRacket does what it does and not something else. To be fair, “I just let Emacs handle it” was pretty common on Arc Forum too.)
Maybe even closer to the root of where I’m coming from, Arc’s if is like Scheme’s cond with some parens removed, and there was a bit of ongoing disagreement over various ways to indent it. Although indenting the condition and the branch to the same column was helpful for zig-zagging paren-matching, it was unhelpful for distinguishing them from each other. I think this topic came up a number of times, it always seemed to go nowhere, and I started to value whatever abstract principles could help a new language community make a distinct decision about these things.
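For readers who haven’t seen Arc, the relationship between its if and Scheme’s cond can be sketched as a small rewrite over nested lists. This is only an illustration of the paren difference, with forms modeled as Python lists and symbols as strings; `arc_if_to_cond` and `render` are made-up names, and the sketch doesn’t try to capture differences in what the two forms return when no branch matches.

```python
def arc_if_to_cond(form):
    """Rewrite ['if', c1, b1, c2, b2, ..., else?] as a cond form.

    Arc's if takes alternating conditions and branches, with an optional
    trailing else expression; cond wraps each pair in its own parens.
    """
    if not form or form[0] != "if":
        raise ValueError("expected an Arc-style if form")
    args = form[1:]
    clauses = [[args[i], args[i + 1]] for i in range(0, len(args) - 1, 2)]
    if len(args) % 2 == 1:
        clauses.append(["else", args[-1]])
    return ["cond"] + clauses


def render(form):
    """Print a nested list as s-expression text."""
    if isinstance(form, list):
        return "(" + " ".join(render(x) for x in form) + ")"
    return str(form)


arc_form = ["if", "a", "b", "c", "d", "e"]
print(render(arc_form))                  # (if a b c d e)
print(render(arc_if_to_cond(arc_form)))  # (cond (a b) (c d) (else e))
```

The cond version has a paren pair per clause, so each condition/branch pairing is visible in the parens; the Arc version leaves that pairing to indentation, which is where the disagreement came from.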