alexknauth
2019-8-14 17:58:35

@rokitna raised an interesting point in https://github.com/racket/racket2-rfcs/issues/3#issuecomment-521127693

It concerns the process by which programmers determine tree structure, and how many closing parens to write on a line, at a glance. For most programming languages this involves indentation, but this comment shows there’s more to it than that.

So does it make sense for one of the considerations when designing a syntax to be this:
* a documented process for programmers to determine tree structure

Certain syntaxes can make this easier or harder. S-Expressions have a process that looks something like this:
1. Keep the code indented: Everything within parens should be to the right of the open-paren. Mentally draw a rectangle around it with the top-left corner at the open-paren (IDE automates this).
2. Read the indentation:
   a. Up: To find the parent open-paren, find the left-edge of visible text on your current expression, then follow that edge-line up until you find the closest open-paren left of it.
   b. Down: To find the children of an open-paren (after the children on the same line), go down from there and take each leftmost-so-far expression, for as long as they’re still to the right of the open-paren.
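The "Up" step can be approximated in code. This is only a rough sketch, assuming the code is already indented according to step 1 and ignoring strings and comments; `findParentOpenParen` and `unclosedOpenColumns` are hypothetical helper names, not part of any existing tool.

```javascript
// Sketch of the "Up" step: follow the left edge upward to the closest
// open-paren that sits left of it. Assumes well-indented code and
// ignores strings and comments.

// Columns of the open-parens on one line that are not closed on that same line.
function unclosedOpenColumns(line) {
  const stack = [];
  for (let col = 0; col < line.length; col++) {
    if ("([{".includes(line[col])) stack.push(col);
    else if (")]}".includes(line[col])) stack.pop();
  }
  return stack;
}

// Hypothetical helper: locate the parent open-paren of the expression
// that starts on lines[lineIndex].
function findParentOpenParen(lines, lineIndex) {
  const edge = lines[lineIndex].search(/\S/); // left edge of visible text
  for (let i = lineIndex - 1; i >= 0; i--) {
    const candidates = unclosedOpenColumns(lines[i]).filter((col) => col < edge);
    if (candidates.length > 0) {
      // The rightmost candidate is the closest open-paren left of the edge.
      return { line: i, column: candidates[candidates.length - 1] };
    }
  }
  return null; // no enclosing open-paren found
}

const exampleLines = [
  "(define (f x)",
  "  (if (zero? x)",
  "      1",
  "      (* x (f (- x 1)))))",
];
console.log(findParentOpenParen(exampleLines, 2)); // → { line: 1, column: 2 }
```

The sketch only works because of step 1: if the code is not kept indented, the left edge stops pointing at the right open-paren.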

There should be a documented process for this for any alternative syntax we propose for Racket2. Note that this is not the same as defining the parser rules… this is about how programmers think visually


notjack
2019-8-14 18:19:06

There’s a related principle that Google uses for designing autoformatters. They wrote an entire design document about it internally, it’s called the Rectangle Rule.


rokitna
2019-8-15 01:41:26

Hmm, Jay also responded to me in terms of how to determine “structure.” He talked about using indentation for conveying structure to humans and thinking of parens as a tax paid to the parser.

The way I was thinking of it, I was talking about techniques human maintainers use to read and deal with the parentheses themselves. Indentation helps tremendously in this maintenance because for C curly brackets it redundantly encodes the nesting depth, and for Lisp parens it pushes each line of code past the nearest open paren surrounding it. In each case, indentation is a means to an end, paving the way for a fast way to match parens by eye.

One thing I’m chewing on about Jay’s response is that it may be the case that some forms of “structure” matter enough to influence indentation style but don’t particularly have to do with paren-matching.

I think I have some preferences like those myself, largely because I have an opinion about what the parens “would” be if I were using a built-in syntax. For instance, I like to do this:

setTimeout(function () {
   console.log("it's been a second now");
}, 1000);

because I figure it really oughta be this:

setTimeoutMillis (1000) {
   console.log("it's been a second now");
}

If the Rectangle Rule @notjack is talking about is accurately described here… https://github.com/google/google-java-format/wiki/The-Rectangle-Rule

…then my indentation preferences for setTimeout don’t conform to it.


notjack
2019-8-15 01:44:24

@rokitna Yup that’s the rule. And yes, that setTimeout example doesn’t conform to it. Which agrees with me personally since I don’t think that example is readable.


rokitna
2019-8-15 01:44:28

That page’s “Right Parens” section suggests this Rectangle Rule really isn’t in primary service to paren-matching either; some parens are treated as an afterthought.


notjack
2019-8-15 01:46:19

Right. Paren-matching is something I do in service of trying to understand the tree structure of the code, not something I do for its own sake.


notjack
2019-8-15 01:47:54

Valuing the parens in and of themselves is, IMO, a purely Lisp thing that other language programmers just don’t do at all, nor should they.


tgbugs
2019-8-15 01:48:51

I find that rainbow delimiters are useful in pretty much every language


tgbugs
2019-8-15 01:49:09

it just ends up being more useful in lisps


rokitna
2019-8-15 01:50:06

Do you have a reason more tangible than personal opinion for that?


notjack
2019-8-15 01:50:17

I think they’re useful any time you have lots of delimiters stacked up next to each other like this));};), but I’d rather just avoid that situation in the first place


tgbugs
2019-8-15 01:51:37

hrm, I almost never look at the right hand delimiter stack unless something has gone horribly wrong, and then I just jump to the bright red angry one


notjack
2019-8-15 01:52:24

If the milliseconds argument is an expression, I can’t figure out where it starts easily. But just applying an autoformatter isn’t the right fix - the argument order for setTimeout should be swapped and maybe it ought to be switched around so that instead of a two-argument function you’re calling a one-argument method on a timeout object
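A minimal sketch of those two alternatives, assuming a plain JavaScript runtime with the standard `setTimeout`. The name `setTimeoutMillis` is borrowed from the hypothetical syntax earlier in this thread, and the `Timeout` class is made up purely for illustration.

```javascript
// Alternative 1 (hypothetical wrapper): delay first, callback last,
// so the delay expression starts right after the function name.
function setTimeoutMillis(millis, callback) {
  return setTimeout(callback, millis);
}

// Alternative 2 (hypothetical class): a timeout object with a
// one-argument method that takes only the callback.
class Timeout {
  constructor(millis) {
    this.millis = millis;
  }

  run(callback) {
    return setTimeout(callback, this.millis);
  }
}

setTimeoutMillis(1000, function () {
  console.log("it's been a second now");
});

new Timeout(1000).run(function () {
  console.log("it's been a second now");
});
```

In both versions the delay is visible before the function body begins, so the closing line no longer carries a trailing argument.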


rokitna
2019-8-15 01:52:43

On the contrary, other language designers pay lots of manual attention to parens, making use of several different varieties of paren and several precedence rules so that they get the parens just the way they want them. And as a result, the programmers have to pay close attention to them too.


notjack
2019-8-15 01:52:58

Formatting style isn’t something that can be designed independently of the APIs of the code being formatted


tgbugs
2019-8-15 01:53:24

and in the event that my delimiters aren’t paired from the start, being able to hit a single “close delimiter” key (or something like that) is very powerful, especially if the computer can always figure out what it should be (trivial with sexps)
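As a rough sketch of why this is trivial with sexps: a single stack is enough to know which closer comes next. `nextCloser` is a hypothetical name, and the sketch ignores strings, comments, and character literals.

```javascript
// Map each opening delimiter to its closing delimiter.
const CLOSER_FOR = new Map([
  ["(", ")"],
  ["[", "]"],
  ["{", "}"],
]);
const ALL_CLOSERS = new Set(CLOSER_FOR.values());

// Hypothetical helper: return the delimiter a "close delimiter" key
// should insert at the end of `source`, or null if nothing is open.
function nextCloser(source) {
  const stack = [];
  for (const ch of source) {
    if (CLOSER_FOR.has(ch)) stack.push(ch);
    else if (ALL_CLOSERS.has(ch)) stack.pop();
  }
  if (stack.length === 0) return null;
  return CLOSER_FOR.get(stack[stack.length - 1]);
}

console.log(nextCloser("(let ([x 1")); // → ]
console.log(nextCloser("(let ([x 1]")); // → )
```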


notjack
2019-8-15 01:54:25

I think real world example code would help here, because I’m not sure I understand what each of you is talking about


tgbugs
2019-8-15 01:58:34

here is a block of sxml schema specification that I wrote that is way, way easier for me to read with rainbow parens



notjack
2019-8-15 01:59:18

Oh this is an interesting example


tgbugs
2019-8-15 01:59:44

Critically, at the closing bunch of parens, I have a comment in between, which could easily cause that last closing paren to be missed


tgbugs
2019-8-15 02:00:24

but, the far right paren on L480 is blue, so I know it is not closed, and the one on 482 is green, so I know that I have closed


tgbugs
2019-8-15 02:00:54

if something becomes misaligned because I was moving code around


rokitna
2019-8-15 02:01:04

Does every language with Algol-style function calls and lambdas need to have method calls as well, so that you can apply that solution?


tgbugs
2019-8-15 02:01:08

the color acts as a backup indicator of the nesting depth
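A sketch of how a highlighter might derive that color, assuming it simply cycles through a palette by nesting depth. The palette and the `colorDelimiters` name are made up for illustration, and strings and comments are ignored.

```javascript
// Arbitrary hypothetical palette; real editors let you configure this.
const PALETTE = ["red", "orange", "green", "blue"];

// Hypothetical helper: tag every delimiter with its nesting depth and a
// color picked by depth, so matching delimiters get the same color.
function colorDelimiters(source) {
  const result = [];
  let depth = 0;
  for (let index = 0; index < source.length; index++) {
    const ch = source[index];
    if ("([{".includes(ch)) {
      result.push({ index, ch, depth, color: PALETTE[depth % PALETTE.length] });
      depth++;
    } else if (")]}".includes(ch)) {
      depth = Math.max(0, depth - 1); // tolerate stray closers
      result.push({ index, ch, depth, color: PALETTE[depth % PALETTE.length] });
    }
  }
  return result;
}

for (const d of colorDelimiters("(a (b) c)")) {
  console.log(d.index, d.ch, d.depth, d.color);
}
```

Because the color depends only on depth, a closer with an unexpected color signals that the nesting is not what the indentation suggests.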


tgbugs
2019-8-15 02:01:33

which is also nice when you have many short s-exps that you want to pack on a line


tgbugs
2019-8-15 02:01:45

you can immediately tell that they are all at the same depth


notjack
2019-8-15 02:01:52

I dunno ¯_(ツ)_/¯


notjack
2019-8-15 02:03:15

My original point is just that there is a visual tree structure to code, and there’s the code’s paren notation, and these don’t always match up. And when they don’t, the tree structure is considered “correct” and usually people try and adjust the formatting or change up which places require parens, until the parens reflect the intended structure. So the structure is the semantic source of truth, not the parens themselves.


tgbugs
2019-8-15 02:04:04

in another context, if you can just color your parens, then you basically get “named ends” for free


tgbugs
2019-8-15 02:05:05

I really can’t effectively read s-exp code without it in fact, and it might be a good question for the survey …


notjack
2019-8-15 02:05:13

I have to go Do Things (TM) but I want to talk about this more later. Like, let’s try and take code snippets like that and come up with a few alternative notations / formatting rules and see what happens.


rokitna
2019-8-15 02:08:51

Is this still about the readability of the setTimeout example? Because my preference to squash the setTimeout lambda into the same lines as the call itself is because the lambda is something I think shouldn’t exist in the code. I make an anti-paren-matching indentation choice here specifically because I’m treating the intended structure (rather than the actual parens) as the source of truth.


rokitna
2019-8-15 02:10:50

If you’re now saying that it’s good to do that, then I’m not sure you’re giving me any reason to think of the setTimeout example as being unreadable


notjack
2019-8-15 02:12:23

I’m really not putting that much thought into it, I’m doing other things IRL. It wasn’t a carefully constructed criticism or anything.


sorawee
2019-8-15 02:51:53

One of the things I want to point out is that for power users, having rainbow parens and all fancy keybindings is nice. But beginners shouldn’t need to learn keybindings and visual cues to effectively edit code. Reading code can also occur outside of your favorite editor (e.g., GitHub). Features that you think would make reading code easier might not exist in every environment.


spdegabrielle
2019-8-15 02:53:30

@sorawee you are /so right/!


tgbugs
2019-8-15 03:36:15

sure, but if I’m seriously going to work on a piece of code I get it in my own environment


tgbugs
2019-8-15 03:37:20

designing a syntax that works well on a teletype because it is the lowest common denominator seems misguided at this point in time


tgbugs
2019-8-15 03:39:33

re designing for the average


tgbugs
2019-8-15 03:39:47

especially when we have completely configurable environments



tgbugs
2019-8-15 03:40:29

day 1 of class, have the students toggle rainbow parens, and try them out to see if it helps make it easier for them to read


tgbugs
2019-8-15 03:40:53

the information is already there embedded in the code


tgbugs
2019-8-15 03:42:47

heck, somewhere I have custom css that I wrote to enable reasonable syntax highlighting on github, of course they don’t number their parens in the markup


tgbugs
2019-8-15 03:45:35

I do agree about the keybinding issue, it is unreasonable to expect students to learn an editor while also focusing on their coursework


tgbugs
2019-8-15 03:47:47

though I will say it is an utter travesty that anyone reads code on github, in the sense that, the darned raw file is right there as a link and it shouldn’t be that hard to load it into a buffer with the click of a button (yet it is :confused: )


tgbugs
2019-8-15 03:50:14

pair programming is another more realistic case where only one coloring is possible


notjack
2019-8-15 04:09:10

also consider code snippets in chat apps and anywhere that supports markdown


rokitna
2019-8-15 04:39:51

I’m not sure what kind of code example would help convey what I mean when I say non-s-expression language designers and programmers pay lots of attention to the parens themselves. They give different semantic meanings to (), [], and {}. They specify grammars where the number of different tokens used for determining structure (a passable definition for “parens”) can get into the dozens or hundreds. And for all these variations of parens they meticulously design, as soon as one set of parens like {} ends up being nested in ways that spread over multiple lines of code, programmers can be tempted to use indentation to keep better track of it. It’s hard to throw a rock without hitting an example of this in action.

For instance, even grade school arithmetic is a language that uses various tokens like + and × to determine the nesting structure of an expression, and as a result, individual users of this notation have to pay careful attention to PEMDAS. Maybe they learn PEMDAS just to “satisfy the parser” (the parser being their teacher), but these particular rules eliminate most of the grouping paren clutter when dealing with polynomials in particular, so they’re actually a rather compelling solution for a particular domain.
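To make that concrete, here is a small sketch that writes out the grouping the precedence rules imply. It assumes pre-split tokens, only the four basic left-associative operators, and no parens or exponents in the input; `parenthesize` is a hypothetical name.

```javascript
// Higher number binds tighter, as in PEMDAS (multiplication before addition).
const PRECEDENCE = { "+": 1, "-": 1, "*": 2, "/": 2 };

// Hypothetical helper: return the expression with every grouping that
// precedence implies written out as explicit parens.
function parenthesize(tokens) {
  let pos = 0;

  function parseExpr(minPrec) {
    let left = tokens[pos++]; // operands are single tokens in this sketch
    while (pos < tokens.length && PRECEDENCE[tokens[pos]] >= minPrec) {
      const op = tokens[pos++];
      // Requiring higher precedence on the right makes operators left-associative.
      const right = parseExpr(PRECEDENCE[op] + 1);
      left = `(${left} ${op} ${right})`;
    }
    return left;
  }

  return parseExpr(1);
}

console.log(parenthesize(["1", "+", "2", "*", "3"])); // → (1 + (2 * 3))
console.log(parenthesize(["1", "-", "2", "-", "3"])); // → ((1 - 2) - 3)
```

The output shows the parens that the operator tokens let a reader leave out.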

Because of that ubiquity of examples, it may be more informative to ponder counterexamples. I’m not sure this will very clearly support my point, because I consider the counterexamples to be legitimate too, but it might put it in context a little better.

As for those counterexamples… I’m not sure how people lay out their code structure in languages like assembly and spreadsheets, where structure is more of a purely stylistic concern, or in COBOL, where even the amount of whitespace used has various semantic implications, but these could be interesting sources of counterexamples. Grade school math notation is itself a counterexample when it spans multiple lines, and that’s one I can unpack a bit.

Math notation faces different typographical constraints than plain text code does. Multi-line expressions can be clarified with long horizontal bars and very tall parens rather than just using whitespace. Even the whitespace techniques are more freeform than what plain text indentation can do, making it easy for multi-argument functions to use 2D layout (e.g. for combinations “n choose r”) and subscripts (e.g. for logarithms with specific bases) rather than ever resorting to comma-separated argument lists. As a result, it seems like most of the fiddly parts of nested layout are about grouping, and an operator is usually associated with each of its arguments by proximity in a certain direction, rather than by counting out a certain number of commas or expressions into the text.

In contrast, plain text code does face those extra constraints to its layout. Plain text code is copied and pasted in string-shaped pieces, which are naturally delimited by parens, which I think leads to the emergence of parens. In turn, I believe the maintenance of parens is a sufficiently specific concern to guide the emergence of an indentation style. I think this explains how so many plain text languages have come to use indentation to convey structure, even if text selections and parens aren’t the first things on the mind of many programmers.


rokitna
2019-8-15 04:44:00

I absolutely wanted to bring up the tragedy that paren-matching tooling like rainbow parens isn’t easy to use when reading code on GitHub or Slack. I’m glad I’m not the only one. XD I do think having more ubiquitous access to this tooling is a good vision for the future even if it’s not something a single programming language design should necessarily count on.


notjack
2019-8-15 04:44:20

What I mean is that we should take lots of actual real world code and compare-contrast how it would look given different autoformatting rules


notjack
2019-8-15 04:44:52

I don’t think any discussion without that context would be productive


notjack
2019-8-15 04:45:21

Too easy to get lost in endless abstract hypotheses


notjack
2019-8-15 04:46:14

Or well, “any discussion” is perhaps too strong. “Most discussions” is what I mean.


rokitna
2019-8-15 04:46:19

I find the constraints of plain text and the needs of modular code-sharing are sufficient to determine the design of most of the rest of a language.


rokitna
2019-8-15 04:47:23

But if this language didn’t resemble the design of Racket, I wouldn’t bother to mention it.


notjack
2019-8-15 04:47:29

That’s like the same core constraints that almost all major languages were designed with and they reached wildly different conclusions so I don’t think that’s close to true at all


rokitna
2019-8-15 04:49:00

Most mainstream languages have reached pretty much the same conclusions. I think spreadsheets are an outlier there but that’s because they’re not plain text.


notjack
2019-8-15 04:49:30

By “conclusions” I mean “choice of surface syntax and formatting conventions”


rokitna
2019-8-15 04:49:57

yes, most languages use roughly context-free surface syntax and use formatting conventions that reflect the tree structure


notjack
2019-8-15 04:51:26

Sure. But there’s different ways to do that. Braces, significant indentation, spacing conventions, notation for specific constructs, etc. What I’m saying is that in order to make decisions about those surface syntax choices, we need to actually see what a decision’s impact would be on a large body of existing code.


notjack
2019-8-15 04:52:38

since I don’t think those choices or their tradeoffs are obvious


rokitna
2019-8-15 04:59:44

I don’t think it can be boiled down to applying different pretty-printing algorithms to the same codebase. Languages where it’s intolerable to maintain nested parens have styles where nesting is discouraged, sometimes even opting for completely different semantics like method chaining or coroutines in certain twilight areas where nesting is intolerable but these other options are not.


rokitna
2019-8-15 05:04:17

To port code from a language where nesting is tolerable to a language where it isn’t can involve choosing meaningful names for parts of the program that were just considered control flow blocks before. An automatic translation… might be able to pick fairly good names using AI, but it’s a task that benefits from awareness of cultural context.


notjack
2019-8-15 05:05:11

I don’t mean that that’s all it can be boiled down to. I mean that real comparisons on existing code are a minimum for understanding notation choices.


notjack
2019-8-15 05:05:35

you can’t just talk about it hypothetically


rokitna
2019-8-15 05:10:34

Well, the specific alternative I was seeking to better understand was the style Jay’s been using in examples like these: https://github.com/racket/racket2-rfcs/issues/3#issuecomment-515618581


rokitna
2019-8-15 05:12:25

where both opening parens and closing parens tend to be at the end of a line. This doesn’t fit my explanation for why certain indentation styles have caught on, so I was thinking there might be some more considerations I should take into account


notjack
2019-8-15 05:14:12

That’s a good step, but I explicitly mean that real code in the wild needs to be considered, not just examples of real code. Like I firmly believe that any non-s-exp syntax for racket would need to come with a tool for converting from s-exps just so that we can run the tool over everything in http://pkgs.racket-lang.org to understand how the notation would look. We’d need that just to design the notation, let alone dealing with the process of actually converting code.


rokitna
2019-8-15 05:15:08

The things I’m talking about have been on my mind a lot as I’ve settled on indentation styles for using Parendown in my code: https://github.com/lathe/parendown-for-racket/blob/master/README.md


rokitna
2019-8-15 05:15:25

that’s the “in the wild” experience I’m drawing from


notjack
2019-8-15 05:17:46

so I’m imagining a process like this then:
1. pick a random open source racket package
2. find all of its source code (or at least a dozen or so files)
3. convert the code to your proposed notation, without making any unrelated changes (ideally this part is automated)
4. compare and contrast


notjack
2019-8-15 05:18:50

oh and step 5: make a repository with both the original and the modified code side by side so the data is viewable by people you want to discuss this with


notjack
2019-8-15 05:19:13

or some other way of making sure all the diffs are viewable


notjack
2019-8-15 05:19:26

Oh, a pull request would be a good way to do that


rokitna
2019-8-15 05:20:25

I’m proposing a more standard indentation style for C-style code as opposed to the style Jay’s proposing. I think most code is already in “my” notation.


rokitna
2019-8-15 05:20:51

but I don’t think Jay needs to justify his preference using such elaborate means either


rokitna
2019-8-15 05:21:08

it’s enough that it works best for him


notjack
2019-8-15 05:21:16

No I mean take like one of my github repos and convert the code in it to use Parendown, then open a pull request so I can actually see how it would look


notjack
2019-8-15 05:22:54

For personal preferences that’s totally unnecessary, but if you’re designing a notation for other people then I think it’s appropriate


notjack
2019-8-15 05:23:27

Not like, as a first step. But definitely at some point along the way.


rokitna
2019-8-15 06:16:18

Hmm… That’s an interesting point. Certain paren-related considerations have been tangible to me partly because I’ve been working on alternatives to the notation of parens themselves in Parendown and Punctaffy. Until other people are also weighing alternatives to the notation of parens, the considerations might not be so tangible to them.

On the other hand, I was discussing paren-matching-by-eye and indentation in pretty much the same terms long before thinking about Parendown or Punctaffy. I would explain the zig-zagging technique to help new Arc programmers get a better handle on how to spot paren-matching errors in their s-expression syntax. (It seems like the corresponding advice for Racket newcomers is mostly “let DrRacket handle it,” which doesn’t actually explain why DrRacket does what it does and not something else. To be fair, “I just let Emacs handle it” was pretty common on Arc Forum too.)

Maybe even closer to the root of where I’m coming from, Arc’s if is like Scheme’s cond with some parens removed, and there was a bit of ongoing disagreement over various ways to indent it. Although indenting the condition and the branch to the same column was helpful for zig-zagging paren-matching, it was unhelpful for distinguishing them from each other. I think this topic came up a number of times, it always seemed to go nowhere, and I started to value whatever abstract principles could help a new language community make a distinct decision about these things.