
idle thought of the day: I think syntax->s-expression might be a less confusing name than syntax->datum

That would be clearer.

except that it doesn’t need to be an S-expression?
(struct s (x))
(s-x (syntax->datum (datum->syntax #f (s 1))))

I didn’t know that structs could be used in syntax objects.

I think 'foo counts as an s-expression to many people, so I’d be fine with that

@kellysmith12.21 Anything can be stuffed into a syntax object. It’s just that the expander only looks deeper inside lists. And if you shove something into a syntax object that isn’t… serializable, I think? then you get “3-d syntax” which can’t be compiled to bytecode and therefore can only exist in the intermediate stages of macro-expanding a module.
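
A minimal sketch of the "anything can be stuffed into a syntax object" point, assuming a plain procedure as the value with no reader notation. The names double, stx, and leaf are made up for illustration, and this only builds the syntax object at run time; it does not try to compile it.

```racket
#lang racket/base

;; A procedure has no reader notation, so it stands in for a
;; "non-serializable" value here.
(define (double x) (* 2 x))

;; datum->syntax recurs into the list but wraps the procedure as an opaque leaf.
(define stx (datum->syntax #f (list 'call double 21)))

;; The second element is a syntax object whose content is the procedure itself.
(define leaf (cadr (syntax->list stx)))
(define same-procedure? (eq? double (syntax-e leaf)))

;; syntax->datum strips the wrappers and hands back the original value.
(define result ((cadr (syntax->datum stx)) 21))
```

Embedding a syntax object like this in a module that has to be written out as bytecode is where the 3-D syntax problem would be expected to show up.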

In Punctaffy, where I’m using other data structures to represent pieces of code, I refer to Racket syntax as s-expr-stx to distinguish it from syntax based on other representations. For example, we could imagine hygienic reader macros that transform string-stx or input-port-stx, which would similarly associate locations, scope sets, and syntax properties with parts of the text.

(In Punctaffy, I’m using representations that are in general more structured than s-expressions are.)

That makes sense. There could probably be a similar thing in a Honu-like system to distinguish the post-read but pre-enforestation token stream syntax objects

Some part of the expander must “look into” vectors, prefabricated structs, hashes, and boxes too; I think these are normalized into immutable versions. The stablest way I’ve found to preserve a reference in 3D syntax is to put it into a lambda that returns it.
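
A rough sketch of that behavior, assuming datum->syntax and syntax-e act as the Racket Reference describes (vectors and boxes are traversed and come back immutable). All the names here are hypothetical.

```racket
#lang racket/base

(define v (vector 1 2 3))            ; a mutable vector
(define stx (datum->syntax #f v))
(define inner (syntax-e stx))        ; the vector as stored in the syntax object

(define source-mutable? (not (immutable? v)))
(define inner-immutable? (immutable? inner))
(define elements-wrapped? (syntax? (vector-ref inner 0)))
(define round-trip (vector->list (syntax->datum stx)))

;; A box is traversed too, so its identity is lost...
(define b (box 'secret))
(define box-copied? (not (eq? b (syntax-e (datum->syntax #f b)))))

;; ...but a procedure is an opaque leaf, so a thunk can carry the box through.
(define thunk (lambda () b))
(define kept (syntax-e (datum->syntax #f thunk)))
(define box-preserved? (eq? b (kept)))
```

The thunk is just one way to hide a reference from the traversal; any value that datum->syntax treats as a leaf would presumably work the same way.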

Anyway, all this is to say most things are s-expressions, and syntax objects feel to me like a variation of s-expressions that often warrants using the term “s-expression” to explain it.

I usually describe them as “like s-expressions but with metadata for tracking things like scope, source locations, etc.” since “s-expression” almost always means code-as-plain-data to people.
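
To make the "s-expression plus metadata" description concrete, here is a small sketch. The source location (example.rkt, line 3, and so on) is invented for illustration.

```racket
#lang racket/base

;; Hypothetical source location: source name, line, column, position, span.
(define loc (list 'example.rkt 3 0 42 7))
(define stx (datum->syntax #f '(+ 1 2) loc))

;; The metadata is available through accessors...
(define line (syntax-line stx))
(define source (syntax-source stx))

;; ...and syntax->datum drops it, leaving the plain s-expression.
(define plain (syntax->datum stx))
```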

I think I like the idea of there being s-expression syntax objects and s-expression datums (things that don’t bother carrying the metadata). So there could be a syntax->datum for Racket’s primary syntax representation (s-expressions), and other representations would have things like input-port-syntax->datum.

But… that doesn’t mean it’s less confusing, I dunno. Datum is a term that is given meaning in relation to Racket syntax, but it probably isn’t one people would ever use that way if they weren’t in the context of Racket

Yeah I think “datum” is terminology that could easily be done without

Come to think of it, the representations I’ve used so far are more like “hyperbracketed (s-expression syntax objects)” than "(hyperbracketed s-expression) syntax objects," so I might not actually need to distinguish “syntax objects” from “s-expression syntax objects” for this purpose. But I do anticipate "(hyperbracketed s-expression) syntax objects" coming up someday.

hmm… how about changing syntax->datum to s-expression-remove-marginalia or something? :smile:

“marginalia” :laughing:

I’ve been thinking, syntax objects built from s-exprs are great for manipulating user-facing syntax, but they’re not as good for a compiler, which would benefit from something similar, but more structured.

Fully expanded program looks very structured to me


Note that serializable here isn’t in the sense of prop:serializable

There are lots of considerations for a compiler IR, but syntax objects aren’t that

Would it be possible to have a library for building/using compiler IRs for use in macros, or are IRs too domain specific to abstract over like that?

what is it in the sense of?

I don’t think there’s anything else that corresponds to “can be serialized in byte code” other than that itself

is there some list of bytecode-serializable types in the docs somewhere?

No I don’t think so

@kellysmith12.21 Yeah, I think there could be use cases for that. Lots of complex macros “fully” expand code to some alternate set of core forms and then process those forms somehow.

It might just be “things traversed by datum->syntax” plus symbols and anything with a read syntax

anything with a read syntax?

oh you mean like

anything with a reader notation for it

Strings, booleans, regexps, etc

not read-syntax

Yeah

hmm

Note that uninterned symbols are a special case which is why I listed symbols separately

:thought_balloon: reader notation for “module path + symbol naming a deserializer combined with bytes for some prop:serializable object”

I don’t think there are any macros that really use an IR in that sense without being a full compiler sort of glued on to the macro system (like the JavaScript package)

I figured it could be useful in cases like match, which is effectively a tiny compiler, or when embedding a DSL or building a #lang.

match does have an IR in some sense, but the syntax-object-ness doesn’t come up much there