notjack
2021-3-8 21:00:51

idle thought of the day: I think syntax->s-expression might be a less confusing name than syntax->datum


kellysmith12.21
2021-3-8 21:24:51

That would be clearer.


sorawee
2021-3-8 21:31:31

except that it doesn’t need to be an S-expression?

(struct s (x)) (s-x (syntax->datum (datum->syntax #f (s 1))))


kellysmith12.21
2021-3-8 21:50:35

I didn’t know that structs could be used in syntax objects.


notjack
2021-3-8 21:57:49

I think 'foo counts as an s-expression to many people, so I’d be fine with that


notjack
2021-3-8 21:59:13

@kellysmith12.21 Anything can be stuffed into a syntax object. It’s just that the expander only looks deeper inside lists. And if you shove something into a syntax object that isn’t… serializable, I think? then you get “3-d syntax” which can’t be compiled to bytecode and therefore can only exist in the intermediate stages of macro-expanding a module.


rokitna
2021-3-8 22:00:28

In Punctaffy, where I’m using other data structures to represent pieces of code, I refer to Racket syntax as s-expr-stx to distinguish it from syntax based on other representations. For example, we could imagine hygienic reader macros that transform string-stx or input-port-stx, which would similarly associate locations, scope sets, and syntax properties with parts of the text.


rokitna
2021-3-8 22:01:04

(In Punctaffy, I’m using representations that are in general more structured than s-expressions are.)


notjack
2021-3-8 22:05:09

That makes sense. There could probably be a similar thing in a Honu-like system to distinguish the post-read but pre-enforestation token stream syntax objects


rokitna
2021-3-8 22:07:13

Some part of the expander must “look into” vectors, prefab structs, hashes, and boxes too; I think these are normalized into immutable versions. The most stable way I’ve found to preserve a reference in 3D syntax is to put it into a lambda that returns it.
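
A minimal sketch of the point above, relying on the documented behavior of datum->syntax (it traverses vectors and replaces mutable ones with immutable ones, while an opaque struct instance or a procedure is wrapped as-is). The struct name `point` and the variable names are made up for illustration; this is not code from the conversation.

```racket
#lang racket/base

;; Hypothetical opaque (non-prefab) struct, used only for this illustration.
(struct point (x y))

(define p (point 1 2))
(define v (vector 1 2 3)) ; a mutable vector

;; datum->syntax looks inside the vector and wraps each element.
(define vec-stx (datum->syntax #f v))
(syntax? (vector-ref (syntax-e vec-stx) 0)) ; #t
(immutable? (syntax-e vec-stx))             ; #t: normalized to an immutable vector

;; An opaque struct is not traversed, so the very same object comes back.
(eq? p (syntax->datum (datum->syntax #f p))) ; #t

;; The lambda trick: the procedure is stored untouched, and calling it
;; returns the original mutable vector by identity.
(define wrapped (datum->syntax #f (lambda () v)))
(eq? v ((syntax->datum wrapped))) ; #t
```

Syntax objects like `wrapped` are 3-D syntax in the sense discussed earlier, so this only helps while the value stays inside macro expansion.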


rokitna
2021-3-8 22:09:19

Anyway, all this is to say most things are s-expressions, and syntax objects feel to me like a variation of s-expressions that often warrants using the term “s-expression” to explain it.


notjack
2021-3-8 22:11:30

I usually describe them as “like s-expressions but with metadata for tracking things like scope, source locations, etc.” since “s-expression” almost always means code-as-plain-data to people.
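
A small sketch of that description, using only standard racket/base functions. It shows the plain data next to the metadata that the syntax object carries; the variable name `stx` is just for illustration.

```racket
#lang racket/base

;; A syntax object built from a literal template.
(define stx #'(+ 1 2))

;; Stripping the metadata leaves the plain s-expression.
(syntax->datum stx) ; '(+ 1 2)

;; One layer down, the parts are still syntax objects with their own metadata.
(define parts (syntax->list stx))
(identifier? (car parts))   ; #t: the + is an identifier, not a bare symbol
(syntax-e (cadr parts))     ; 1

;; Source-location metadata; the actual values depend on where this file lives.
(syntax-source stx)
(syntax-line stx)
(syntax-column stx)
```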


rokitna
2021-3-8 22:13:08

I think I like the idea of there being s-expression syntax objects and s-expression datums (things that don’t bother carrying the metadata). So there could be a syntax->datum for Racket’s primary syntax representation (s-expressions), and other representations would have things like input-port-syntax->datum.


rokitna
2021-3-8 22:15:35

But… that doesn’t mean it’s less confusing, I dunno. Datum is a term that is given meaning in relation to Racket syntax, but it probably isn’t one people would ever use that way if they weren’t in the context of Racket


notjack
2021-3-8 22:17:13

Yeah I think “datum” is terminology that could easily be done without


rokitna
2021-3-8 22:22:34

Come to think of it, the representations I’ve used so far are more like “hyperbracketed (s-expression syntax objects)” than "(hyperbracketed s-expression) syntax objects," so I might not actually need to distinguish “syntax objects” from “s-expression syntax objects” for this purpose. But I do anticipate "(hyperbracketed s-expression) syntax objects" coming up someday.


rokitna
2021-3-8 22:24:08

hmm… how about changing syntax->datum to s-expression-remove-marginalia or something? :smile:


notjack
2021-3-8 22:26:51

“marginalia” :laughing:


kellysmith12.21
2021-3-9 01:54:47

I’ve been thinking, syntax objects built from s-exprs are great for manipulating user-facing syntax, but they’re not as good for a compiler, which would benefit from something similar, but more structured.


sorawee
2021-3-9 01:55:51

A fully expanded program looks very structured to me



samth
2021-3-9 02:20:21

Note that serializable here isn’t in the sense of prop:serializable


samth
2021-3-9 02:21:45

There are lots of considerations for a compiler IR, but syntax objects aren’t that


kellysmith12.21
2021-3-9 02:24:13

Would it be possible to have a library for building/using compiler IRs for use in macros, or are IRs too domain specific to abstract over like that?


notjack
2021-3-9 02:51:22

what is it in the sense of?


samth
2021-3-9 02:52:25

I don’t think there’s anything else that corresponds to “can be serialized in byte code” other than that itself


notjack
2021-3-9 02:53:15

is there some list of bytecode-serializable types in the docs somewhere?


samth
2021-3-9 02:53:27

No I don’t think so


notjack
2021-3-9 02:54:36

@kellysmith12.21 Yeah, I think there could be use cases for that. Lots of complex macros “fully” expand code to some alternate set of core forms and then process those forms somehow.


samth
2021-3-9 02:54:49

It might just be “things traversed by datum->syntax” plus symbols and anything with a read syntax


notjack
2021-3-9 02:55:07

anything with a read syntax?


notjack
2021-3-9 02:55:25

oh you mean like


notjack
2021-3-9 02:55:31

anything with a reader notation for it


samth
2021-3-9 02:55:32

Strings, booleans, regexps, etc


notjack
2021-3-9 02:55:37

not read-syntax


samth
2021-3-9 02:55:48

Yeah


notjack
2021-3-9 02:56:07

hmm


samth
2021-3-9 02:56:31

Note that uninterned symbols are a special case which is why I listed symbols separately
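
One way to read the “special case” remark, as a hedged sketch: an uninterned symbol has no faithful reader notation, because printing it and reading it back yields an interned symbol with the same name. How compiled code stores uninterned symbols is not shown here; this only illustrates why they fall outside “anything with a read syntax”.

```racket
#lang racket/base

;; An uninterned symbol named "x" is distinct from the interned symbol 'x.
(define u (string->uninterned-symbol "x"))
(eq? u 'x) ; #f

;; Print it, then read the text back in.
(define back (read (open-input-string (format "~s" u))))

(eq? back 'x) ; #t: the reader produced the interned symbol
(eq? back u)  ; #f: the original identity is lost in the round trip
```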


notjack
2021-3-9 02:57:23

:thought_balloon: reader notation for “module path + symbol naming a deserializer combined with bytes for some prop:serializable object”


samth
2021-3-9 02:59:03

I don’t think there are any macros that really use an IR in that sense without being a full compiler sort of glued on to the macro system (like the JavaScript package)


kellysmith12.21
2021-3-9 03:00:14

I figured it could be useful in cases like match, which is effectively a tiny compiler, or when embedding a DSL or building a #lang.


samth
2021-3-9 03:46:58

match does have an IR in some sense, but the syntax-object-ness doesn’t come up much there