
@notjack

> for byte-level serialization protocols not parsing user-written source code

Good catch. I’m not yet actively considering that difference.

Maybe I don’t understand the codecs paragraph in the roadmap, then.

Can you give an example of structured data, in that context?

Wait, you’re talking about Ethernet frames and IP packets as structured data.

@dedbox yes - the main difference is that usually these things specify lengths up front and give you other sorts of structure to make parsing not require unbounded time/space

and make it easy to throw away unneeded stuff quickly (very little backtracking in the grammars)

Hrm. So I’m pretty sure the OS TCP stack does packet framing for you, doesn’t it?

Are there situations where writing a precise number of bytes is faster?

TCP does, yes. But HTTP does framing too. 1.0/1.1 sorta do “line framing” with headers, Content-Length frames the body, and chunked encodings do more typical here’s-a-stream-of-chunks-with-separators framing. HTTP/2 does a lot more stuff like that in order to do things like header compression and stream multiplexing. And a lot of other protocols on top of TCP add their own sorts of framing stuff.
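To make the "stream of chunks with separators" idea concrete, here is a rough sketch of HTTP/1.1 chunked transfer coding in Racket. The names `chunk->bytes` and `chunked-body` are made up for illustration, and chunk extensions and trailers are left out.

```racket
#lang racket/base

;; chunk->bytes : bytes? -> bytes?
;; One chunk on the wire: size in hex, CRLF, the data, CRLF.
(define (chunk->bytes data)
  (bytes-append
   (string->bytes/utf-8 (number->string (bytes-length data) 16))
   #"\r\n" data #"\r\n"))

;; chunked-body : (listof bytes?) -> bytes?
;; Empty chunks are skipped because a zero-size chunk marks the end of the body.
(define (chunked-body chunks)
  (bytes-append
   (apply bytes-append
          (for/list ([c (in-list chunks)]
                     #:unless (zero? (bytes-length c)))
            (chunk->bytes c)))
   #"0\r\n\r\n"))

(chunked-body (list #"Wiki" #"pedia"))
;; => #"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
```

Each chunk says its own size up front, so the receiver never has to scan the data for a terminator.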

Can we define “frame” or “framing” precisely? Is a frame just a unit of meaning in a byte sequence, or something more specific?

ah

by “framing” I mean doing this: “I want to send these bytes, but to do that I need to also send some info about how to send them, so I’ll wrap blocks of bytes with a header of some sort and the header will contain the extra information”

Ethernet frames wrap data with Ethernet-specific control info (I think a checksum and source/destination MAC addresses), IP packets wrap data with IP-specific control info like routing and addressing stuff, TCP wraps data with TCP stuff like session and segment info, HTTP wraps messages with headers and chunking / encoding info, SOAP wraps messages in a SOAP envelope, etc

the most common use I’ve noticed is trying to say how much stuff you’re sending so the other side knows how to tell different messages apart and when to stop listening

Ok. So, would we call a length-prefixed string a frame?

I think so, yes
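A minimal sketch of a length-prefixed frame, assuming a 4-byte big-endian unsigned length prefix. The prefix width and the names `length-prefixed-frame` / `length-prefixed-unframe` are assumptions for illustration, not anything from the roadmap.

```racket
#lang racket/base

;; length-prefixed-frame : bytes? -> bytes?
;; Prepend the payload length as a 4-byte big-endian unsigned integer.
(define (length-prefixed-frame payload)
  (bytes-append (integer->integer-bytes (bytes-length payload) 4 #f #t)
                payload))

;; length-prefixed-unframe : bytes? -> (values bytes? bytes?)
;; Returns the payload and whatever bytes follow the frame.
(define (length-prefixed-unframe bs)
  (unless (>= (bytes-length bs) 4)
    (error 'length-prefixed-unframe "missing length prefix"))
  (define n (integer-bytes->integer bs #f #t 0 4))
  (define end (+ 4 n))
  (unless (>= (bytes-length bs) end)
    (error 'length-prefixed-unframe "truncated frame"))
  (values (subbytes bs 4 end) (subbytes bs end)))

(length-prefixed-frame #"hello")
;; => #"\0\0\0\5hello"
```

The reader knows exactly how many bytes to take after reading the prefix, which is the "when to stop listening" part.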

And, one more. Would we say the headers of an HTTP message are framed by the end of the start line and an empty line?

That one I’m less sure about, but I think so. The HTTP spec does specify that it’s reasonable for implementations to place limits on the length of a single header line so it’s sort of a size limit. There’s no length prefix in that case though, only an upper bound.

Ok thanks. I’m on the road now and will follow up after lunch time.

If we say a frame is a byte array of (efficiently) computable length, and framing is the act of assembling bytes into frames, then we can make strong performance guarantees on framing codecs.

Then it’s a little easier to think about subclasses of codecs, like `length-prefixed-frame`, `bounded-length-frame`, `bytes-delimited-frame`, etc.

Then an HTTP header line codec can compose `(bytes-delimited-frame #"\r\n")` with `(bounded-length-frame MAX-HEADER-LENGTH)`

and I’d be able to derive performance characteristics for the whole thing based on the given framers.
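Roughly what that composition could look like, treating a framer as a plain function from bytes to bytes. The bodies below are guesses at the intended behavior, `compose-framers` is a hypothetical helper, and the value of `MAX-HEADER-LENGTH` is an arbitrary placeholder.

```racket
#lang racket/base

(define MAX-HEADER-LENGTH 8192) ; hypothetical limit, not taken from any spec

;; bounded-length-frame : natural -> (bytes? -> bytes?)
;; Passes the payload through unchanged, rejecting anything over the bound.
(define (bounded-length-frame max-len)
  (lambda (payload)
    (unless (<= (bytes-length payload) max-len)
      (error 'bounded-length-frame "payload exceeds ~a bytes" max-len))
    payload))

;; bytes-delimited-frame : bytes? -> (bytes? -> bytes?)
;; Appends the delimiter, rejecting payloads that already contain it.
(define (bytes-delimited-frame delim)
  (define delim-rx (byte-regexp (regexp-quote delim)))
  (lambda (payload)
    (when (regexp-match? delim-rx payload)
      (error 'bytes-delimited-frame "payload contains the delimiter"))
    (bytes-append payload delim)))

;; compose-framers : apply inner first, then outer.
(define (compose-framers outer inner)
  (lambda (payload) (outer (inner payload))))

(define http-header-line-frame
  (compose-framers (bytes-delimited-frame #"\r\n")
                   (bounded-length-frame MAX-HEADER-LENGTH)))

(http-header-line-frame #"Host: example.com")
;; => #"Host: example.com\r\n"
```

Both framers do work linear in the payload length, so the composed framer is linear too, which is the kind of derived guarantee in question.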

So then we can also say the header is a frame. We could frame it by mapping `pair->http-header-line-frame` across a headers alist and joining the results. Call that `http-header-frame`.
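A sketch of that mapping, assuming header names and values are already byte strings. The `": "` separator and the trailing empty line that closes the header block are assumptions based on the usual HTTP/1.1 layout, and the length bound is omitted to keep it short.

```racket
#lang racket/base

;; pair->http-header-line-frame : (cons/c bytes? bytes?) -> bytes?
;; Frames one header as "Name: value" terminated by CRLF.
(define (pair->http-header-line-frame pair)
  (bytes-append (car pair) #": " (cdr pair) #"\r\n"))

;; http-header-frame : (listof (cons/c bytes? bytes?)) -> bytes?
;; Frames every header line, joins them, and ends with an empty line.
(define (http-header-frame headers)
  (bytes-append
   (apply bytes-append (map pair->http-header-line-frame headers))
   #"\r\n"))

(http-header-frame '((#"Host" . #"example.com") (#"Accept" . #"*/*")))
;; => #"Host: example.com\r\nAccept: */*\r\n\r\n"
```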

Maybe I’m conflating framing and codecs

In that case, we’d need a less general definition of framing.

I don’t know, it feels kinda right. It seems to work in the opposite direction, too.

“Unframing” a length-prefixed frame is easy.

A bounded-length frame is a little trickier. It would need a sub-unframer to work for frames of length less than the bound.

Bytes-delimited frames are also easy.
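Here is one way the delimited and bounded unframers might look, with the bounded one wrapping a sub-unframer as described. All names are hypothetical. An unframer here returns two values, the payload (or `#f` if no complete frame is available yet) and the remaining bytes.

```racket
#lang racket/base

;; bytes-delimited-unframe : bytes? -> (bytes? -> (values (or/c bytes? #f) bytes?))
;; Splits at the first delimiter; yields #f and the input if none is found.
(define (bytes-delimited-unframe delim)
  (define delim-rx (byte-regexp (regexp-quote delim)))
  (lambda (bs)
    (define m (regexp-match-positions delim-rx bs))
    (if m
        (values (subbytes bs 0 (caar m)) (subbytes bs (cdar m)))
        (values #f bs))))

;; bounded-length-unframe : wraps a sub-unframer and rejects long payloads.
(define (bounded-length-unframe max-len sub-unframe)
  (lambda (bs)
    (define-values (payload rest) (sub-unframe bs))
    (when (and payload (> (bytes-length payload) max-len))
      (error 'bounded-length-unframe "frame exceeds ~a bytes" max-len))
    (values payload rest)))

(define unframe-line
  (bounded-length-unframe 8192 (bytes-delimited-unframe #"\r\n")))

(call-with-values
 (lambda () (unframe-line #"Host: example.com\r\nAccept: */*\r\n"))
 list)
;; => '(#"Host: example.com" #"Accept: */*\r\n")
```

This version checks the bound only after the delimiter is found. A stricter unframer would stop scanning once it has read `max-len` bytes without seeing the delimiter, which is what actually bounds the work.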

So given composable and invertible codecs, we might be able to express arbitrarily complex codecs in a declarative style, which means it will be easy to ascribe and check types.

I could also reason about codec composition as if it were function composition.
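As a sketch of what "codec composition as function composition" might mean, a codec can be modeled as an encode/decode pair, with composition chaining the encoders in one order and the decoders in the reverse order. The `codec` struct, `compose-codec`, and the example codecs are all invented here for illustration.

```racket
#lang racket/base

;; A codec pairs an encoder with its inverse decoder.
(struct codec (encode decode))

;; compose-codec : encode with inner then outer; decode with outer then inner.
(define (compose-codec outer inner)
  (codec (lambda (v) ((codec-encode outer) ((codec-encode inner) v)))
         (lambda (bs) ((codec-decode inner) ((codec-decode outer) bs)))))

;; string <-> UTF-8 bytes
(define utf8-codec (codec string->bytes/utf-8 bytes->string/utf-8))

;; bytes <-> frame with a 4-byte big-endian length prefix (assumed width).
;; The decoder ignores any bytes after the frame.
(define length-prefixed-codec
  (codec (lambda (bs)
           (bytes-append (integer->integer-bytes (bytes-length bs) 4 #f #t) bs))
         (lambda (bs)
           (subbytes bs 4 (+ 4 (integer-bytes->integer bs #f #t 0 4))))))

(define string-frame-codec (compose-codec length-prefixed-codec utf8-codec))

((codec-encode string-frame-codec) "hi")            ; => #"\0\0\0\2hi"
((codec-decode string-frame-codec) #"\0\0\0\2hi")   ; => "hi"
```

Because each piece carries its own inverse, the composed decoder falls out of the composition for free, and decode after encode returns the original value.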