
@notjack got some clarity on Codec API. fleshing out details now.

So I’ve been wrapping my head around codec composition and inversion.

The framing codec concept is important for messengers, which have to convert messages to byte arrays for transport.

(and from)

I think framing codecs should be invertible.

because you generally want to be able to parse the things you can print, and vice versa

There are a lot of non-framing codecs, though.

Mainly because the API signature for codecs says, “a codec is any function that maps an argument onto a return value.”

But we only care about the ones that will be used with messengers. (or rather, transports, but messengers are the glue that binds them.)

So useful codecs are the ones that we can compose with framing codecs.

The composition of an invertible framing codec and a non-invertible, non-framing codec is a non-invertible framing codec, which would contradict the model.

So it’s clear that, if framing codecs are invertible, all useful codecs are invertible.
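To pin down that argument, here’s a toy model (names invented for illustration, not the actual Codec API). Composing chains encode one way and decode the other, so a codec with no working decode poisons every composite it appears in:

```haskell
-- toy model of an invertible codec: printing always succeeds,
-- parsing can fail on bad input
data Codec a b = Codec
  { encode :: a -> b
  , decode :: b -> Maybe a
  }

-- composition chains encode one way and decode the other; if either
-- half lacks a working decode, the composite can't decode at all
compose :: Codec a b -> Codec b c -> Codec a c
compose inner outer = Codec
  { encode = encode outer . encode inner
  , decode = \c -> decode outer c >>= decode inner
  }
```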

I’m still following the logic, but it’s possible that we don’t want or need messengers to compose or invert.

Or maybe something at a higher layer in the API stack.

Because an HTTP client prints requests and parses responses.

Maybe that’s not a great example because HTTP requests and responses are both HTTP messages.

But what about odd protocols like, I don’t know, an XML-to-JSON microservice or a WebSockets-enabled web server?

We could still use multiple invertible codecs and just ignore the inverses.

We could do that with a composite messenger type that can use a different codec for each direction.

so I’ve been thinking about this a bunch too, and I think we can make it work by expressing codecs as invertible converters between one “high level message” type and a “segment” (lazy stream of bounded length) of low level messages, where a “message” is any value with a size (a natural number with abstract meaning, not necessarily a number of bytes) that the codec can access in constant time

the most primitive message type would be a bytestring, with codecs that use them as a low-level message type being codecs that emit chunks of bytes

the “segment” idea is to allow codecs to do things in a lazily streaming parser-combinator manner but with more guarantees on when you can and can’t commit to a parse
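in rough Haskell terms the vocabulary might shake out like this (all placeholder names on my part, nothing settled):

```haskell
import Numeric.Natural (Natural)

-- a "message" is any value whose size is accessible in constant
-- time; the size is abstract, not necessarily a byte count
class Message a where
  size :: a -> Natural

-- a "segment" is a lazy stream of low-level messages with a known
-- bound on how many elements it can produce
data Segment a = Segment
  { bound    :: Natural
  , elements :: [a]  -- lazily produced
  }
```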

I think the haskell trifecta library does this a bit with its notion of parsers producing and consuming “ropes” of values, but honestly I can’t figure out much of anything from its docs (blasted haskellers, thinking types are good enough docs for anybody)

so codec might end up looking like this maybe:

actually I have no idea how it would look right now

hm

(also I gotta run to meeting)

K

That trifecta doc is pretty dense :sweat_smile:

But I think I get what’s going on with the sizes

another thought: the “high-level vs low-level message” thing is basically what lexing before parsing does

so composing codecs would be like nesting multiple kinds of lexing

ok, that’s an angle I can understand

e.g. bytes -> simple tokens -> more complex tokens -> parsed values

so maybe the codec interface should work in a way where you could stick a parser combinator library inside the parsing logic of a single message, but with the codec having extra logic around the parser that knows how to bound the number of low level messages to read, so the parser is applied to finite input of known size

oh! maybe this!
-- start a message read attempt with the following logic:
-- 1. take a max number of elements to read and a stream
-- 2. read a small number of elements in order to decide on a "segment termination predicate"
-- 3. the caller can now skip the elements consumed to produce the predicate
-- 4. the caller then consumes stream elements and hands them to a parser combinator until the termination predicate is true or the caller decides too many elements have been read
-- returns (number of elements consumed so far, termination predicate)
startRead :: Stream s => Nat -> s a -> (Nat, Predicate a)

ok

To decode an HTTP message start line, I need to read from the current input position to the next line terminator.

right
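so for that case it could return with zero lookahead, something like this (plain lists and Int sizes just to keep the sketch self-contained, and Predicate is made up):

```haskell
import Data.Word (Word8)

type Predicate a = a -> Bool

-- a line decoder needs no lookahead to pick its terminator: it
-- consumes zero elements up front and terminates on the newline byte
startLineRead :: Int -> [Word8] -> (Int, Predicate Word8)
startLineRead _maxElements _stream = (0, (== 10))  -- 10 is '\n'
```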

So I might have a line decoder that I can instruct to carve itself up with parser combinators and other codecs?

hmmm

I think it might be more like a tokenization pass

bytes -> method-token + whitespace-token + target-url-token + whitespace-token + newline-token

because implementations are allowed to attach finite length limits to each of those token types, and the spec mandates minimum lengths they must support
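as a type, that pass might look something like this (constructor names invented on the spot):

```haskell
-- hypothetical token type for the HTTP/1.1 request start line pass;
-- each text-carrying constructor is where an implementation could
-- enforce its finite length limit
data StartLineToken
  = MethodToken String     -- e.g. "GET"
  | TargetUrlToken String  -- e.g. "/index.html"
  | WhitespaceToken
  | NewlineToken
```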

This makes me want decoders to be able to use other decoders like an input port.

I think that makes sense

So we can build messages as a series of progressively more abstract tokens.

yes, that’s definitely what I want

also

I think it might help to think of the underlying bytes input and output as a stream of byte-chunk tokens

that would let codecs figure out segmentation / buffering / MTU logic

so what level of detail is appropriate for tokens? A TCP packet or an Ethernet frame? A UTF-8-encoded character? A 2GB binary blob? All of the above?

all of the above - a token would be the “message type” I described earlier

it’s just a value with a size of some sort

ok, so “messages” are like tokens, and codecs/messengers are like lexers. Then protocols are like parsers.

in that they produce and consume streams of tokens

hmm

not sure

alright here’s a simple and pretty practical example

json-rpc is a protocol for RPC via sending and receiving json values

it’s not too low level while having mostly simple parsing/lexing

Client sends either requests, notifications, or batches containing a mix of requests and notifications. Server sends a response for each request, not necessarily in a batch (a single batch request can yield multiple responses), and sends nothing for notifications.

so there are the following types involved:

JsonRpc = Request | Response | Notification | Batch | Error
Json = ... structured representation of json type ...
JsonToken = LeftBracket | RightBracket | String | Number | Null | Undefined | Boolean | Whitespace | Newline | Colon

you’d start with a Codec JsonToken ByteString - it implements the basic json token serialization and deserialization logic between one JsonToken and possibly many chunks of bytes (as would happen with a very large string value)

even though this sounds like lexing, it has parsing elements to it because correctly reading a json number can involve some pretty wild stuff

but with that codec you’d have to tell it which kind of token to read, it wouldn’t try to figure it out from the bytes it’s given

so it’s not like a parser which just figures out what it’s given

(or maybe that would be better expressed with separate codecs for each of the token types, with all codecs having the same Codec JsonToken ByteString type)

(I’m shooting from the hip at the moment)

anyway

then you do the same thing at one level higher: make a bunch of codecs of type Codec Json JsonToken

then do again for Codec JsonRpc Json

since at each of those levels, sending and receiving one value of the left type involves sending and receiving multiple values of the right type - and you can give the left type’s values a size that bounds how many values of the right type you’ll send or receive
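reusing the toy Codec and compose from earlier (and glossing over the segment part - really each layer maps one left-type value to a bounded segment of right-type values, not a single one, and this assumes the JsonRpc / Json / JsonToken types above are real data declarations), the stack would be something like:

```haskell
import Data.ByteString (ByteString)

-- bodies elided; each layer's serialization logic is the actual work
tokenLayer :: Codec JsonToken ByteString  -- one token <-> chunks of bytes
tokenLayer = undefined  -- placeholder

jsonLayer :: Codec Json JsonToken         -- one value <-> many tokens
jsonLayer = undefined  -- placeholder

rpcLayer :: Codec JsonRpc Json            -- one rpc message <-> json values
rpcLayer = undefined  -- placeholder

-- stacking the layers gives the end-to-end codec
rpcOverBytes :: Codec JsonRpc ByteString
rpcOverBytes = rpcLayer `compose` jsonLayer `compose` tokenLayer
```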

So, one codec “extracts” JsonTokens from ByteStrings without necessarily examining every byte. Another “extracts” Json objects from JsonTokens. Another “extracts” JsonRpc requests from Json objects.

Yes

And the protocol spec says which bytes to examine.

also yes

in that there’s a spec saying how to serialize json values to Unicode, then there’s UTF-8 for turning Unicode to bytes

(maybe there should be a UnicodeStr type or something in between the JsonToken and ByteString types)

Then a protocol is a set of codecs along with a set of rules on how and when to use them.

I think the parser combinator way to do this would be to use monadic bind to implement the chaining from simple types into complex types

but not doing it monadically might mean the codecs can have more control over how to combine a high level codec with a low level one
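for reference, the monadic flavor would be roughly this (a toy decoder over a token list, not a proposal) - the catch being that once the chaining hides inside bind, the outer codec can’t inspect it to bound how much low-level input the inner parser will consume:

```haskell
-- toy stream-consuming decoder; bind lets the next decoding step
-- depend on the value just decoded
newtype Decoder tok a = Decoder { runDecoder :: [tok] -> Maybe (a, [tok]) }

instance Functor (Decoder tok) where
  fmap f (Decoder d) = Decoder $ \ts -> do
    (a, rest) <- d ts
    pure (f a, rest)

instance Applicative (Decoder tok) where
  pure a = Decoder $ \ts -> Just (a, ts)
  Decoder df <*> Decoder da = Decoder $ \ts -> do
    (f, ts')  <- df ts
    (a, ts'') <- da ts'
    pure (f a, ts'')

instance Monad (Decoder tok) where
  Decoder d >>= f = Decoder $ \ts -> do
    (a, ts') <- d ts
    runDecoder (f a) ts'

-- the primitive: consume one token
token :: Decoder tok tok
token = Decoder $ \ts -> case ts of
  (t : rest) -> Just (t, rest)
  []         -> Nothing
```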

and yes, the JsonRpc protocol (version 2.0) says stuff like “one of you is the client and one is the server. only the client initiates communication. only the server sends Response or Error objects. only the client sends Request or Notification objects. either party may send Batch objects”

that sort of logic absolutely shouldn’t go inside codecs or a parser, because it’s so incredibly ad-hoc from protocol to protocol that any common framework for it will get really, really complicated

codec encapsulates the pure logic of serialization and deserialization

messenger encapsulates the logic of actually using a codec over a transport (that can be very nontrivial and involve things like stream multiplexing)

net2 abdicates responsibility for actually sending and receiving the correct messages at the correct times, since that’s protocol specific and can’t be part of net2’s generic framework

but it can provide building blocks for common patterns, like a client that only lets you send messages and wait for response messages - it wouldn’t let you read responses at any random time

I just stumbled onto a blog post about HTTP/2 that uses words like frame and stream multiplexing. I should probably read the HTTP/2 RFC.

there’s a very good overview blog post by one of the spec authors

lemme link you

hmmm maybe I’m misremembering an amalgamation of articles

here’s some of them:
- https://www.mnot.net/blog/2016/04/22/ideal-http
- https://www.mnot.net/blog/2014/01/30/http2_expectations

also this one on header compression: https://www.mnot.net/blog/2013/01/04/http2_header_compression

cool, thanks