dedbox
2017-12-1 19:51:02

@notjack got some clarity on the Codec API. Fleshing out details now.


dedbox
2017-12-1 19:59:23

So I’ve been wrapping my head around codec composition and inversion.


dedbox
2017-12-1 20:01:12

The framing codec concept is important for messengers, which have to convert messages to byte arrays for transport.


dedbox
2017-12-1 20:01:20

(and from)


dedbox
2017-12-1 20:03:22

I think framing codecs should be invertible.


dedbox
2017-12-1 20:04:03

because you generally want to be able to parse the things you can print, and vice versa


dedbox
2017-12-1 20:05:32

There are a lot of non-framing codecs, though.


dedbox
2017-12-1 20:06:45

Mainly because the API signature for codecs says, “a codec is any function that maps an argument onto a return value.”


dedbox
2017-12-1 20:08:27

But we only care about the ones that will be used with messengers. (or rather, transports, but messengers are the glue that binds them.)


dedbox
2017-12-1 20:10:36

So useful codecs are the ones that we can compose with framing codecs.


dedbox
2017-12-1 20:15:07

The composition of an invertible framing codec and a non-invertible, non-framing codec is a non-invertible framing codec, which would contradict the model.


dedbox
2017-12-1 20:18:16

So it’s clear that, if framing codecs are invertible, all useful codecs are invertible.
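
A minimal sketch of that argument, assuming a hypothetical `Codec` record of an encode/decode pair (not net2's actual API): composing an invertible framing codec with an invertible non-framing codec yields an invertible framing codec, and the round trip only works if both halves invert.

```python
# Hypothetical sketch: a codec is an encode/decode pair, and composition
# chains encodes one way and decodes (the inverses) the other way.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Codec:
    encode: Callable[[Any], Any]
    decode: Callable[[Any], Any]  # assumed to be the inverse of encode

    def compose(self, inner: "Codec") -> "Codec":
        # encode goes high-level -> low-level, so the outer (framing)
        # encode runs last; decode runs the inverses in reverse order.
        return Codec(
            encode=lambda v: self.encode(inner.encode(v)),
            decode=lambda v: inner.decode(self.decode(v)),
        )

# A trivial invertible framing codec: text <-> bytes.
framing = Codec(encode=lambda s: s.encode("utf-8"),
                decode=lambda b: b.decode("utf-8"))

# An invertible non-framing codec: int <-> decimal string.
numbers = Codec(encode=str, decode=int)

wire = framing.compose(numbers)
assert wire.decode(wire.encode(42)) == 42
```

If `numbers.decode` didn't exist, `wire` would still frame correctly on the way out but couldn't parse what it prints, which is the contradiction above.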


dedbox
2017-12-1 20:21:07

I’m still following the logic, but it’s possible that we don’t want or need messengers to compose or invert.


dedbox
2017-12-1 20:21:32

Or maybe that’s a job for something at a higher layer in the API stack.


dedbox
2017-12-1 20:22:35

Because an HTTP client prints requests and parses responses.


dedbox
2017-12-1 20:23:17

Maybe that’s not a great example because HTTP requests and responses are both HTTP messages.


dedbox
2017-12-1 20:29:24

But what about odd protocols like, I don’t know, an XML-to-JSON microservice or a WebSockets-enabled web server?


dedbox
2017-12-1 20:31:23

We could still use multiple invertible codecs and just ignore the inverses.


dedbox
2017-12-1 20:34:29

We could do that with a composite messenger type that can use a different codec for each direction.


notjack
2017-12-1 20:39:01

so I’ve been thinking about this a bunch too and I think we can make it work by expressing codecs as invertible converters between one “high level message” type and a “segment” (lazy stream of bounded length) of low level messages, where a “message” is any value with a size (natural number with abstract meaning, not necessarily number of bytes) where that size is accessible by the codec in constant time
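
One way to picture the "message" half of that idea (all names hypothetical): a message is any value paired with an abstract size the codec can read in constant time, and a "segment" is a bounded run of low-level messages.

```python
# Hypothetical sketch of a "message": a value plus a constant-time size.
# The size is abstract -- not necessarily a byte count.
from dataclasses import dataclass
from typing import Generic, Iterator, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Message(Generic[T]):
    value: T
    size: int  # abstract size, accessible in constant time

def byte_chunks(data: bytes, chunk_size: int) -> Iterator[Message[bytes]]:
    """Turn a bytestring into a bounded 'segment' of low-level messages."""
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        yield Message(chunk, len(chunk))

segment = list(byte_chunks(b"hello world", 4))
assert sum(m.size for m in segment) == len(b"hello world")
```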


notjack
2017-12-1 20:46:21

the most primitive message type would be a bytestring, with codecs that use them as a low-level message type being codecs that emit chunks of bytes


notjack
2017-12-1 20:47:29

the “segment” idea is to allow codecs to do things in a lazily streaming parser-combinator manner but with more guarantees on when you can and can’t commit to a parse


notjack
2017-12-1 20:48:28

I think the Haskell trifecta library does this a bit with its notion of parsers producing and consuming “ropes” of values but honestly I can’t figure out much of anything from its docs (blasted haskellers, thinking types are good enough docs for anybody)


notjack
2017-12-1 20:49:14

so codec might end up looking like this maybe:


notjack
2017-12-1 20:56:22

actually I have no idea how it would look right now


notjack
2017-12-1 20:56:35

hm


notjack
2017-12-1 20:58:02

(also I gotta run to meeting)


dedbox
2017-12-1 20:58:15

K


dedbox
2017-12-1 21:13:39

That trifecta doc is pretty dense :sweat_smile:


dedbox
2017-12-1 21:16:31

But I think I get what’s going on with the sizes


notjack
2017-12-1 21:58:43

another thought: the “high-level vs low-level message” thing is basically what lexing before parsing does


notjack
2017-12-1 21:58:58

so composing codecs would be like nesting multiple kinds of lexing


dedbox
2017-12-1 22:00:41

ok, that’s an angle I can understand


notjack
2017-12-1 22:00:52

e.g. bytes -> simple tokens -> more complex tokens -> parsed values


notjack
2017-12-1 22:03:48

so maybe the codec interface should work in a way where you could stick a parser combinator library inside the parsing logic of a single message, but with the codec having extra logic around the parser that knows how to bound the amount of low level messages to read so the parser is applied to finite input of known size


notjack
2017-12-1 22:12:26

oh! maybe this!

-- start a message read attempt with the following logic:
--   1. take a max number of elements to read and a stream
--   2. read a small number of elements in order to decide on a "segment termination predicate"
--   3. the caller can now skip the elements consumed to produce the predicate
--   4. then the caller consumes stream elements and hands them off to a parser combinator until the termination predicate is true or the caller decides it has read too many elements
startRead :: Stream s => Nat -> s a -> (Nat, Predicate a)
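
A rough Python analog of that startRead sketch (hypothetical names, treating the stream as a sequence of byte chunks): peek at the front of the stream to pick a termination predicate, report how many elements the peek consumed, then let the caller read until the predicate fires or the element budget runs out.

```python
# Hypothetical analog of startRead :: Stream s => Nat -> s a -> (Nat, Predicate a)
from typing import Callable, List, Sequence, Tuple

def start_read(max_elems: int,
               stream: Sequence[bytes]) -> Tuple[int, Callable[[bytes], bool]]:
    # Peek at the first chunk to decide how this segment terminates.
    # (max_elems is the overall read budget; enforced by the caller below.)
    first = stream[0]
    if first.startswith(b"{"):
        # Braced value: stop at a chunk ending in the closing brace.
        return 1, lambda chunk: chunk.endswith(b"}")
    # Line-oriented value: stop at a chunk ending in a newline.
    return 0, lambda chunk: chunk.endswith(b"\n")

def read_segment(max_elems: int, stream: Sequence[bytes]) -> List[bytes]:
    skip, done = start_read(max_elems, stream)
    taken = []
    # Skip the elements consumed to produce the predicate, then consume
    # until the predicate is true or the budget runs out.
    for chunk in stream[skip:max_elems]:
        taken.append(chunk)
        if done(chunk):
            break
    return taken
```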

dedbox
2017-12-1 22:16:03

ok


dedbox
2017-12-1 22:21:08

To decode an HTTP message start line, I need to read from the current input position to the next line terminator.


notjack
2017-12-1 22:21:28

right


dedbox
2017-12-1 22:22:19

So I might have a line decoder that I can instruct to carve itself up with parser combinators and other codecs?


notjack
2017-12-1 22:22:32

hmmm


notjack
2017-12-1 22:22:39

I think it might be more like a tokenization pass


notjack
2017-12-1 22:23:15

bytes -> method-token + whitespace-token + target-url-token + whitespace-token + newline-token


notjack
2017-12-1 22:23:43

because implementations are allowed to attach finite length limits to each of those token types, and the spec mandates minimum supported lengths
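
That tokenization pass might look like this sketch (the length limits are made up for illustration, and an HTTP-version token is included alongside the ones listed above):

```python
# Hypothetical request-line tokenizer with per-token length limits.
import re

# token name -> (pattern, max length); limits here are illustrative only
TOKEN_SPECS = [
    ("method", rb"[A-Z]+", 16),
    ("whitespace", rb" ", 1),
    ("target-url", rb"[^ ]+", 8000),
    ("whitespace", rb" ", 1),
    ("version", rb"HTTP/[0-9.]+", 8),
    ("newline", rb"\r\n", 2),
]

def tokenize_request_line(data: bytes):
    tokens, pos = [], 0
    for name, pattern, limit in TOKEN_SPECS:
        m = re.compile(pattern).match(data, pos)
        if m is None or len(m.group()) > limit:
            raise ValueError(f"bad or oversized {name} token at byte {pos}")
        tokens.append((name, m.group()))
        pos = m.end()
    return tokens

tokens = tokenize_request_line(b"GET /index.html HTTP/1.1\r\n")
assert [name for name, _ in tokens] == [
    "method", "whitespace", "target-url", "whitespace", "version", "newline"]
```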


dedbox
2017-12-1 22:26:13

This makes me want decoders to be able to use other decoders like an input port.


notjack
2017-12-1 22:26:38

I think that makes sense


dedbox
2017-12-1 22:27:31

So we can build messages as a series of progressively more abstract tokens.


notjack
2017-12-1 22:27:40

yes, that’s definitely what I want


notjack
2017-12-1 22:28:08

also


notjack
2017-12-1 22:28:28

I think it might help to think of the underlying bytes input and output as a stream of byte-chunk tokens


notjack
2017-12-1 22:28:52

that would let codecs figure out segmentation / buffering / MTU logic


dedbox
2017-12-1 22:33:00

so what level of detail is appropriate for tokens? A TCP packet or an Ethernet frame? A UTF-8-encoded character? A 2 GB binary blob? All of the above?


notjack
2017-12-1 22:33:49

all of the above - a token would be the “message type” I described earlier


notjack
2017-12-1 22:34:02

it’s just a value with a size of some sort


dedbox
2017-12-1 22:35:17

ok, so “messages” are like tokens, and codecs/messengers are like lexers. Then protocols are like parsers.


dedbox
2017-12-1 22:35:40

in that they produce and consume streams of tokens


notjack
2017-12-1 22:36:35

hmm


notjack
2017-12-1 22:36:39

not sure


notjack
2017-12-1 22:36:49

alright here’s a simple and pretty practical example


notjack
2017-12-1 22:37:05

json-rpc is a protocol for RPC via sending and receiving json values


notjack
2017-12-1 22:37:52

it’s not too low level while having mostly simple parsing/lexing


notjack
2017-12-1 22:42:48

Client sends either requests, notifications, or batches containing a mix of requests and notifications. Server sends a response for each request, not necessarily in a batch (multiple responses for a single batch request), and sends nothing for notifications.


notjack
2017-12-1 22:43:10

so there’s the following types involved:


notjack
2017-12-1 22:49:34
  • JsonRpc = Request | Response | Notification | Batch | Error
  • Json = ... structured representation of json type ...
  • JsonToken = LeftBracket | RightBracket | String | Number | Null | Undefined | Boolean | Whitespace | Newline | Colon

notjack
2017-12-1 22:51:04

you’d start with a Codec JsonToken ByteString - it implements the basic json token serialization and deserialization logic between one JsonToken and possibly many chunks of bytes (as would happen with a very large string value)


notjack
2017-12-1 22:51:43

even though this sounds like lexing, it has parsing elements to it because correctly reading a json number can involve some pretty wild stuff


notjack
2017-12-1 22:52:19

but with that codec you’d have to tell it which kind of token to read, it wouldn’t try to figure it out from the bytes it’s given


notjack
2017-12-1 22:52:30

so it’s not like a parser which just figures out what it’s given


notjack
2017-12-1 22:53:08

(or maybe that would be better expressed with separate codecs for each of the token types, with all codecs having the same Codec JsonToken ByteString type)


notjack
2017-12-1 22:53:17

(I’m shooting from the hip at the moment)


notjack
2017-12-1 22:53:19

anyway


notjack
2017-12-1 22:53:58

then you do the same thing at one level higher: make a bunch of codecs of type Codec Json JsonToken


notjack
2017-12-1 22:54:06

then do again for Codec JsonRpc Json


notjack
2017-12-1 22:55:21

since at each of those levels, sending and receiving one value of the left type involves sending and receiving multiple values of the right type - and you can give a size to the left type value that tells you a bound on how many values of the right type you’ll send or receive
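
A rough sketch of that layering in Python, with the stdlib `json` module standing in for the token machinery (all function names are hypothetical, not net2's API): each layer turns one value of the left type into a bounded run of values of the right type.

```python
# Hypothetical layered codecs: JsonRpc -> Json -> JsonToken -> ByteString.
import json

def jsonrpc_to_json(request):           # plays the role of Codec JsonRpc Json
    # one JSON-RPC request -> exactly one json value
    return {"jsonrpc": "2.0", **request}

def json_to_tokens(value):              # plays the role of Codec Json JsonToken
    # one json value -> many token-like fragments, lazily
    return json.JSONEncoder().iterencode(value)

def tokens_to_bytes(tokens):            # plays the role of Codec JsonToken ByteString
    # one token -> possibly many byte chunks (here: one chunk per token)
    for tok in tokens:
        yield tok.encode("utf-8")

request = {"id": 1, "method": "ping", "params": []}
chunks = list(tokens_to_bytes(json_to_tokens(jsonrpc_to_json(request))))
assert json.loads(b"".join(chunks)) == jsonrpc_to_json(request)
```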


dedbox
2017-12-1 22:56:43

So, one codec “extracts” JsonTokens from ByteStrings without necessarily examining every byte. Another “extracts” Json objects from JsonTokens. Another “extracts” JsonRpc requests from Json objects.


notjack
2017-12-1 22:56:57

Yes


dedbox
2017-12-1 22:57:20

And the protocol spec says which bytes to examine.


notjack
2017-12-1 22:57:29

also yes


notjack
2017-12-1 22:58:00

in that there’s a spec saying how to serialize json values to Unicode, then there’s UTF-8 for turning Unicode to bytes


notjack
2017-12-1 22:58:23

(maybe there should be a UnicodeStr type or something in between the JsonToken and ByteString types)


dedbox
2017-12-1 22:59:10

Then a protocol is a set of codecs along with a set of rules on how and when to use them.


notjack
2017-12-1 22:59:32

I think the parser combinator way to do this would be to use monadic bind to implement the chaining from simple types into complex types


notjack
2017-12-1 23:00:05

but not doing it monadically might mean the codecs can have more control over how to combine a high level codec with a low level one


notjack
2017-12-1 23:01:20

and yes, the JsonRpc protocol (version 2.0) says stuff like “one of you is the client and one is the server. only the client initiates communication. only the server sends Response or Error objects. only the client sends Request or Notification objects. either party may send Batch objects”


notjack
2017-12-1 23:01:48

that sort of logic absolutely shouldn’t go inside codecs or a parser, because it’s so incredibly ad-hoc from protocol to protocol that any common framework for it will get really, really complicated


notjack
2017-12-1 23:02:15

codec encapsulates the pure logic of serialization and deserialization


notjack
2017-12-1 23:02:52

messenger encapsulates the logic of actually using a codec over a transport (that can be very nontrivial and involve things like stream multiplexing)


notjack
2017-12-1 23:03:30

net2 abdicates responsibility for actually sending and receiving the correct messages at the correct times, since that’s protocol-specific and can’t be part of net2’s generic framework


notjack
2017-12-1 23:04:05

but it can provide building blocks for common patterns, like a client that only lets you send messages and wait for response messages - it wouldn’t let you read responses at any random time


dedbox
2017-12-1 23:07:14

I just stumbled onto a blog post about HTTP/2 using words like frame and stream multiplexing. I should probably read the HTTP/2 RFC.


notjack
2017-12-1 23:07:45

there’s a very good overview blog post by one of the spec authors


notjack
2017-12-1 23:07:47

lemme link you


notjack
2017-12-1 23:09:46

hmmm maybe I’m misremembering an amalgamation of articles



notjack
2017-12-1 23:12:08

dedbox
2017-12-1 23:12:20

cool, thanks