raymond.machira
2020-2-6 18:58:12

Hey team, I am writing a DSL and have some questions. I am trying to parse the following log lines with it:

Object{fieldOne=foo, fieldTwo=bar, fieldThree=baz buzz}
Object{fieldOne=some, fieldTwo=value, fieldThree=other value}

The number of fields is variable, but they all have roughly that structure. The output would be JSON, so the above would become:

[{"fieldOne":"foo", "fieldTwo":"bar", "fieldThree":"baz buzz"},
 {"fieldOne":"some", "fieldTwo":"value", "fieldThree":"other value"}]

I am particularly struggling with the tokenizer. Here is what I have:

(define (make-tokenizer port)
  (define (next-token)
    (define toStringer-lexer
      (lexer
       [(for/list [attr (regexp-match* #px"([\\w]+)=([^,]+)" (in-lines port) #:match-select cdr)]
          'ATTR attr)]
       [any-char (next-token)]))
    (toStringer-lexer port))
  next-token)
(provide make-tokenizer)

The idea is that I want to grab the fields into a list of tokens, i.e.:

'('(fieldOne foo), '(fieldTwo bar), '(fieldThree baz buzz))

A couple of questions: 1) What do you think of this approach? Is there a better way than the regex? 2) The lexer above won’t accept the regex.

Thank you for reading and helping my learning.


soegaard2
2020-2-6 19:30:18

Hi @raymond.machira I think you should move your question to #general.

You write: (define toStringer-lexer (lexer ... Where does lexer come from? Is it from parser-tools/lex?


raymond.machira
2020-2-6 19:30:59

It comes from (require brag/support)


soegaard2
2020-2-6 19:36:42

In that case I think the problem is that brag uses the lexer from br-parser-tools/lex, which is a variation of parser-tools/lex. What’s easy to miss is that the regular expressions used by the lexer generator aren’t the normal Racket regular expressions (the ones that begin with #rx and #px).
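
For contrast, here is a minimal sketch of the #px pattern doing its job outside the lexer, applied with regexp-match* to a plain string. The sample line is copied from the question; the names line and pairs are only illustrative, and the value pattern excludes "}" as well as "," (an assumption, so the closing brace is not swallowed into the last value).

```racket
#lang racket/base

;; Sample line copied from the question; `line` and `pairs` are illustrative names.
(define line "Object{fieldOne=foo, fieldTwo=bar, fieldThree=baz buzz}")

;; #:match-select cdr keeps only the two capture groups of each match:
;; the field name and the field value.
(define pairs
  (regexp-match* #px"(\\w+)=([^,}]+)" line #:match-select cdr))

pairs
;; => '(("fieldOne" "foo") ("fieldTwo" "bar") ("fieldThree" "baz buzz"))
```

So the regex itself is fine as an ordinary Racket regular expression; it just cannot be used as a pattern inside a lexer clause.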



soegaard2
2020-2-6 19:38:03

Also, in brag it seems the names used to build regular expressions are prefixed with a colon (:). See https://docs.racket-lang.org/brag/index.html?q=#%28form._%28%28lib._brag%2Fsupport..rkt%29._~3a%2A%29%29
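
A rough sketch of what the tokenizer could look like with the colon-prefixed operators, assuming the brag package is installed and that brag/support exports :seq, :+, :or and :~ alongside lexer, token, alphabetic, numeric and any-char. The pattern itself (word characters, then "=", then anything up to "," or "}") is one guess at the field syntax, not a definitive grammar.

```racket
#lang racket/base
(require brag/support) ; assumption: installed via `raco pkg install brag`

(define (make-tokenizer port)
  (define (next-token)
    (define attr-lexer
      (lexer
       ;; NAME=VALUE, where VALUE runs up to the next "," or "}".
       [(:seq (:+ (:or alphabetic numeric "_"))
              "="
              (:+ (:~ "," "}")))
        (token 'ATTR lexeme)]
       ;; Anything else (braces, commas, spaces) is skipped.
       [any-char (next-token)]))
    (attr-lexer port))
  next-token)

(provide make-tokenizer)
```

Here each ATTR token carries the whole "name=value" text as its lexeme; splitting it into name and value could happen in the action or later in the parser.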


raymond.machira
2020-2-6 19:40:52

Ooh great! Thank you for pointing these out. I will take a look. What do you think of this strategy for fixing it?


soegaard2
2020-2-6 20:20:30

Btw - you can test the lexer alone.


soegaard2
2020-2-6 20:20:42

(define toStringer-lexer
  (lexer
   [(for/list [attr (regexp-match* #px"([\\w]+)=([^,]+)" (in-lines port) #:match-select cdr)]
      'ATTR attr)]
   [any-char (next-token)]))


soegaard2
2020-2-6 20:21:26

Then use it on a string port, like: (toStringer-lexer (open-input-string "some real data here"))
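
To make that concrete, a small sketch of exercising a lexer on its own against a string port. The lexer here is a deliberately tiny stand-in (hypothetical, not the one from the thread) so the example stays self-contained; it assumes the brag package is installed and that brag/support exports :+ and the token-struct accessors.

```racket
#lang racket/base
(require brag/support) ; assumption: the brag package is installed

;; Stand-in lexer: runs of letters become WORD tokens,
;; every other character is skipped.
(define (next-word port)
  (define word-lexer
    (lexer
     [(:+ alphabetic) (token 'WORD lexeme)]
     [any-char (next-word port)]))
  (word-lexer port))

;; Feed it a string port and pull tokens one call at a time.
(define in (open-input-string "some real data here"))
(define first-tok (next-word in))
(define second-tok (next-word in))

(token-struct-val first-tok)
(token-struct-val second-tok)
```

Under those assumptions the two calls should yield WORD tokens for "some" and "real", which is a quick way to check the patterns before wiring the lexer into a parser.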