ben.knoble
2022-6-1 07:41:41

The implementation online uses an idiom which is guaranteed to be at least n^2 when it does (append xs (list x)) repeatedly (see “Shlemiel’s painting algorithm” or whatever it’s called). There are other things mentioned in the posts you linked, but that’s one I know is going to be slow.



sw5355700
2022-6-1 19:15:09

btw everyone at my trailer park loves racket. im just a sort of like a black sheep preferring scheme.


sw5355700
2022-6-1 19:16:36

i live next to an old guy who’s going to show me a way of “turning a continuation inside-out” in racket but, he’s been too busy fighting the governement for his disability check to show me.



sw5355700
2022-6-1 22:49:42

the most significant factor in why this parser is slow is because html attributes are optional and that ambiguity in syntax causes the parser to take many routes while parsing.


sw5355700
2022-6-1 22:50:14

if the syntax lacks ambiguity gll is actually reasonably fast.


sw5355700
2022-6-1 22:51:36

actually these results were with gll 2 (memoized closures). gll 3 (dynamic dispatch) is actually slower when parsing stuff like this.


sw5355700
2022-6-1 22:51:55

Chez Scheme Version 9.5.7.7 Copyright 1984-2021 Cisco Systems, Inc. > (import (html)) > (time (compile '(call-with-input-file "2022-05-23-compiler-books.html" read))) (time (compile (quote (...)))) 194 collections 289.100344542s elapsed cpu time, including 0.368379000s collecting 289.725016000s elapsed real time, including 0.369518000s collecting 1631197648 bytes allocated, including 1298219808 bytes reclaimed ...)


sw5355700
2022-6-1 22:52:03

Chez Scheme Version 9.5.7.7 Copyright 1984-2021 Cisco Systems, Inc. > (import (html)) > (time (call-with-input-file "2022-05-23-compiler-books.html" read)) (time (call-with-input-file "2022-05-23-compiler-books.html" ...)) 194 collections 295.305088793s elapsed cpu time, including 0.358148000s collecting 295.552441000s elapsed real time, including 0.358937000s collecting 1631175584 bytes allocated, including 1298236720 bytes reclaimed ...)


sw5355700
2022-6-1 22:52:14

Welcome to Racket v8.5 [cs]. racket@> (enter! "3/parser.rkt") racket@3/parser> (time (stream->list (call-with-input-file "2022-05-23-compiler-books.html" (lambda (in) (read (read-string 1000 in)))))) cpu time: 787794 real time: 803644 gc time: 2323 (list (success '(html ((lang . "en")) (body () (article () ((h1 ((id . "perspective")) "Perspective") (p () "These books were meant to be used in undergraduate courses and, serve as supporting material for a first course in compilers. So, how well do these books do this? They're both reasonable at it. They present common techniques of compiling languages well. Especially the tiger book which has a second part going over general concepts of less common compilation techniques. However, they both pedagogically lack perspective of computer science as a whole. They both present a compiler in a single way and, don't give students enough common vocabulary to do or study research on compilers."))))) ""))


sw5355700
2022-6-1 22:52:22

Welcome to Racket v8.5 [cs]. racket@> (enter! "3/parser.rkt") racket@3/parser> (time (compile (stream->list (call-with-input-file "2022-05-23-compiler-books.html" (lambda (in) (read (read-string 1000 in))))))) cpu time: 775028 real time: 791194 gc time: 2355 write: cannot marshal value that is embedded in compiled code value: (success '(html ((lang . "en")) (body () (article () ((h1 ((id . "perspective")) "Perspective") (p () "These books were meant to be used in undergraduate courses and, serve as supporting material for a first course in compilers. So, how well do these books do this? They're both reasonable at it. They present common techniques of compiling languages well. Especially the tiger book which has a second part going over general concepts of less common compilation techniques. However, they both pedagogically lack perspective of computer science as a whole. They both present a compiler in a single way and, don't give students enough common vocabulary to do or study research on compilers."))))) "")


sw5355700
2022-6-1 22:53:13

i ran this in racket as well and, it’s over twice as slow to parse the same text compared to chez.


sw5355700
2022-6-2 01:56:54

i just took a look at where the ambiguity was. it was because the parser for the end tag could also be parsed as a start tag.


sw5355700
2022-6-2 01:57:15

(time (compile (quote (...)))) 16 collections 0.978884209s elapsed cpu time, including 0.047950000s collecting 0.979210000s elapsed real time, including 0.047999000s collecting 132104256 bytes allocated, including 38808592 bytes reclaimed