dmitryhertz
2019-11-1 15:34:17

Every kind of web dev :slightly_smiling_face:


sorawee
2019-11-2 02:12:54

I have an XML document that encodes a newline with &#10;. It seems to me that read-xml completely discards it. Is there an alternative to read-xml that reads these sequences correctly?


samth
2019-11-2 02:52:29

Maybe the sxml libraries?


sorawee
2019-11-2 02:53:27

:disappointed: I’ve already written my code expecting xexpr. sxml would work, but I need to convert it to the format…


samdphillips
2019-11-2 04:03:14

Ok this is slightly gross and maybe overkill, but you can try re-encoding the stream before handing it off to read-xml

https://gist.github.com/samdphillips/48a87ada3587a32654fab1d607155e99

Also it looks like they are in the xexpr as (entity _ _ 10) (or 13) without re-encoding.
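For readers who cannot open the gist, here is a rough sketch of what "re-encoding the stream" could look like. It is not the gist's code: the names decode-newline-refs and read-xml/reencoded are made up for illustration, and it assumes the parser keeps a literal newline or carriage return inside an attribute value, which is worth checking on your Racket version.

```racket
#lang racket/base
(require racket/port xml)

;; Hypothetical helper: rewrite only the decimal references for LF (10)
;; and CR (13) into the literal characters. Other references, such as
;; &#60; for "<", are left alone because decoding them could break the
;; markup.
(define (decode-newline-refs str)
  (regexp-replace* #rx"&#(10|13);" str
                   (lambda (all code)
                     (string (integer->char (string->number code))))))

;; Hypothetical wrapper: slurp the port, rewrite it, then parse.
;; The rewrite is purely textual, so it also touches CDATA sections,
;; where "&#10;" is ordinary text rather than a character reference.
(define (read-xml/reencoded in)
  (read-xml (open-input-string (decode-newline-refs (port->string in)))))
```

Usage would be something like (call-with-input-string "<fake attr=\"a&#10;b\"/>" read-xml/reencoded).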


sorawee
2019-11-2 04:06:31

Oh wow, thanks so much! Can you clarify what you mean when you said “they are in the xexpr without … without re-encoding”? I really don’t see anything.



sorawee
2019-11-2 04:07:25

It should not just skip the entity.


sorawee
2019-11-2 04:07:41

I’m actually drafting a PR to fix this right now


samdphillips
2019-11-2 04:09:12

In the test I did I got entities for those.
> (call-with-input-string "<fake>&#10;</fake>" read-xml)
(document (prolog '() #f '())
 (element (location 1 0 1) (location 1 18 19) 'fake '()
  (list (entity (location 1 6 7) (location 1 11 12) 10)))
 '())


samdphillips
2019-11-2 04:10:08

(entity (location _ _ _) (location _ _ _) 10) on the third line of output.


sorawee
2019-11-2 04:11:15

Interesting. I see that too. Could you try the program in https://github.com/racket/racket/issues/2885 ?


samdphillips
2019-11-2 04:13:00

Oh I bet it’s because they are in an attribute.


samdphillips
2019-11-2 04:15:27

Yeah it is dropped.
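A small way to look at the two cases side by side. The test document here is made up and is not the program from the linked issue. No expected output is shown, because the attribute result depends on the Racket version and on whether the fix discussed here has landed.

```racket
#lang racket/base
(require racket/port xml)

;; Hypothetical test input: the same reference appears once in an
;; attribute value and once in element content.
(define doc
  (call-with-input-string "<fake attr=\"a&#10;b\">a&#10;b</fake>" read-xml))

(define elem (document-element doc))

;; Attribute value: the thread reports the reference is dropped here.
(define attr-val (attribute-value (car (element-attributes elem))))

;; Element content: the thread reports an entity struct for 10 here.
(define content (element-content elem))

(write attr-val)
(newline)
(write content)
(newline)
```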


samdphillips
2019-11-2 04:17:22

The contract on the value field of attribute would need to be changed.


samdphillips
2019-11-2 04:23:19

The re-encoding trick only works outside of CDATA blocks though FYI.


samdphillips
2019-11-2 04:46:39