s-exp->fasl tells me that it can’t write a set. Does anyone know why? Or how to work around it to make it save a set?
@bkovitz it’s because a set is not “a value that could be quoted as a literal, that is, a value without syntax objects for which (compile `',v) would work and be readable after write.”
So, is there a way to make a set into a value that could be quoted as a literal? IOW, how can you fast-save/load a set?
Turning it into a list would be my first go-to (see set->list)
Hmm. OK. I’ve got some big struct that contains hashes, sets, etc. There’s no way to say “just fasl this thing”?
(and also list->set)
off the top of my head I would say if you want to rely on something like fasl, you’ll need a pass that serializes complicated data structures into simpler data that is uniquely identifiable and can be directly tossed to fasl
and then of course you would need a pass that does the conversion the other way, after calling fasl->s-exp
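A minimal sketch of that two-pass idea for a single set, assuming the set's elements are themselves plain data that fasl accepts. The name archetypes and its contents are made up for illustration.

```racket
#lang racket/base
(require racket/set racket/fasl)

;; Hypothetical example data: a set of symbols.
(define archetypes (set 'a 'b 'c))

;; Pass 1: set -> list, then list -> fasl byte string.
(define encoded (s-exp->fasl (set->list archetypes)))

;; Pass 2: fasl byte string -> list, then list -> set.
(define decoded (list->set (fasl->s-exp encoded)))
```

Note that list->set always builds an immutable equal?-based set, so this only round-trips sets of that kind.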
OK, thanks. I’m in the middle of something else right now, but I’ll dig further into how that could work later.
(prefab structs might be a nice way to tag certain lists as having been sets or other unique data structures)
someone else might have a better/simpler way
Ah, that could be a clue. So far, I marked the struct as prefab, but that’s all.
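One way the prefab-tagging idea might look, as a sketch only: the set-tag struct and the encode/decode helpers are hypothetical names, and the walker below only descends into pairs and sets. A real data structure with hashes, vectors, or struct fields would need more cases.

```racket
#lang racket/base
(require racket/set racket/fasl)

;; Hypothetical tag: a prefab struct meaning "this list used to be a set".
(struct set-tag (elems) #:prefab)

;; Replace every hash set with a tagged list, recursively.
(define (encode v)
  (cond [(set? v) (set-tag (map encode (set->list v)))]
        [(pair? v) (cons (encode (car v)) (encode (cdr v)))]
        [else v]))

;; Rebuild sets from tagged lists, recursively.
;; list->set yields immutable equal?-based sets.
(define (decode v)
  (cond [(set-tag? v) (list->set (map decode (set-tag-elems v)))]
        [(pair? v) (cons (decode (car v)) (decode (cdr v)))]
        [else v]))

;; Hypothetical example data with nested sets.
(define example (list 'slipnet (set 1 2 3) (list 'inner (set 'x 'y))))
(define round-tripped
  (decode (fasl->s-exp (s-exp->fasl (encode example)))))
```

Because prefab struct types are identified by their key and field count, the value that comes back from fasl->s-exp is still recognized by set-tag?.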
@bkovitz The output of serialize from racket/serialize can be used as input to s-exp->fasl: http://docs.racket-lang.org/reference/serialization.html?q=serialize#%28def._%28%28lib._racket%2Fprivate%2Fserialize..rkt%29._serialize%29%29
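A small sketch of that combination, assuming the data contains only values that racket/serialize can handle (sets and hash tables of plain data are among them). The data value here is invented for illustration.

```racket
#lang racket/base
(require racket/set racket/serialize racket/fasl)

;; Hypothetical example data mixing a set and an immutable hash.
(define data (list 'slipnet (set 'a 'b) (hash 'x 1)))

;; serialize turns the value into plain S-expression data,
;; which s-exp->fasl can then encode as a byte string.
(define fasl-bytes (s-exp->fasl (serialize data)))

;; Reverse the two steps in the opposite order.
(define restored (deserialize (fasl->s-exp fasl-bytes)))
```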
@philip.mcgrath Thanks! That worked, for writing. I can write the data to file by calling write, but I can’t read it back by calling read. Do you know which reading function reads data back in from a file so it can be passed to fasl->s-exp?
@bkovitz Is your write output valid? Per the docs, write: “Writes datum to out, normally in such a way that instances of core datatypes can be read back in.”
Here’s what I did:
sounds like you would probably benefit from just using read/write for now
Oops, not quite that. Waiting on DrRacket…
Here’s the write statement:
I’m also not sure about serialize and fasl and write all in one. That seems prone to problems, no?
I have no idea. I figure there’s some easy way to save data to a file and re-load it.
what’s wrong with (write medium-slipnet (open-output-file ...))?
I’ll try that right now…
and a plain read to deserialize
the doco has sections for Reader Extension and Writer Extension if you have custom types that choke
Here’s what I got:
It’s complaining about #<set: (slipnet archetypes)>).
yeah. you might need a writer/reader extension that knows to call set->list and list->set for your slipnet
assuming that’s fairly top-level… it’s pretty easy.
Do serialize and fasl do this already?
Yeah, I can deserialize and de-fasl with no problem (so far). The only hitch is reading the data from a file.
How can you just say “slurp in this entire file”?
I think I just found it…
s-exp->fasl and fasl->s-exp already save and restore to a file! You can just pass a port for the file as an argument.
@bkovitz Right now you are creating a byte string (as in bytes?) and then writing that to a file. You can do this, but it is probably not what you want: you probably want to write the fasl binary format directly to the file.
Hmm, reading it back in is faster than regenerating the data structure from scratch, but still pretty slow.
Yes, I was making an example, but you figured it out! The fasl functions (now) support reading from and writing to either ports or byte strings.
Yep, got it! The fasl functions’ args definitely provide a convenient set-up.
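For reference, a sketch of writing the fasl format straight to a file port and reading it back. The temporary file name and the data value are placeholders, not the actual slipnet.

```racket
#lang racket/base
(require racket/fasl racket/file)

;; Hypothetical temporary file; make-temporary-file creates it empty.
(define path (make-temporary-file "slipnet-~a.fasl"))

;; Hypothetical stand-in for the real data.
(define data '(slipnet (nodes 1 2 3) (edges (1 . 2) (2 . 3))))

;; Write: pass the output port as the second argument to s-exp->fasl.
(call-with-output-file path #:exists 'truncate
  (lambda (out) (s-exp->fasl data out)))

;; Read: fasl->s-exp accepts an input port directly.
(define restored (call-with-input-file path fasl->s-exp))

(delete-file path)
```

If the data contains sets, the same port arguments work on the output of serialize, with deserialize applied after fasl->s-exp.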
How large is the fasl representation? Hard to say if the time is in disk IO or in deserialize etc.
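One rough way to separate the two costs is to read the whole file into a byte string first and time the decoding step on its own. This is only a sketch: timed and load-with-timings are hypothetical helpers, and wall-clock timings like these are noisy (OS file caching in particular can hide the disk cost on repeated runs).

```racket
#lang racket/base
(require racket/fasl racket/file)

;; Hypothetical helper: returns the thunk's result and the elapsed
;; wall-clock time in milliseconds.
(define (timed thunk)
  (define start (current-inexact-milliseconds))
  (define result (thunk))
  (values result (- (current-inexact-milliseconds) start)))

;; Hypothetical helper: returns the decoded value, the time spent reading
;; the file, and the time spent in fasl->s-exp.
(define (load-with-timings path)
  (define-values (raw io-ms) (timed (lambda () (file->bytes path))))
  (define-values (decoded decode-ms) (timed (lambda () (fasl->s-exp raw))))
  (values decoded io-ms decode-ms))
```

If serialize is also in the pipeline, a third timed call around deserialize would show its share.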
The big-slipnet file is 50 megabytes. Probably the next thing I need to look at is how to make this data structure smaller.
That would be the best thing.
If you think it’s IO-bound, you could also see if compression would help.
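A sketch of what compression could look like, using gzip-through-ports from file/gzip and gunzip-through-ports from file/gunzip over in-memory byte ports. The helper names and the sample data are made up, and whether this actually helps depends on whether loading is IO-bound, which has not been established here.

```racket
#lang racket/base
(require racket/fasl file/gzip file/gunzip)

;; Hypothetical helper: fasl-encode a value, then gzip the bytes.
(define (value->gzipped-fasl v)
  (define out (open-output-bytes))
  ;; #f = no original file name; 0 = timestamp stored in the gzip header.
  (gzip-through-ports (open-input-bytes (s-exp->fasl v)) out #f 0)
  (get-output-bytes out))

;; Hypothetical helper: gunzip the bytes, then fasl-decode them.
(define (gzipped-fasl->value bs)
  (define out (open-output-bytes))
  (gunzip-through-ports (open-input-bytes bs) out)
  (fasl->s-exp (get-output-bytes out)))

;; Hypothetical repetitive data, loosely shaped like a node/edge list.
(define sample
  (for/list ([i (in-range 1000)])
    (list 'node i 'edge (+ i 1))))

(define compressed (value->gzipped-fasl sample))
(define restored (gzipped-fasl->value compressed))
```

The same two functions can take file ports instead of byte ports, which avoids holding both the compressed and uncompressed bytes in memory.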
The only reason I’m saving the data to disk is to avoid regenerating it every time I run the program. It’s a graph with 123,000 nodes and 300,000 edges. That’s fairly large, but I’ll bet there’s a way to make it in much less than 30 seconds.
I’ve been profiling the code that makes the graph, and so far nothing stands out as the leading CPU-hog.
Here are a couple of the leaders. Nearly everything else only takes up small percentages of the total CPU time. This makes me think that if I want it to go fast, I’ll need to convert it to Typed Racket.
Is there any easy way to find out what parts of the data structure are taking up the most memory? Maybe just using memory more efficiently would also make it faster to create.
Typed Racket is unlikely to be a major performance win unless you’re doing lots of floating point arithmetic