@alexharsanyi I’m giving your nice data-frame package a try. Thanks for writing it and kindly providing it to everyone.
(*) CSV reading and RFC 4180
I think the procedure df-read/csv does not satisfy RFC 4180. It seems to expect that every CSV record is on a line by itself; that is, it doesn’t allow a quoted newline. I got around that by using the csv-reading package.
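For anyone hitting the same thing, here is a minimal sketch of that workaround. It assumes the third-party csv-reading package is installed and uses its make-csv-reader and csv->list with the default reader settings; the sample data is made up, and loading the rows into a data frame afterwards is left out.

```racket
#lang racket/base
(require csv-reading)  ; third-party package: raco pkg install csv-reading

;; Made-up input: a header row and one record whose second field
;; contains a quoted newline, which RFC 4180 permits.
(define sample "id,note\n1,\"first line\nsecond line\"")

;; csv->list reads every record into a list of lists of strings.
(define rows
  (csv->list (make-csv-reader (open-input-string sample))))

(define header (car rows))
(define records (cdr rows))

;; The quoted field keeps its embedded newline instead of being
;; split into two records.
(define note (cadr (car records)))
```

The point is only that the record boundary is decided by the quoting, not by the line break.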
(*) Looking up in data frames, the need for sorting a series
I think I didn’t understand how to use df-lookup. It demands a sorted series, right? But the library doesn’t provide any sorting mechanism for the data frame. If I sort a series using sort, how will the data frame know that I’ve moved rows out of order? The way I think of a data frame is just like a table; the fact that the data structure stores the data in columns seems to me just a matter of implementation. If I sort a column, I should sort all the others too. I could get the data out of the data frame, sort it properly and then place it back, but that tells me I probably missed something.
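To make the "sort one column, sort them all" idea concrete, here is a rough sketch in plain Racket. It does not use the data-frame API at all: the table is a hypothetical stand-in (a hash from column name to vector), and sort-table-by is a name invented for this example. The idea is to compute the row order once from the key column and then apply that same order to every column.

```racket
#lang racket/base
(require racket/list)  ; for range

;; Hypothetical stand-in for a data frame: column name -> vector of values.
(define table
  (hash "id"   (vector 3 1 2)
        "name" (vector "c" "a" "b")))

;; Returns a new table whose rows are ordered by `key-column`.
;; The row permutation is computed once, then applied to each column,
;; so rows stay aligned across columns.
(define (sort-table-by tbl key-column [less-than? <])
  (define key (hash-ref tbl key-column))
  (define order
    (sort (range (vector-length key)) less-than?
          #:key (lambda (i) (vector-ref key i))))
  (for/hash ([(name column) (in-hash tbl)])
    (values name
            (for/vector #:length (vector-length column) ([i (in-list order)])
              (vector-ref column i)))))

(define sorted (sort-table-by table "id"))
```

After this, the "name" column in sorted follows the same order as "id", which is what a plain sort on a single series would not give you.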
@alexharsanyi has joined the channel
Wow. I clicked the button to “invite” the person, but didn’t know this would force-join the person right away. Sorry about this violent move! Didn’t mean to!
The bot actually says it just invited the person. :slightly_smiling_face:
It is possible that df-read/csv does not fully conform to RFC 4180 or to the many other weird CSV variants out there. So far it has worked fine for the datasets I need, and I add features as I need them.
With regard to df-lookup: the data series are expected to be already sorted, and the user just needs to call df-set-sorted! to tell the data frame object that a series is sorted. There are no facilities for sorting data, because the data sets I had to deal with were already sorted. For looking up rows based on unsorted series, you can use df-select, or df-select* with a #:filter, to select just the rows you are looking for (this will search the series in linear time).
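As a rough illustration of why the sorted flag matters, here is a plain-Racket sketch of the two lookup styles. This is not the data-frame implementation, and sorted-index and filter-indices are hypothetical names: the first is a binary search that only gives correct answers on an ascending-sorted series, and the second is a linear scan that works on any series.

```racket
#lang racket/base

;; Binary search: index of `target` in the ascending-sorted vector `keys`,
;; or #f if it is absent. Only correct when `keys` really is sorted.
(define (sorted-index keys target)
  (let loop ([lo 0] [hi (vector-length keys)])
    (cond
      [(>= lo hi) #f]
      [else
       (define mid (quotient (+ lo hi) 2))
       (define k (vector-ref keys mid))
       (cond
         [(= k target) mid]
         [(< k target) (loop (add1 mid) hi)]
         [else (loop lo mid)])])))

;; Linear scan: indices of all rows whose key satisfies `keep?`.
;; Works on unsorted data, at the cost of visiting every row.
(define (filter-indices keys keep?)
  (for/list ([k (in-vector keys)]
             [i (in-naturals)]
             #:when (keep? k))
    i))

;; Made-up columns: a sorted key series and a value series.
(define timestamps (vector 10 20 30 40))
(define speeds     (vector 1.5 2.0 2.5 3.0))

;; Look up the speed recorded at timestamp 30 via the sorted key series.
(define speed-at-30 (vector-ref speeds (sorted-index timestamps 30)))

;; The same kind of question asked of an unsorted key series.
(define matches (filter-indices (vector 30 10 40 30) (lambda (k) (= k 30))))
```

The binary search trusts the caller that the series is sorted, which is roughly the contract that marking a series as sorted expresses.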