
Hello. I have a question: which is fastest for reading data, a hash, a vector, or a structure?

Vectors and structures ought to have faster access times than a hash.

thanks


Here’s the situation: I have a lot of small data items that are created only once. The data itself does not change while the program runs; the program only generates new items. I’m looking for a way to represent them efficiently for fast reads.

How much data, and how long is it used for?

It is a list of tokens plus some data, used for analysis.

Lifetime of the data: the full run of the app.

thousands of items, millions, billions, or more?

a rough guess is fine

thousands

Quick rule of thumb for how to pick which data structure to use, based on how many items you’re working with:
Thousands: it probably doesn’t matter what you pick, don’t worry about it
Millions: it probably matters, but you need benchmarks and to know what access patterns you care about
Billions: don’t even try to keep it in memory in a single huge data structure, it will end in tears
More: this is the least of your worries
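
For a few thousand small immutable records, the three candidates can be compared directly. A minimal sketch, assuming a hypothetical token record with text and kind fields (neither name comes from the conversation); the timing loops give only a rough indication and are not a proper benchmark:

```racket
#lang racket/base

;; Hypothetical record shape: a token's text plus one piece of analysis data.
(struct token (text kind) #:transparent)

(define as-struct (token "lambda" 'keyword))
(define as-vector (vector "lambda" 'keyword))
(define as-hash   (hash 'text "lambda" 'kind 'keyword))

;; The same field read three ways.
(define from-struct (token-text as-struct))
(define from-vector (vector-ref as-vector 0))
(define from-hash   (hash-ref as-hash 'text))

;; Rough timing of one million reads from each representation.
(define n 1000000)
(time (for ([i (in-range n)]) (token-text as-struct)))
(time (for ([i (in-range n)]) (vector-ref as-vector 0)))
(time (for ([i (in-range n)]) (hash-ref as-hash 'text)))
```

At this scale the difference is unlikely to matter, so the struct is probably the most readable choice because its fields are named.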

thanks

happy to help :simple_smile:


A few thoughts:
1. Do all the places do the same thing? If not, maybe you can have some of them not load plot.
2. You could load some libraries lazily, using lazy-require.
3. Running on RacketCS, provided Herbie works there, might have less per-place overhead. (If Herbie doesn’t work on RacketCS, please report a bug!)
4. RacketCS also has better-implemented memory backtraces.
5. You could investigate which of the libraries are contributing to the memory the most, and see if your use of them is necessary or could be slimmed down. For example, maybe you can use plot/no-gui.
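
Suggestion 2 could look roughly like this. It is only a sketch: racket/list stands in for a heavy library such as plot, and first-n is a hypothetical helper. lazy-require binds functions only, and the module is loaded the first time one of them is called:

```racket
#lang racket/base
(require racket/lazy-require)

;; racket/list is a stand-in for an expensive dependency.
;; It is not instantiated until `range` is first called.
(lazy-require [racket/list (range)])

;; Hypothetical helper: the first call triggers the load.
(define (first-n n)
  (range n))
```

This only saves memory in places that never call the lazily required function, so it would not help if every worker ends up plotting.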

I’m trying to make it so when I add a subcommand to my CLI, I automatically get relevant usage help strings. Last time, I did this:
(module+ main
  (void (command-line
         #:program (short-program+command-name)
         #:once-each
         #:args (action . _)
         (parameterize ([current-command-line-arguments (get-subcommand-args action)])
           ((hash-ref action-table action (λ _ show-subcommands)))))))
I didn’t like it because I must manually update help info here when I add a subcommand, and I have to repeat the whole pattern every time I want a nested subcommand. I’m now thinking of trying a Git-style convention that translates command lines into program names, and generating one racket launcher + help given a tree of modules to match the program names.
But before I go do all that, what does raco do to handle this? Is it easy to replicate?

The implementation for raco is in collects/raco/raco.rkt. I think planet had an implementation closer to being general, but I’m not sure; see collects/planet/private/command.rkt. And then there’s collects/pkg/main.rkt, but that’s different and aimed more at the problem of dealing with rough commonality among different subcommands. Maybe there was another effort to generalize (maybe from planet), but I don’t remember that clearly.

Thanks much.

There’s a very minimal package called command-tree. I haven’t used it, though. I’ve rolled my own a few times now, and I agree it would be nice to have a general solution that just worked.

I took a look at that, and it’s not what I’m looking for. I’ll take a crack at this.


For convenience of reference: planet uses SVN conventions and introduces another wrapper that has a command-line-like form. The one example that stood out is pretty involved, but that’s neither here nor there. It supports my preconception that conventions have a huge design impact.

Hi all, I’m trying to do some of Okasaki’s functional data structures to play around with Typed Racket, but I’m having trouble since some of the data structures rely on some form of ad hoc polymorphism. What would be the best way to go about this in Typed Racket?

So far I’m just passing through the relation in the function as a first parameter, but it feels a little clunky.
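
For context, passing the relation as the first parameter might look like the following. This is a sketch only: insert-sorted and int<? are hypothetical names, and an ordered list stands in for one of Okasaki’s structures:

```racket
#lang typed/racket

;; The ordering is an explicit first argument, so the function stays
;; polymorphic in the element type A.
(: insert-sorted (All (A) (-> (-> A A Boolean) A (Listof A) (Listof A))))
(define (insert-sorted less? x xs)
  (cond
    [(null? xs) (list x)]
    [(less? x (car xs)) (cons x xs)]
    [else (cons (car xs) (insert-sorted less? x (cdr xs)))]))

;; A monomorphic comparison to pass in.
(: int<? (-> Integer Integer Boolean))
(define (int<? a b) (< a b))

(define example (insert-sorted int<? 3 (list 1 2 4)))
```

One way to make this less clunky, under the same assumptions, would be to store the relation in the data structure when it is created so that later operations do not need it as an argument.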

Each worker runs an instance of Herbie. That pulls in a lot of stuff.

We do use plot/no-gui. I haven’t tried RacketCS but the lack of single-floats is going to hurt. We have a (vague) plan for eliminating the dependency but it’ll take some work.

Lazy-require and not loading plot won’t really work, sadly. Every worker does the same thing and that thing involves plotting.

This seems weird. First noticed in my local documentation but it also appears that way at https://docs.racket-lang.org/gui/Windowing_Classes.html

The fact that single floats are useful is interesting, when @mflatt changed them for 7.4 our impression was they were mostly unused.

Is each worker doing a mix of heavy compute and plotting? You could split those tasks across different pools of workers and hand off the computed data to the rendering worker pool. Then you could scale them separately and avoid paying the price of loading plot so many times.

Well, Herbie naturally cares about the specifics of floating-point, given that it’s about floating point accuracy :slightly_smiling_face:

@notjack The issue is that the data that we plot involves some non-serializable pieces.

(Not the plotting itself of course, but some of the other stuff that worker does, but separating out just plotting would involve a very large-scale rearchitecture.)

Looks like it is the documentation for the event% class https://docs.racket-lang.org/gui/.html

I’m no Scribble expert, maybe it was changed on Apr 11? @defclass/title was removed. https://github.com/racket/gui/commit/7c30526f3b900ef65ecc2bdea207156e366ed298#diff-68750f1d4439a7f67a2dc77bcd3ad47b

I’ve pushed a repair

@pavpanchekha what sort of non-serializable pieces?

(To clarify I’m not talking about how to replicate typeclasses in typed racket, simply how it would be best to code the solutions)

Tracebacks or whatever Racket calls them, and also a data structure we use called an alt in which the structure of the pointer graph matters

The second might be doable using graph-printing, somehow. We used to have a custom serializer for the first, but it was way too much code to have around for no benefit

Can you say more about the use of single floats? Are you using them because you want to analyze the accuracy of computation that uses them, or because they help in the internals of your code?

Also, why do you want to plot continuation-mark-set values? And could you just use continuation-mark-set->context?


Thanks

For the first, we interpret programs that use single-floats, and use single-floats in the interpreter in the obvious way

For the second, we want to generate nice HTML error traces

This means 1) generating an exception value somehow; 2) passing it to the subroutine that generates the HTML, including plotting; 3) extracting the context and generating the HTML backtrace

In (3) we use continuation-mark-set->context

But the boundary between places would be at (2)

could you move that use of continuation-mark-set->context to (2)?

So we’d have to mangle exception values before passing them over, which is a pain

Yeah, that’s what we used to do

I guess I don’t see why that’s a pain — is there something else about the exception value that is useful?

We can pass it to the default exception printer, for example, if we want a console stack trace instead of an HTML one

that makes sense, although it seems like there ought to be a way to make this work for you

I guess you’re suggesting unpacking the exception value into a type and a message and a traceback, and passing around those pieces, all of which are serializable

or just do the printing and then pass around the string and the stack trace

But we can’t re-pack it, so we’d need to duplicate some Racket code, and all of this is also to create the much more complex architecture where plotting uses one set of workers and everything else uses another

The better decision seems to be to write our own plotting library, which does the plotting in JS

that drops the plot library entirely, and it helps with both memory and possible other goals like interactivity

Plus owning the plotting library seems to make much more sense for Herbie than owning a little exceptions library

(Since our custom plotting library can be specialized to our needs and anyway is very user-facing)

I feel like we ought to be able to come up with a solution that involves neither of those

Well, part of the issue here is that we use places for parallelism

sure, but that should be a good thing

Is it? Serializability is irrelevant in our application (one process)

And in one process in particular I don’t see why we need to load multiple instances of each library

Except of course that ATM dynamic variables (parameters) and so on live inside the library instance

As I understand it

right, all the values of the libraries definition are part of the library instance, and the point of places is that they’re shared-nothing

Yep

Shared-nothing gives good guarantees but bad memory usage

Far as I know, there isn’t a “share-everything” option in Racket

In any case, client-side plotting is planned anyway (for interactivity and controlling styling), so I guess I’m just going to bump the priority of that

In the hopes of controlling memory usage

you can use futures, but those are more limited in terms of parallelism (because sharing a bunch of mutable data is hard)

what’s the best way for me to try herbie and see the memory consumption?

raco pkg install herbie will install the tool, but the benchmarks will be hard to find, so I recommend git clone https://github.com/uwplse/herbie

Then you just do racket src/herbie.rkt report bench/hamming/ /tmp/out/

On my machine that is on the order of 400MB

I tried futures, but I think even allocations break parallelism there, so it didn’t help

allocations will cause occasional synchronization but not break parallelism; again things are better in racketcs

Sorry, yeah, when I said “break parallelism” I mean they cause synchronization. Herbie does a lot of allocation. When I tried futures I failed to find any speed-up at all

yes, that isn’t that surprising; futures can often fail to scale well
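
For reference, the futures API discussed here is small. A sketch with a hypothetical sum-range helper; whether the future actually runs in parallel depends on the Racket build and on what the body does, and as noted above it may show no speed-up at all:

```racket
#lang racket/base
(require racket/future)

;; Hypothetical CPU-bound helper: sum the integers in [lo, hi).
(define (sum-range lo hi)
  (for/fold ([acc 0]) ([i (in-range lo hi)])
    (+ acc i)))

;; Start the lower half in a future, compute the upper half here,
;; then `touch` the future to wait for and collect its result.
(define lower-half (future (λ () (sum-range 0 500000))))
(define upper-half (sum-range 500000 1000000))
(define total (+ (touch lower-half) upper-half))
```

Unlike places, a future shares memory with its creator, so no library is loaded a second time.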

If I want to make a POST request, what library is the easiest one to use? There’re at least simple-http, racket-request, http, net/url, and net/http-client.

I like http-sendrecv from net/http-client


I can’t remember whether I spent much time comparing though.

That’s very helpful. What I’m really looking for, I guess, are some examples, and you give me exactly that.

Oh… so #:data is only for POST and not GET…

Sounds right.

Don’t use the request package, I don’t actively maintain it anymore and it’s got some weirdness.

Does this look right to you, @soegaard2?
(match method
  ['get (http-sendrecv
         api-url
         (~a "/w/api.php?" (alist->form-urlencoded info))
         #:ssl? #t
         #:version "1.1")]
  ['post
   (http-sendrecv
    api-url
    "/w/api.php"
    #:ssl? #t
    #:version "1.1"
    #:method "POST"
    #:data (alist->form-urlencoded info)
    #:headers
    '("Content-Type: application/x-www-form-urlencoded"))])

yes

Slightly disappointed that info can’t be shared in both methods
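
One way to share at least the encoding step between the two branches of the match expression above is to compute the request pieces first and make a single http-sendrecv call. A sketch only: api-request-parts and api-call are hypothetical names, and the path "/w/api.php" is taken from the snippet above:

```racket
#lang racket/base
(require racket/match net/uri-codec net/http-client)

;; Hypothetical helper: encode `info` once, then decide where it goes.
;; Returns four values: URI, HTTP method, request body, extra headers.
(define (api-request-parts method info)
  (define encoded (alist->form-urlencoded info))
  (match method
    ['get  (values (string-append "/w/api.php?" encoded) "GET" #f '())]
    ['post (values "/w/api.php" "POST" encoded
                   '("Content-Type: application/x-www-form-urlencoded"))]))

;; Hypothetical wrapper: one http-sendrecv call covers both methods.
(define (api-call api-url method info)
  (define-values (uri http-method data headers)
    (api-request-parts method info))
  (http-sendrecv api-url uri
                 #:ssl? #t
                 #:version "1.1"
                 #:method http-method
                 #:data data
                 #:headers headers))
```

Like http-sendrecv itself, api-call would return three values: the status line, the response headers, and an input port for the body.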

In command-line, some clauses like #:ps call for string literals and not string expressions.
Here’s some incorrect code. I don’t think I can use begin b/c I am not meaning to use implicit top-level forms. That, and I think I should be operating at a higher phase level. How do I change this such that the syntax object expands to multiple string literals?
#lang racket/base
(require racket/cmdline (for-syntax racket/base))
(define-syntax (place-lines stx)
  (datum->syntax stx (begin "A" "B" "C")))
(command-line #:ps (place-lines))

Here command-line is a macro, so (command-line #:ps (place-lines)) expands from the outside in. That is, the syntax transformer of command-line receives #'(command-line #:ps (place-lines)).
You need to use with-syntax to insert “arguments” to a macro call.

Something like (untested):
(require (for-syntax racket/base syntax/parse))
(define-syntax (my-command-line stx)
  (syntax-parse stx
    [(_my-command-line arg ...)
     ;; bind the pattern variable first, then use it in the template
     (with-syntax ([(line ...) #'("A" "B" "C")])
       (syntax/loc stx
         (command-line arg ... #:ps line ...)))]))
and then use it as: (my-command-line).

Gotcha, thanks. I was honestly a little afraid of hiding command-line due to its complexity. For some reason I figured it was possible to have higher-phase macros expand before lower-phase macros.

Above, my-command-line and command-line have the same phase.

I think the order of the arguments of command-line might be important. So putting line ... at the end might not be what you want.

That and I was hoping to avoid the work of preventing the user from (implicitly) specifying #:ps twice.

> the much more complex architecture where plotting uses one set of workers and everything else uses another
Would that be more complex, or just different? Making task-specific worker pools seems simpler to me than using a single shared pool. I’m imagining a macro like:
(define-task (compute-foo x y z)
  ... regular function body ...)
…which under the hood creates a pool of places, binds compute-foo to a function that sends its inputs over a place channel and receives the output, and generates an implementation with the body of compute-foo that reads inputs from a place channel and sends the output back.

Could also allow require statements inside the body to make the dependencies of the worker places explicit:
(define-task (compute-foo x y z)
  (require x y z)
  ... body ...)

@notjack Maybe so, but the architecture we are currently using (place per task) already exists, so there’s a lot of rewriting involved to change it

Feedback request: Here’s a short demo of a prototype for quickly writing CLIs with subcommands + summary docs.

TL;DW: It builds directories that look like this
delta.rkt
delta_build.rkt
delta_build_once.rkt
delta_build_live.rkt
delta_query.rkt
delta_version.rkt
That you can use like this:
$ delta
$ delta build
$ delta build once ...
$ delta build live ...
$ delta query ...
$ delta version ...

How can I get cookies to work with net/http-client?

Is there something equivalent to requests.Session in Python?


Ughh.. this is kinda inconvenient…

FWIW - Manual session cookies: https://github.com/soegaard/racket-stories/blob/master/app-racket-stories/control.rkt#L57

Also Bogdan’s koyo has sessions: https://github.com/Bogdanp/koyo/blob/master/koyo-doc/scribblings/session.scrbl

Thanks. Really appreciate your help

Part of the reason I wrote racket-stories was to get an actual example of using Racket for web apps. It’s difficult to find public examples.

I wonder why Bogdan’s koyo doesn’t show up on docs.racket-lang.org?

I think these are primarily for server side, though, right?

My work is mostly client side. Just making a bunch of requests to servers

Ah I thought you needed server side sessions.

I think @popa.bogdanp hasn’t put it there

Ah, yeah that makes sense

Does that mean that if there existed a good library that made it easy to set up pools per task type, and if you had used it, that would have prevented the performance problems your architecture has now?

Asking because I’ve thought about making such a library but haven’t prioritized it

Skimmed the video, I think this is an interesting approach.

Cool, it works now.

Last question: if I have an (in-generator) sequence that I want to clean up after it has been completely enumerated, how should I do that? I know there’s dynamic-wind. Is that what it’s for?
EDIT: I think dynamic-wind wouldn’t do it, since it would log me out before I completely finish the API queries

net/cookies is the way I’ve used cookies, but I don’t have an example with raw net/http-client

Implicitly composing things based on directory structure often makes for sharp API edges. What do you think about making the wiring explicit with a module that says “this command group is made of these commands”?

You sort of… can’t. The sequence protocol doesn’t have a place to attach cleanup logic.

IME using dynamic-wind with sequences ends sadly.

OK, so using sequence-append with the cleanup code in the second sequence?

I’m trying to improve the state of things here with transducers. Would you be willing to share more details about your use case?

In the case of in-generator, if you know it will always run to the end you could probably do: (in-generator (acquire-resources) ... (yield a bunch) ... (release-resources))

But if you don’t use the whole sequence it probably won’t release.

It won’t run if something doesn’t consume the whole sequence

What cleanup are you doing?

I’m querying a lot of information from an API. In the beginning, I need to login. At the end, I need to logout.

Since there is a lot of information, it needs to be separated into chunks. Hence the generator.

You could use dynamic-wind but you need to attach it around the whole usage of the sequence.

Would shelling out to curl work for your use case? It handles redirects, cookies, etc.

Like if the sequence escapes the dynamic scope of the dynamic-wind the resources will be released and then error.

yup, that’s not what I want

But if you use it all in the body it will be OK.
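
Wrapping the whole consumption in dynamic-wind could look like this. A sketch only: login! and logout! are stand-ins that just record that they ran, and call-with-session and chunks are hypothetical names:

```racket
#lang racket
(require racket/generator)

;; Stand-ins for the real API calls; they only record that they ran.
(define events '())
(define (login!)  (set! events (cons 'login events)))
(define (logout!) (set! events (cons 'logout events)))

;; Hypothetical wrapper: log in, run the consumer, always log out,
;; even if the consumer raises an exception or stops early.
(define (call-with-session consume)
  (dynamic-wind login! consume logout!))

;; The chunks come from a generator, as in the discussion.
(define (chunks)
  (in-generator (yield 'a) (yield 'b) (yield 'c)))

;; The whole loop runs inside the session; only plain data leaves it.
(define result
  (call-with-session
   (λ () (for/list ([c (chunks)]) c))))
```

The important part is that the generator is created and fully consumed inside the body thunk, and only the resulting list escapes.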

You might be able to use one of the weak memory management things if you don’t need to release immediately.

Would this work? I’m slightly worried about yield-from since it looks like it needs to hold a lot of data in memory
(define (foo) (in-generator ...))
(define (bar)
  (login!)
  (in-generator
    (yield-from (foo))
    (logout!)))

where (yield-from xs) is
(for ([x xs]) (yield x))

@samdphillips you mean the Will & Executor thing?

I used it in my previous project. It doesn’t really work… I probably did something wrong

How well does DrRacket work on Linux? I’ve been developing on a Mac for 11 years, but every single one of my friends who has upgraded to a recent Macbook Pro has had their computer in the shop at least once, and often two or three times. I mostly live in Emacs all day, but I am fond of DrRacket, so I thought I’d check before I do something crazy like try Linux as my dev machine again :slightly_smiling_face:

Wills or ephemerons. But I don’t know what the best practice is for that.

You need to log out? Why not just let it expire?

@badkins works just as well on linux as on mac

I’ve used linux for 20 years

including DrRacket

Awesome - nice to have options!

Maybe the new 16" model with scissor keys won’t have so many reliability issues, but hard to say.

My account kinda has a lot of privileges. So as a safety measure, I want to avoid the risk of it being compromised as much as possible.

I wanna know how far I can go with Racket. Otherwise, I would just use Python’s requests.

On another note, it’s now been over a year since I switched to Racket as my primary language, and I’m more pleased with it now than a year ago when I switched - it’s been the most fun I’ve had in over three decades of programming - so thanks Racket folks!

Here’s another hacky way:
#lang racket
(require racket/generator)
(define (logout-after! xs)
  (sequence-append
   xs
   (in-generator (displayln "logging out..."))))
(define xs
  (logout-after!
   (in-generator
    (yield 1))))
(displayln "logging in")
(for/list ([x xs])
  x)

I’ve never worked with a web API that required logout. In most cases it just invalidates the credentials you’re using.


@sorawee What does the authentication protocol between you and the server look like?


Oh dear. That is a terrible API.

Can you enlighten me how it’s terrible?

@sorawee A few things:
- The GET method is supposed to be safe, i.e. essentially read only. That allows crawlers and caches to work without fear of causing harm. But caching that logout request would cause a lot of problems.
- HTTP already has a standard authentication protocol, there’s no need for them to roll their own POST endpoint and cookie-based system. Especially if your client is a robot and not a browser.
- Query string parameters are often better represented as content in the request body.

For that first item, the HTTP spec even calls out that antipattern specifically:
> For example, it is common for Web-based content editing software to use actions within query parameters, such as “page?do=delete”. If the purpose of such a resource is to perform an unsafe action, then the resource owner MUST disable or disallow that action when it is accessed using a safe request method. Failure to do so will result in unfortunate side effects when automated processes perform a GET on every URI reference for the sake of link maintenance, pre-fetching, building a search index, etc.

(Whoa, you can link to individual paragraphs of the http spec??? https://httpwg.org/http-core/draft-ietf-httpbis-semantics-latest.html#rfc.section.7.2.1.p.6)