
OK - let's do the following. I just wanted to discuss a few details on how to get all the components of the future architecture speaking: Actions - DB - Dashboard. I know @notjack has a lot of experience with this so let's go ahead with the meeting as is. I will schedule another meeting and make sure the time suits all of us. In any case, I will take notes and write down what we discussed.

There are two services I would like to integrate with the CI we are doing - LGTM and OSS-fuzz. LGTM is a simple push of a button while for OSS-fuzz we need to send a request to Google to be accepted.

Any issues with me going ahead and doing both of these?

on lgtm, I submitted it a few weeks ago so we already have some results: https://lgtm.com/projects/g/racket/racket/

but having this in CI would be great.


^^^ in very early stages.

@pocmatos I’m in favor of all of those things

@samth from lgtm: A request to install <http://LGTM.com> has been submitted on the @racket account.

I assume someone is going to get an email they need to approve.

@pocmatos I have not gotten an email yet

ah, they didn’t email but i found it on github

Thanks - first github workflow is ongoing : https://github.com/racket/racket/commit/6e63f6a99fe46317f5a9538d85a4f8ad1d87e6e1/checks?check_suite_id=320543270

Yay!

Another next step is to add something to notify #notifications here wrt the GH Actions

Another question is if we can cache the results of building Chez Scheme so that we don’t re-do that work when things don’t change.

Caching will be tricky


also, can we check out with --depth=1?


We cannot check out with --depth=1. I have tried that in gitlab but was surprised when it failed. :slightly_smiling_face:

We already do that in Travis

Ah - yes, in Travis it works because everything is a single job.

I’m not sure why that’s related

When you do it in parallel, you push X, checkout X and build X. Someone in the meantime pushes Y.

It checks out Y and builds Y.

When you start to run tests in X, X does not exist if you do --depth=1.

and the build fails.

Because you do a checkout per job.

Ah

You could do --depth=5.

so we’d need to do a “checkout to this commit”

And hope 5 is large enough. :slightly_smiling_face:

5 isn’t always enough because you could push arbitrarily many commits

right … we don’t know the upper bound that’s safe but it’s not very high for racket. It also depends on how long CI runs and how fast we push commits. :slightly_smiling_face:
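A rough way to picture the depth question discussed above, as a minimal sketch only. It assumes a linear history, where a shallow clone with --depth=N holds the branch tip plus its N-1 most recent ancestors. The function names and the push rate used below are made up for illustration.

```python
import math


def commit_available(depth: int, commits_since: int) -> bool:
    """True if a commit is still inside a shallow clone of the branch tip.

    Assumes a linear history: --depth=N keeps the tip plus its N-1 most
    recent ancestors, so the commit must be fewer than N commits behind.
    """
    return commits_since < depth


def min_safe_depth(ci_hours: float, max_pushes_per_hour: float) -> int:
    """Smallest depth that survives an assumed worst-case push rate."""
    worst_case_commits = math.ceil(ci_hours * max_pushes_per_hour)
    return worst_case_commits + 1


# One commit (Y) landed on top of X before the test job cloned.
print(commit_available(depth=1, commits_since=1))
print(commit_available(depth=5, commits_since=1))
# Hypothetical numbers: a 1 hour CI run and at most 4 pushes per hour.
print(min_safe_depth(ci_hours=1, max_pushes_per_hour=4))
```

Since the push rate has no hard upper bound, any fixed depth is a bet rather than a guarantee, which matches the point made in the discussion.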

I was intrigued by the idea of caching Chez. What did you have in mind?

what’s the default depth? pull everything since the beginning of time?

@notjack right.

yes

I mean cache everything built in ChezScheme until the commit changes

so since racket/racket has 40k+ commits, even --depth=100 would be two orders of magnitude better

yes

also the most recent N commits have much less data

since the repositories split

I mean - if the CI takes 1 hour, how many commits are there per hour in Racket? OK, sometimes Matthew pushes 3 or 4 one after the other but I would say even --depth=10 would be safe.

@samth i like the caching idea. Not sure exactly how to do it yet but it could speed up the build slightly.

it probably would be. would it be hard to debug problems caused by choosing too low a depth?

no … checkout would say commit doesn’t exist.

Actually first time I saw it, I was left scratching my head…

let’s go with 100 and we’ll see if it ever fails

Sure.

So, the caching

racket uses a fork of chezscheme maintained at https://github.com/racket/chezscheme right?

yes

is that what gets built during the build of racket/racket? so the build crosses a repository boundary?

I just tested: all commits: 13.4, 100: 4.2, 10: 4.1, 1: 4.0

so basically no win from going less than 100
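The arithmetic behind that conclusion can be checked quickly. This sketch uses only the four numbers reported above. The unit is not stated in the discussion, so only ratios and differences are computed.

```python
# Timings reported in the chat, keyed by clone depth (None = full history).
# The unit is not stated, so treat the values as relative.
timings = {None: 13.4, 100: 4.2, 10: 4.1, 1: 4.0}

full = timings[None]
# Speedup of each shallow depth relative to fetching the full history.
speedups = {d: full / t for d, t in timings.items() if d is not None}

for depth in (100, 10, 1):
    extra_saving = timings[100] - timings[depth]
    print(f"depth={depth}: {speedups[depth]:.2f}x faster than full history, "
          f"{extra_saving:.1f} saved compared with depth=100")
```

Going from the full history to depth 100 gives roughly a 3x improvement, while going from 100 down to 1 changes the measurement by only about 0.2.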

interesting

Unfortunately yes, the build crosses a repo boundary. I had several thoughts about this while doing gitlab but I think with actions we are in a better position. The issue here is that chez can change and break racket build with no changes to the racket repo.

So I think we should somehow trigger the CS build and test jobs for pushes to racket/ChezScheme.

This might need an action on the ChezScheme repo but it shouldn’t be impossible to achieve (I hope) - given the rough edges still in gha.

There’s a github action you can use, actions/cache, to do caching


The racket/chezscheme repository seems to have some git submodules too - does changing the version imported with a submodule require a commit?

we can write a github action on the racket/ChezScheme repo to trigger that on the racket/racket repo

@notjack yes

that’s good, then we at least don’t have to worry about triggering builds of racket/chezscheme whenever those dependencies get updated too

Could we use a git submodule in racket/racket to import chezscheme and use that for building? It would require periodically keeping things in sync, which isn’t great, but would eliminate the need to do this cross-repo event triggering.

(this kind of dependency-graph-based CI triggering and caching is what the system at my day job does and it gets real complicated real fast)

@notjack I would be in favor of that but traditionally @mflatt has not been

also you can build Racket with an external Chez Scheme, which is probably how we’d do caching
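One way to picture "cache everything built in Chez Scheme until the commit changes" is a cache key derived from the Chez Scheme commit and the runner OS. This is a hedged sketch, not an actual workflow. The key format, function names and commit IDs are hypothetical, and how such a key would be handed to actions/cache is not shown.

```python
def chez_cache_key(runner_os: str, chez_commit: str) -> str:
    """Build a cache key that changes whenever the Chez Scheme commit changes.

    The "chez-<os>-<commit>" format is an illustrative assumption.
    """
    return f"chez-{runner_os}-{chez_commit}"


def needs_rebuild(key: str, cached_keys: set) -> bool:
    """A rebuild is needed only when no cache entry exists for this key."""
    return key not in cached_keys


# Hypothetical commit IDs, for illustration only.
cached = {chez_cache_key("Linux", "abc123")}
print(needs_rebuild(chez_cache_key("Linux", "abc123"), cached))
print(needs_rebuild(chez_cache_key("Linux", "def456"), cached))
```

Keying on the commit means a cached external Chez Scheme build can be reused across racket/racket pushes, and is rebuilt only when racket/chezscheme itself moves.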

drawback: we won’t find out if a commit to racket/chezscheme breaks racket/racket until we attempt to update the commit used to import the submodule

@popa.bogdanp has joined the channel

Hey folks! I haven’t been keeping up with the discussion and work around this, but I wanted to point out re. this comment that my setup-racket action is able to install snapshot builds of racket on all 3 platforms (though the implementation is a bit hacky), in case that might help. On Linux, installing a snapshot using this action takes less than 20 seconds so it should be much faster than building from source.

welcome @popa.bogdanp :wave: good job on the racket setup action btw

Thanks! I mostly just ripped the guts out of the official setup-python action and based it on that :smile:

Thanks for that. I might be using that very soon to speed up our PR workflow.

By that - I mean your action.

@samth To enable notifications to slack I need some sort of incoming webhook secret. Are you the admin for these things here on slack?

I have been thinking for awhile and I might be missing the right technology. How can we go about testing Racket on FreeBSD? It doesn’t run on docker, there’s no github runner support for that os so we need to virtualize. The closest I got was to use vagrant to test racket manually but I know of no good way to script this atm. Any suggestions and PR demo’ing this would be great! :slightly_smiling_face:

yes, I’ll add the secret on github

Once you add the secret, can you send me the secret variable name so I can create the workflow? Thanks.

there’s now a SLACK_WEBHOOK_URL secret
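For reference, a minimal sketch of what a notification step could do with that secret. It assumes the workflow exposes the secret to the step as an environment variable also named SLACK_WEBHOOK_URL, and it relies on Slack incoming webhooks accepting a JSON body with a "text" field. The function names are illustrative, not part of any existing workflow.

```python
import json
import os
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST carrying a plain-text Slack message."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(text: str) -> None:
    """Send the message if the webhook URL is available in the environment."""
    url = os.environ.get("SLACK_WEBHOOK_URL")
    if not url:
        # Assumption: when the secret is not exposed, skip rather than fail.
        print("SLACK_WEBHOOK_URL is not set; skipping notification")
        return
    with urllib.request.urlopen(build_slack_request(url, text)) as response:
        response.read()
```

Keeping the request construction separate from the sending makes it possible to check the payload without contacting Slack.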

@pocmatos based on https://wiki.freebsd.org/Docker and https://reviews.freebsd.org/D21570, it looks like the freebsd folks are actively trying to improve the docker<->freebsd situation

asking the freebsd-virtualization mailing list what to do might be a good starting point

Sure!

@samth thanks.


yes (you have to use #notifications). and that channel currently has everything for all the other CI systems

except DrDr, we should fix that sometime :slightly_smiling_face:

anyone in this channel interested in DrDr should let me know …

Is there a DrDr notifications system? Can you add me to the cc of that?

@pocmatos currently, DrDr notifies the responsible person or people for each file that fails

here’s the body of my most recent drdr email:
DrDr has finished building push #53213 after 3.02h.
<http://drdr.racket-lang.org/53213/>
A file you are responsible for has a condition that may need inspecting.
stderr:
<http://drdr.racket-lang.org/53213/racket/share/pkgs/typed-racket-test/historical-counterexamples.rkt>
<http://drdr.racket-lang.org/53213/cs/racket/share/pkgs/typed-racket-test/historical-counterexamples.rkt>

I also get the emails for files with no one responsible; that looks like:
DrDr has finished building push #53213 after 3.02h.
<http://drdr.racket-lang.org/53213/>
A file you are responsible for has a condition that may need inspecting.
stderr:
<http://drdr.racket-lang.org/53213/racket/share/pkgs/aws/aws/sigv4.rkt>
<http://drdr.racket-lang.org/53213/pkg-src/build/make>
<http://drdr.racket-lang.org/53213/cs/racket/share/pkgs/aws/aws/sigv4.rkt>
unclean:
<http://drdr.racket-lang.org/53213/racket/share/pkgs/aws/aws/sigv4.rkt>
<http://drdr.racket-lang.org/53213/cs/racket/share/pkgs/aws/aws/sigv4.rkt>

How much overlap is there between what DrDr does and what the package build server does? I’ve always been confused why there’s two of these systems.

I thought DrDr was the package build server…

I’m… not actually sure

Me neither, but I always assumed that.

no, they’re totally different

Whoops - bad assumptions.

@pocmatos maybe it’s worth writing these things down on the wiki … :slightly_smiling_face:

http://drdr.racket-lang.org is a CI system for, roughly, the “main-distribution”

@samth good idea. You explain it to me and I will try and put them down. :slightly_smiling_face:

So, would that be something we could replace with GHA in the long term?

it works by checking out the latest racket/racket, building it and all packages in main-distribution and main-distribution-test, and then executing every racket file in every package either with raco test or racket depending on configuration.

it also now does this with a racketcs build, similarly executing everything

it runs on a single, bespoke, Linux server (located at IU). the configuration is a combination of the racket/drdr repository and a lot of state on that machine

basically nothing is containerized/protected

or hermetic / easily reproducible?

we have a very full history of runs of the system, so you can go back in time, plus there’s logging/charting of timing results

every 100th build is saved and downloadable


OK. I think with GHA + dashboard we could have that implemented.

and yes, things are not always easily reproducible

Thanks.

by far the biggest challenge with doing that somewhere else is that it’s 3 hours of wall-clock time on a 12-core machine per run

~8 hours compute time

plus we’re storing a lot of data

I have a 40 core machine I have been using for racket gitlab. Soon GHA.

It should be speedier there - it has 2 Xeons 20cores each.

yes, although less than you’d hope

there are a number of individual tests that take 20+ minutes
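A back-of-the-envelope sketch of why more cores help "less than you’d hope". It uses the approximate figures from the discussion (about 8 hours of compute, individual tests of 20+ minutes) and assumes the best case, where work splits perfectly across cores but a single test cannot be split. The function name is illustrative.

```python
def wall_clock_lower_bound(compute_hours: float, cores: int,
                           longest_test_hours: float) -> float:
    """Best-case wall-clock time: perfect parallelism, indivisible tests."""
    return max(compute_hours / cores, longest_test_hours)


compute = 8.0          # approximate total compute time from the chat
longest = 20 / 60      # "20+ minutes", taken here as exactly 20 minutes

for cores in (12, 40):
    bound = wall_clock_lower_bound(compute, cores, longest)
    print(f"{cores} cores: at least {bound * 60:.0f} minutes")

# Share of the 12-core machine's capacity actually used during a 3 hour run.
utilization = compute / (3.0 * 12)
print(f"utilization on 12 cores over 3 hours: {utilization:.0%}")
```

Under these assumptions the 40-core bound is set by the longest single test rather than by core count. The observed 3 hours on 12 cores is also far above the best case, which suggests the run is not limited by cores alone.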

cpu-heavy tests or io-heavy tests?


hmms

In contrast, http://pkg-build.racket-lang.org builds all the packages in a VM using https://github.com/racket/pkg-build

it builds each package using the current release, and rebuilds each package when the package has changed since the previous run

it runs once every 24 hours


similarly, https://plt.eecs.northwestern.edu/pkg-build/ does the same every 24 hours with the most recent snapshot, and builds every package (since the snapshot changed)

sometimes that machine does something different, and builds https://plt.eecs.northwestern.edu/release-pkg-build/ using the most recent release candidate

also every day, both of the sites listed at http://snapshot.racket-lang.org build Racket on a wide variety of platforms

(which is also a form of CI, since almost any kind of build error in any main-distribution package will cause them to fail)

also, during the release process, there are regular builds on a variety of platforms which you can get from http://pre-release.racket-lang.org

those (snapshots and pre-release) are built with https://github.com/racket/distro-build/

So what are our goals for CI in racket? So far I’m hearing:
• Ensure that Racket works on the operating systems and architectures we claim to support
• Get CI feedback faster (both by running things faster and by running them when dependencies change)
• Make it easier to get historical CI data about Racket
@pocmatos Did I miss any? Which of those is most important?

Reduce the amount of CI stuff we have to maintain

@samth +1

also, do we want to focus only on CI for the main distribution, or on the wider racket ecosystem?

Overall increase the quality of Racket software

I think doing both is good but we should start with racket/racket, follow up with racket/* and continue to the whole ecosystem

With regards to your point 3, it’s not only about getting historical data but understanding the evolution of racket as a piece of software. Currently all the benchmarking is done locally and it’s not straightforward to reproduce.

if we had to pick between increasing CI performance/coverage, and reducing the maintenance burden of CI, which should we focus on?

I would prioritize CI maintenance over CI performance to start with.

I think that’s a good direction to go in

Although coverage is high up there, not worried too much about performance except for PRs. We can reduce coverage there in order to improve performance. Have a push workflow that has more coverage and decent wall time, and a nightly scheduled run with full coverage.

would we be okay decreasing coverage in the short term if it meant reducing maintenance burden?

One thing that is quite worrying for me, from a maintainership point of view, is the enormous number of OSes, architectures, VMs, GCs and build configurations we say we support, most of which are not tested.

yeah that spooks me too

Like - an extreme example - the support for QNX.

I didn’t even know what it was - it’s a commercial OS. We have code ifdefing for this thing but probably nobody cares.

However Matthew is extremely reluctant to remove the code.

@pocmatos there’s that spreadsheet we made a while ago describing all the combinations and support status, mayhaps that ought to be shared more widely or moved to the wiki

Maybe I should make a better case for it, though. How can we keep code in that’s just impossible to test? We don’t even have evidence anyone uses it.

@notjack i started trying to move that to the wiki? Did you miss the url i sent earlier?

oh! I didn’t notice


The QNX thing for example: https://github.com/racket/racket/issues/2906#issuecomment-553410336

I think it’s reasonable to delete operating-system-specific and architecture-specific code that we 1) can’t find known users for and 2) can’t test. It will bitrot and stop working over time anyway. I’m all but certain that racket-on-QNX is currently broken, just because we’d have absolutely no way of knowing if it broke and no way to test for it.

it’s a nontrivial maintenance burden

I will try to discuss this with Matthew further since it’s also code that has been in our codebase since the days of PLT Scheme, with no testing whatsoever. And you are guaranteed to be correct - Racket QNX definitely doesn’t work but since it’s a commercial OS, which we can’t test, we’ll never know.

It’s time for me to leave. Talk to you tomorrow/later.

:wave:

I think Matthew’s feeling is that we learned something back then about how to make Racket work on qnx and we shouldn’t throw away that knowledge

I don’t think he’s under the impression that it currently works out of the box there

I think also the maintenance burden is low for that

@sorawee has joined the channel

Just want to chime in to say that: while I think that we should take advantage of any features available in GitHub Actions, I disagree with pushing users to use it for the following reasons:
• Some users might not want to use GitHub (trust, etc.)
• Some users might not be able to use GitHub (company uses GitLab, etc.)
I’m not saying that we must right now support multiple platforms. I think it makes perfect sense to focus on GitHub Actions for now. However, I don’t like how https://github.com/racket/racket/wiki/Continuous-Integration states that:
> We are committed to using GitHub Actions for CI as it provides an unparalleled level of integration with our current workflow.
which, as I understand, says that we won’t ever support other platforms.

Good morning (on my side of the world at least)!

@sorawee thanks for your input, however I don’t really understand what you mean. CI is not used by the users. It’s a workflow service on the developers’ side. The developers cannot and should not have to maintain multiple CI systems, which is why we are trying to consolidate. What I meant in the wiki is that our current effort is to implement CI using GitHub Actions and, in order to consolidate, to ditch the other CI platforms.

Also when we’re talking about “CI” we mean specifically CI for the main distribution. The package build server will not and should not move to github actions.

Ah, I see now. I totally misunderstood and thought that this is travis-racket-like thing.

My apologies

@sorawee ah, Greg’s project? No - this has nothing to do with it. :slightly_smiling_face:

Yup, CI for authors of Racket packages will not change :simple_smile: