pocmatos
2019-11-20 15:07:15

OK - let's do the following. I just wanted to discuss a few details on how to get all the components of the future architecture talking to each other: Actions - DB - Dashboard. I know @notjack has a lot of experience with this, so let's go ahead with the meeting as is. I will schedule another meeting and make sure the time suits all of us. In any case, I will take notes and write down what we discussed.


pocmatos
2019-11-20 15:08:22

There are two services I would like to integrate with the CI we are doing - LGTM and OSS-fuzz. LGTM is a simple push of a button while for OSS-fuzz we need to send a request to Google to be accepted.


pocmatos
2019-11-20 15:08:32

Any issues with me going ahead and doing both of these?


pocmatos
2019-11-20 15:10:52

on lgtm, I submitted it a few weeks ago so we already have some results: https://lgtm.com/projects/g/racket/racket/


pocmatos
2019-11-20 15:11:03

but having this in CI would be great.



pocmatos
2019-11-20 15:12:05

^^^ in very early stages.


samth
2019-11-20 15:25:15

@pocmatos I’m in favor of all of those things


pocmatos
2019-11-20 15:42:55

@samth from lgtm: A request to install LGTM.com has been submitted on the @racket account.


pocmatos
2019-11-20 15:43:05

I assume someone is going to get an email they need to approve.


samth
2019-11-20 15:50:25

@pocmatos I have not gotten an email yet


samth
2019-11-20 15:51:43

ah, they didn’t email but i found it on github



pocmatos
2019-11-20 15:58:17

Yay!


samth
2019-11-20 16:48:24

Another next step is to add something to notify #notifications here wrt the GH Actions


samth
2019-11-20 16:50:28

Another question is if we can cache the results of building Chez Scheme so that we don’t re-do that work when things don’t change.


notjack
2019-11-20 17:27:03

Caching will be tricky



samth
2019-11-20 17:40:36

also, can we check out with --depth=1?


pocmatos
2019-11-20 20:32:51

pocmatos
2019-11-20 20:33:59

We cannot check out with --depth=1. I have tried that in gitlab but was surprised when it failed. :slightly_smiling_face:


samth
2019-11-20 20:34:14

We already do that in Travis


pocmatos
2019-11-20 20:34:38

Ah - yes, in travis it works because everything is a single job.


samth
2019-11-20 20:35:09

I’m not sure why that’s related


pocmatos
2019-11-20 20:35:18

When you do it in parallel, you push X, checkout X and build X. Someone in the meantime pushes Y.


pocmatos
2019-11-20 20:35:27

It checks out Y and builds Y.


pocmatos
2019-11-20 20:35:43

When you start to run tests in X, X does not exist if you do --depth=1.


pocmatos
2019-11-20 20:35:46

and the build fails.


pocmatos
2019-11-20 20:36:08

Because you do a checkout per job.


samth
2019-11-20 20:36:13

Ah


pocmatos
2019-11-20 20:36:23

You could do --depth=5.


samth
2019-11-20 20:36:27

so we’d need to do a “checkout to this commit”


pocmatos
2019-11-20 20:36:42

And hope 5 is large enough. :slightly_smiling_face:


samth
2019-11-20 20:36:45

5 isn’t always enough because you could push arbitrarily many commits


pocmatos
2019-11-20 20:37:36

right … we don’t know the upper bound that’s safe but it’s not very high for racket. It also depends on how long CI runs and how fast we push commits. :slightly_smiling_face:
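
The failure mode described above can be modelled in a few lines. This is a minimal sketch, assuming a linear history and that each job makes its own fresh clone of the branch tip; `shallow_clone` and `can_checkout` are hypothetical helper names, not part of git or of any CI system.

```python
def shallow_clone(history, depth):
    """Commits visible after cloning the branch tip with --depth=<depth>.

    `history` lists commit ids oldest first; a linear history is assumed.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return history[-depth:]


def can_checkout(history, commit, depth):
    """True if `commit` is still present in a fresh shallow clone of the tip."""
    return commit in shallow_clone(history, depth)


# The build job clones while X is the tip, so X is available.
history = ["W", "X"]
built_ok = can_checkout(history, "X", depth=1)

# Someone pushes Y before the test job makes its own clone.
history.append("Y")
test_depth_1 = can_checkout(history, "X", depth=1)  # clone holds only Y
test_depth_5 = can_checkout(history, "X", depth=5)  # clone holds W, X, Y

print(built_ok, test_depth_1, test_depth_5)  # True False True
```

In this model a depth of d is safe only while fewer than d commits land between the first and the last job of a run, which is why the safe bound depends on how long CI runs and how fast commits are pushed.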


pocmatos
2019-11-20 20:38:03

I was intrigued by the idea of caching Chez. What did you have in mind?


notjack
2019-11-20 20:38:03

what’s the default depth? pull everything since the beginning of time?


pocmatos
2019-11-20 20:38:13

@notjack right.


samth
2019-11-20 20:38:15

yes


samth
2019-11-20 20:38:41

I mean cache everything built in ChezScheme until the commit changes


notjack
2019-11-20 20:38:53

so since racket/racket has 40k+ commits, even --depth=100 would be two orders of magnitude better


samth
2019-11-20 20:38:57

yes


samth
2019-11-20 20:39:17

also the most recent N commits have much less data


samth
2019-11-20 20:39:25

since the repositories split


pocmatos
2019-11-20 20:40:12

I mean - if the CI takes 1 hour, how many commits are there per hour in Racket? OK, sometimes Matthew pushes 3 or 4 one after the other but I would say even --depth=10 would be safe.


pocmatos
2019-11-20 20:40:51

@samth i like the caching idea. Not sure exactly how to do it yet but it could speed up the build slightly.


notjack
2019-11-20 20:40:58

it probably would be. would it be hard to debug problems caused by choosing too low a depth?


pocmatos
2019-11-20 20:41:13

no … checkout would say commit doesn’t exist.


pocmatos
2019-11-20 20:41:28

Actually first time I saw it, I was left scratching my head…


samth
2019-11-20 20:41:36

let’s go with 100 and we’ll see if it ever fails


pocmatos
2019-11-20 20:41:41

Sure.


notjack
2019-11-20 20:42:10

So, the caching


notjack
2019-11-20 20:42:40

racket uses a fork of chezscheme maintained at https://github.com/racket/chezscheme right?


samth
2019-11-20 20:42:47

yes


notjack
2019-11-20 20:43:18

is that what gets built during the build of racket/racket? so the build crosses a repository boundary?


samth
2019-11-20 20:43:32

I just tested: all commits: 13.4 100: 4.2 10: 4.1 1: 4.0


samth
2019-11-20 20:43:43

so basically no win from going less than 100
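
For a sense of scale, the relative saving over a full clone can be computed from the four numbers above. The chat does not state the unit of the measurements, so this sketch only compares ratios; `relative_saving` is a hypothetical helper.

```python
def relative_saving(full, shallow):
    """Fraction saved by a shallow clone relative to a full clone."""
    return (full - shallow) / full


full_clone = 13.4                        # the "all commits" measurement
by_depth = {100: 4.2, 10: 4.1, 1: 4.0}   # measurements per --depth value

for depth, value in by_depth.items():
    print(f"--depth={depth}: {relative_saving(full_clone, value):.0%} saved")

# Extra saving from going below 100, relative to the depth-100 figure.
extra = relative_saving(by_depth[100], by_depth[1])
print(f"--depth=1 vs --depth=100: {extra:.0%} saved")
```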


notjack
2019-11-20 20:44:04

interesting


pocmatos
2019-11-20 20:45:40

Unfortunately yes, the build crosses a repo boundary. I had several thoughts about this while doing gitlab but I think with actions we are in a better position. The issue here is that chez can change and break racket build with no changes to the racket repo.


pocmatos
2019-11-20 20:46:30

So I think we should somehow trigger the CS build and test jobs for pushes to racket/ChezScheme.


pocmatos
2019-11-20 20:47:01

This might need an action on the ChezScheme repo but it shouldn’t be impossible to achieve (I hope) - given the rough edges still in gha.


samth
2019-11-20 20:48:53

There’s a github action you can use, actions/cache, to do caching



notjack
2019-11-20 20:50:18

The racket/chezscheme repository seems to have some git submodules too - does changing the version imported with a submodule require a commit?


samth
2019-11-20 20:50:36

we can write a github action on the racket/ChezScheme repo to trigger that on the racket/racket repo
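
One way to wire this up is GitHub's `repository_dispatch` REST endpoint, which lets a job in one repository raise a custom event in another. This is a minimal sketch, assuming a token with access to the target repository; the event name `chezscheme-push`, the payload shape, and the helper `build_dispatch_request` are hypothetical, and nothing in this conversation confirms that this is the approach that was adopted.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, event_type, token, payload=None):
    """Build (but do not send) a repository_dispatch request for owner/repo."""
    body = {"event_type": event_type, "client_payload": payload or {}}
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/dispatches",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values: the event name and payload shape are assumptions.
request = build_dispatch_request(
    owner="racket",
    repo="racket",
    event_type="chezscheme-push",
    token="<token from a repository secret>",
    payload={"chez_commit": "<sha>"},
)
print(request.get_method(), request.full_url)

# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

On the receiving side, a workflow in racket/racket would need to listen for the `repository_dispatch` event for the CS build and test jobs to run.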


samth
2019-11-20 20:50:46

@notjack yes


notjack
2019-11-20 20:51:13

that’s good, then we at least don’t have to worry about triggering builds of racket/chezscheme whenever those dependencies get updated too


notjack
2019-11-20 20:52:01

Could we use a git submodule in racket/racket to import chezscheme and use that for building? It would require periodically keeping things in sync, which isn’t great, but would eliminate the need to do this cross-repo event triggering.


notjack
2019-11-20 20:52:39

(this kind of dependency-graph-based CI triggering and caching is what the system at my day job does and it gets real complicated real fast)


samth
2019-11-20 20:52:48

@notjack I would be in favor of that but traditionally @mflatt has not been


samth
2019-11-20 20:53:18

also you can build Racket with an external Chez Scheme, which is probably how we’d do caching
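
The idea of reusing a Chez Scheme build until its commit changes can be sketched as a cache keyed on that commit. This is an illustration only, assuming the build output lives in a single directory; `cache_key`, `get_or_build`, and the key format are hypothetical and do not describe how actions/cache or the Racket build is implemented.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def cache_key(chez_commit, platform):
    """Key that changes whenever the Chez Scheme commit or the platform changes."""
    digest = hashlib.sha256(f"{platform}:{chez_commit}".encode()).hexdigest()
    return f"chez-{platform}-{digest[:16]}"


def get_or_build(cache_root, key, build):
    """Return (directory, cache_hit); `build(directory)` runs only on a miss."""
    target = Path(cache_root) / key
    if target.is_dir():
        return target, True
    # Build into a scratch directory first so a failed build is never
    # mistaken for a valid cache entry.
    scratch = Path(cache_root) / (key + ".partial")
    shutil.rmtree(scratch, ignore_errors=True)
    scratch.mkdir(parents=True)
    build(scratch)
    scratch.rename(target)
    return target, False


# Demonstration with a stand-in build step that only records its calls.
with tempfile.TemporaryDirectory() as root:
    builds = []
    key = cache_key("<chez-commit-sha>", "linux")
    _, first_hit = get_or_build(root, key, builds.append)
    _, second_hit = get_or_build(root, key, builds.append)
    print(first_hit, second_hit, len(builds))  # False True 1
```

A CI job would compute the key from the ChezScheme commit it is about to use, so the expensive build only reruns when that commit (or the platform) changes.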


notjack
2019-11-20 20:54:00

drawback: we won’t find out if a commit to racket/chezscheme breaks racket/racket until we attempt to update the commit used to import the submodule


popa.bogdanp
2019-11-20 21:01:39

@popa.bogdanp has joined the channel


popa.bogdanp
2019-11-20 21:09:15

Hey folks! I haven’t been keeping up with the discussion and work around this, but I wanted to point out, re. this comment, that my setup-racket action is able to install snapshot builds of racket on all 3 platforms (though the implementation is a bit hacky), in case that might help. On Linux, installing a snapshot using this action takes less than 20 seconds so it should be much faster than building from source.


notjack
2019-11-20 21:14:56

welcome @popa.bogdanp :wave: good job on the racket setup action btw


popa.bogdanp
2019-11-20 21:17:04

Thanks! I mostly just ripped the guts out of the official setup-python action and based it on that :smile:


pocmatos
2019-11-20 21:17:38

Thanks for that. I might be using that very soon to speed up our PR workflow.


pocmatos
2019-11-20 21:17:49

By that - I mean your action.


pocmatos
2019-11-20 21:18:21

@samth To enable notifications to slack I need some sort of incoming webhook secret. Are you the admin for these things here on slack?


pocmatos
2019-11-20 21:45:55

I have been thinking for awhile and I might be missing the right technology. How can we go about testing Racket on FreeBSD? It doesn’t run on docker, there’s no github runner support for that os so we need to virtualize. The closest I got was to use vagrant to test racket manually but I know of no good way to script this atm. Any suggestions and PR demo’ing this would be great! :slightly_smiling_face:


samth
2019-11-20 21:46:34

yes, I’ll add the secret on github


pocmatos
2019-11-20 21:47:01

Once you add the secret, can you send me the secret variable name so I can create the workflow? Thanks.


samth
2019-11-20 21:53:18

there’s now a SLACK_WEBHOOK_URL secret
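
With the secret in place, a notification step could look roughly like the following. This is a minimal sketch, assuming the workflow exposes the SLACK_WEBHOOK_URL secret to the step as an environment variable of the same name and that it is a standard Slack incoming webhook; the message format and the helper names `build_message` and `notify` are hypothetical.

```python
import json
import os
import urllib.request


def build_message(workflow, status, commit):
    """JSON body for a Slack incoming webhook (a plain-text message)."""
    text = f"{workflow}: {status} at {commit}"
    return json.dumps({"text": text}).encode("utf-8")


def notify(webhook_url, body):
    """POST the message to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


body = build_message("CI Linux", "failure", "<sha>")
webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
if webhook_url:  # only send when the secret is actually available
    notify(webhook_url, body)
```

The channel a message lands in is determined by how the webhook was set up on the Slack side, so the script itself does not need to name it.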


notjack
2019-11-20 21:57:38

@pocmatos based on https://wiki.freebsd.org/Docker and https://reviews.freebsd.org/D21570, it looks like the freebsd folks are actively trying to improve the docker<->freebsd situation


notjack
2019-11-20 21:58:14

asking the freebsd-virtualization mailing list what to do might be a good starting point


pocmatos
2019-11-20 22:02:53

Sure!


pocmatos
2019-11-20 22:02:59

@samth thanks.


pocmatos
2019-11-20 22:03:24

Shall we use the #notifications channel? failures only or everything?


samth
2019-11-20 22:03:57

yes (you have to use #notifications). and that channel currently has everything for all the other CI systems


samth
2019-11-20 22:04:06

except DrDr, we should fix that sometime :slightly_smiling_face:


samth
2019-11-20 22:04:19

anyone in this channel interested in DrDr should let me know …


pocmatos
2019-11-20 22:04:53

Is there a DrDr notifications system? Can you add me to the cc of that?


samth
2019-11-20 22:05:44

@pocmatos currently, DrDr notifies the responsible person or people for each file that fails


samth
2019-11-20 22:08:08

here’s the body of my most recent drdr email:
> DrDr has finished building push #53213 after 3.02h.
> http://drdr.racket-lang.org/53213/
> A file you are responsible for has a condition that may need inspecting.
> stderr:
> http://drdr.racket-lang.org/53213/racket/share/pkgs/typed-racket-test/historical-counterexamples.rkt
> http://drdr.racket-lang.org/53213/cs/racket/share/pkgs/typed-racket-test/historical-counterexamples.rkt
I also get the emails for files with no one responsible; that looks like:
> DrDr has finished building push #53213 after 3.02h.
> http://drdr.racket-lang.org/53213/
> A file you are responsible for has a condition that may need inspecting.
> stderr:
> http://drdr.racket-lang.org/53213/racket/share/pkgs/aws/aws/sigv4.rkt
> http://drdr.racket-lang.org/53213/pkg-src/build/make
> http://drdr.racket-lang.org/53213/cs/racket/share/pkgs/aws/aws/sigv4.rkt
> unclean:
> http://drdr.racket-lang.org/53213/racket/share/pkgs/aws/aws/sigv4.rkt
> http://drdr.racket-lang.org/53213/cs/racket/share/pkgs/aws/aws/sigv4.rkt


notjack
2019-11-20 22:09:43

How much overlap is there between what DrDr does and what the package build server does? I’ve always been confused why there’s two of these systems.


pocmatos
2019-11-20 22:10:09

I thought DrDr was the package build server…


notjack
2019-11-20 22:10:18

I’m… not actually sure


pocmatos
2019-11-20 22:10:46

Me neither, but I always assumed that.


samth
2019-11-20 22:10:53

no, they’re totally different


pocmatos
2019-11-20 22:11:02

Whoops - bad assumptions.


samth
2019-11-20 22:11:11

@pocmatos maybe it’s worth writing these things down on the wiki … :slightly_smiling_face:


samth
2019-11-20 22:11:31

http://drdr.racket-lang.org is a CI system for, roughly, the “main-distribution”


pocmatos
2019-11-20 22:11:33

@samth good idea. You explain it to me and I will try and put them down. :slightly_smiling_face:


pocmatos
2019-11-20 22:11:53

So, would that be something we could replace with GHA in the long term?


samth
2019-11-20 22:13:05

it works by checking out the latest racket/racket, building it and all packages in main-distribution and main-distribution-test, and then executing every racket file in every package either with raco test or racket depending on configuration.
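
The per-file step can be pictured as follows. This is only a sketch of the behaviour described above, not DrDr's actual code: it assumes the configuration reduces to a set of files that should be run with plain `racket`, with everything else going through `raco test`; `commands_for_tree` and `plain_racket` are hypothetical names.

```python
import tempfile
from pathlib import Path


def commands_for_tree(root, plain_racket=frozenset()):
    """List one command per .rkt file under `root`, in sorted path order.

    Files whose path relative to `root` is in `plain_racket` are run with
    `racket`; all others are run with `raco test`.
    """
    commands = []
    for path in sorted(Path(root).rglob("*.rkt")):
        relative = path.relative_to(root).as_posix()
        if relative in plain_racket:
            commands.append(["racket", str(path)])
        else:
            commands.append(["raco", "test", str(path)])
    return commands


# Demonstration on a throwaway tree with two files.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "pkg").mkdir()
    (Path(root) / "pkg" / "main.rkt").write_text("#lang racket/base\n")
    (Path(root) / "pkg" / "tests.rkt").write_text("#lang racket/base\n")
    commands = commands_for_tree(root, plain_racket={"pkg/main.rkt"})
    for command in commands:
        print(command[:-1])  # the command without the temporary path
```

Each resulting command could then be handed to `subprocess.run`, with its output and timing recorded per file.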


samth
2019-11-20 22:13:25

it also now does this with a racketcs build, similarly executing everything


samth
2019-11-20 22:14:13

it runs on a single, bespoke, Linux server (located at IU). the configuration is a combination of the racket/drdr repository and a lot of state on that machine


samth
2019-11-20 22:14:23

basically nothing is containerized/protected


notjack
2019-11-20 22:14:54

or hermetic / easily reproducible?


samth
2019-11-20 22:15:11

we have a very full history of runs of the system, so you can go back in time, plus there’s logging/charting of timing results


samth
2019-11-20 22:15:27

every 100 builds is saved and downloadable


samth
2019-11-20 22:16:00

pocmatos
2019-11-20 22:16:06

OK. I think with GHA + dashboard we could have that implemented.


samth
2019-11-20 22:16:12

and yes, things are not always easily reproducible


pocmatos
2019-11-20 22:16:21

Thanks.


samth
2019-11-20 22:16:42

by far the biggest challenge with doing that somewhere else is that it’s 3 hours of wall-clock time on a 12-core machine per run


samth
2019-11-20 22:16:54

~8 hours compute time


samth
2019-11-20 22:17:11

plus we’re storing a lot of data


pocmatos
2019-11-20 22:17:24

I have a 40 core machine I have been using for racket gitlab. Soon GHA.


pocmatos
2019-11-20 22:17:49

It should be speedier there - it has 2 Xeons 20cores each.


samth
2019-11-20 22:18:30

yes, although less than you’d hope


samth
2019-11-20 22:18:59

there are a number of individual tests that take 20+ minutes


notjack
2019-11-20 22:19:54

cpu-heavy tests or io-heavy tests?



notjack
2019-11-20 22:27:23

hmms


samth
2019-11-20 22:28:31

samth
2019-11-20 22:28:55

it builds each package using the current release, and rebuilds each package when the package has changed since the previous run


samth
2019-11-20 22:29:02

it runs once every 24 hours


samth
2019-11-20 22:29:15

samth
2019-11-20 22:29:52

similarly, https://plt.eecs.northwestern.edu/pkg-build/ does the same every 24 hours with the most recent snapshot, and builds every package (since the snapshot changed)


samth
2019-11-20 22:30:54

sometimes that machine does something different, and builds https://plt.eecs.northwestern.edu/release-pkg-build/ using the most recent release candidate


samth
2019-11-20 22:31:18

also every day, both of the sites listed at http://snapshot.racket-lang.org build Racket on a wide variety of platforms


samth
2019-11-20 22:32:01

(which is also a form of CI, since almost any kind of build error in any main-distribution package will cause them to fail)


samth
2019-11-20 22:32:26

also, during the release process, there are regular builds on a variety of platforms which you can get from http://pre-release.racket-lang.org


samth
2019-11-20 22:32:59

those (snapshots and pre-release) are built with https://github.com/racket/distro-build/


notjack
2019-11-20 22:34:42

So what are our goals for CI in racket? So far I’m hearing:

• Ensure that Racket works on the operating systems and architectures we claim to support
• Get CI feedback faster (both by running things faster and by running them when dependencies change)
• Make it easier to get historical CI data about Racket

@pocmatos Did I miss any? Which of those is most important?


samth
2019-11-20 22:35:34

Reduce the amount of CI stuff we have to maintain


pocmatos
2019-11-20 22:35:50

@samth +1


notjack
2019-11-20 22:35:53

also, do we want to focus only on CI for the main distribution, or on the wider racket ecosystem?


samth
2019-11-20 22:36:06

Overall increase the quality of Racket software


samth
2019-11-20 22:36:40

I think doing both is good but we should start with racket/racket, follow up with racket/* and continue to the whole ecosystem


pocmatos
2019-11-20 22:36:58

With regards to your point 3, it is not only about getting historical data but about understanding the evolution of racket as a piece of software. Currently all the benchmarking is done locally and it’s not straightforward to reproduce.


notjack
2019-11-20 22:38:13

if we had to pick between increasing CI performance/coverage, and reducing the maintenance burden of CI, which should we focus on?


pocmatos
2019-11-20 22:39:50

I would prioritize CI maintenance over CI performance to start with.


notjack
2019-11-20 22:40:23

I think that’s a good direction to go in


pocmatos
2019-11-20 22:40:44

Although coverage is high up there, not worried too much about performance except for PRs. We can reduce coverage there in order to improve performance. Have a push workflow that has more coverage and decent wall time, and a nightly scheduled run with full coverage.


notjack
2019-11-20 22:41:33

would we be okay decreasing coverage in the short term if it meant reducing maintenance burden?


pocmatos
2019-11-20 22:42:15

One thing that is quite worrying for me, from a maintainership point of view, is the enormous number of OSes, architectures, vms, gcs and build configurations we say we support, most of which are not tested.


notjack
2019-11-20 22:42:31

yeah that spooks me too


pocmatos
2019-11-20 22:42:36

Like - an extreme example - the support for QNX.


pocmatos
2019-11-20 22:43:05

I didn’t even know what it was - it’s a commercial OS. We have code ifdefing for this thing but probably nobody cares.


pocmatos
2019-11-20 22:43:15

However, Matthew is extremely reluctant to remove the code.


notjack
2019-11-20 22:43:40

@pocmatos there’s that spreadsheet we made a while ago describing all the combinations and support status, mayhaps that ought to be shared more widely or moved to the wiki


pocmatos
2019-11-20 22:43:53

Maybe I should make a better case for it, though. How can we keep code in that’s just impossible to test? We don’t even have evidence anyone uses it.


pocmatos
2019-11-20 22:44:16

@notjack i started trying to move that to the wiki? Did you miss the url i sent earlier?


notjack
2019-11-20 22:44:24

oh! I didn’t notice



pocmatos
2019-11-20 22:45:59

notjack
2019-11-20 22:54:27

I think it’s reasonable to delete operating-system-specific and architecture-specific code that we 1) can’t find known users for and 2) can’t test. It will bitrot and stop working over time anyway. I’m all but certain that racket-on-QNX is currently broken, just because we’d have absolutely no way of knowing if it broke and no way to test for it.


notjack
2019-11-20 22:55:12

it’s a nontrivial maintenance burden


pocmatos
2019-11-20 22:58:17

I will try to discuss this with Matthew further since it’s also code that has been in our codebase since the days of PLT Scheme, with no testing whatsoever. And you are guaranteed to be correct - Racket QNX definitely doesn’t work but since it’s a commercial OS, which we can’t test, we’ll never know.


pocmatos
2019-11-20 22:58:31

It’s time for me to leave. Talk to you tomorrow/later.


notjack
2019-11-20 22:58:39

:wave:


samth
2019-11-21 00:39:33

I think Matthew’s feeling is that we learned something back then about how to make Racket work on qnx and we shouldn’t throw away that knowledge


samth
2019-11-21 00:39:53

I don’t think he’s under the impression that it currently works out of the box there


samth
2019-11-21 00:40:36

I think also the maintenance burden is low for that


sorawee
2019-11-21 02:18:10

@sorawee has joined the channel


sorawee
2019-11-21 02:25:34

Just want to chime in to say that: while I think that we should take advantage of any features available in GitHub Actions, I disagree with pushing users to use it for the following reasons:

• Some users might not want to use GitHub (trust, etc.)
• Some users might not be able to use GitHub (company uses GitLab, etc.)

I’m not saying that we must right now support multiple platforms. I think it makes perfect sense to focus on GitHub Actions for now. However, I don’t like how https://github.com/racket/racket/wiki/Continuous-Integration states that:

> We are committed to using GitHub Actions for CI as it provides an unparalleled level of integration with our current workflow.

which, as I understand, says that we won’t ever support other platforms.


pocmatos
2019-11-21 06:50:52

Good morning (on my side of the world at least)!


pocmatos
2019-11-21 06:53:39

@sorawee thanks for your input, however I don’t really understand what you mean. CI is not used by the users. It’s a workflow service on the developers side. The developers cannot and should not have to maintain multiple CI systems, which is why we are trying to consolidate. What I meant with what I wrote in the wiki is that our efforts are at the moment to implement CI using GitHub Actions and in order to consolidate, ditch other CI platforms.


notjack
2019-11-21 06:57:09

Also when we’re talking about “CI” we mean specifically CI for the main distribution. The package build server will not and should not move to github actions.


sorawee
2019-11-21 06:58:17

Ah, I see now. I totally misunderstood and thought that this is travis-racket-like thing.


sorawee
2019-11-21 06:58:31

My apologies


pocmatos
2019-11-21 06:58:54

@sorawee ah, Greg’s project? No - this has nothing to do with it. :slightly_smiling_face:


notjack
2019-11-21 07:01:10

Yup, CI for authors of Racket packages will not change :simple_smile: