Making flaky tests pass cries out for functional-programming idioms.
While reading “A beautiful Python monstrosity”, I was struck by several thoughts. Perhaps performance tests shouldn’t be automated with the same tools as unit tests? Is there a better way to write flaky tests?
A unit test doesn’t seem to be the best way to describe a performance test. More likely, we want to set up a benchmarking harness, run the programs, and measure various performance indicators. This can be done automatically in a build pipeline, for example.
If these measurements are collected on each run we can perform analyses. For example:
All of this can also be automated. I wouldn’t write any of it as a unit test, however.
The examples in the original post also seem to concentrate on algorithmic (time) complexity; given a wealth of data, this can be checked with statistical tests, too. It’s not something I want to run as part of the (local) unit tests, though. If it’s important enough to block PRs, put it in the build pipeline (I should still be able to run it locally, but it shouldn’t be on by default).
An orthogonal concept: How do we make flaky tests less flaky? We should engineer the tests to depend less on chance, but automatically repeating the tests is a reasonable hack in the meantime.
Here’s my transcribed Racket repeat-flaky procedure, which doesn’t require decorators or other ideas:
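(define (repeat-flaky test [n 10])
  (match n
    ;; No retries left: run the test and let any failure propagate.
    [0 (test)]
    ;; Otherwise, retry with one fewer attempt when the test raises exn:fail.
    [(? positive? m)
     (with-handlers ([exn:fail? (λ (_exn) (repeat-flaky test (sub1 n)))])
       (test))]))

;; Example:
(repeat-flaky (thunk (check-equal? (random 1 3) (random 1 3))))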
The core idea is to pass functions around. Python makes this hard because its lambda doesn’t permit arbitrary function bodies (only a single expression); the decorator over an unnamed function is essentially recreating higher-order functions receiving anonymous functions. This is thus the essence of repeating fallible tests. A generalization would let the client replace the exn:fail? test, or let subsequent invocations know about the caught exception by passing it as an optional argument.
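A minimal sketch of that generalization might look like the following. The name repeat-flaky*, the #:retry? keyword, and the convention that the test receives the previously caught exception (or #f on the first attempt) are all assumptions made for illustration, not part of the original procedure.

```racket
#lang racket

;; Hypothetical generalization of repeat-flaky (names are illustrative).
;; retry? decides which raised values trigger another attempt.
;; test is called with the most recently caught exception, or #f at first.
(define (repeat-flaky* test [n 10] #:retry? [retry? exn:fail?])
  (let loop ([n n] [last-exn #f])
    (if (zero? n)
        ;; Final attempt: no handler, so a failure propagates to the caller.
        (test last-exn)
        (with-handlers ([retry? (λ (e) (loop (sub1 n) e))])
          (test last-exn)))))

;; Example: fail twice, then succeed and report what the last failure was.
(define attempts 0)
(define result
  (repeat-flaky*
   (λ (prev)
     (set! attempts (add1 attempts))
     (when (< attempts 3)
       (error 'flaky "attempt ~a failed" attempts))
     (list attempts (and prev (exn-message prev))))))
```

Passing a narrower predicate, such as exn:fail:network?, would restrict retries to one class of failure and let everything else fail immediately.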
The thunk could be eliminated (from surface syntax) by a macro if desired.
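One way such a macro could look, as a sketch: the name with-flaky-retries is made up here, and the macro simply wraps its body in a thunk and uses the default repeat count.

```racket
#lang racket

;; repeat-flaky as defined in the post.
(define (repeat-flaky test [n 10])
  (match n
    [0 (test)]
    [(? positive? m)
     (with-handlers ([exn:fail? (λ (_exn) (repeat-flaky test (sub1 n)))])
       (test))]))

;; Hypothetical macro: hides the thunk from the call site.
(define-syntax-rule (with-flaky-retries body ...)
  (repeat-flaky (thunk body ...)))

;; Example: the body fails on its first two runs, then succeeds.
(define tries 0)
(define outcome
  (with-flaky-retries
    (set! tries (add1 tries))
    (when (< tries 3)
      (error "not yet"))
    'ok))
```

Because the macro expands to an ordinary call, the body still runs inside a function and internal definitions work as they do inside thunk.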
This works without needing nonlocal, by the way, since the Racket equivalent of Python’s a = 1 that automatically introduces a is (let ([a 1]) …) or (define a 1). This isn’t mutation, it’s binding. Further, if you want to refer to a from a higher scope, you do so by writing a (as long as it isn’t shadowed), even with set!. Lexical scoping rules give you predictable control of which identifiers refer to which bindings in each part of the program text.
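A small illustration of the difference between binding and mutation, with made-up names (count-calls, bump!) chosen only for this sketch:

```racket
#lang racket

(define (count-calls)
  (define calls 0)               ; binding: introduces calls in this scope
  (define (bump!)
    (set! calls (add1 calls))    ; mutation of the enclosing binding,
    calls)                       ; with no nonlocal-style declaration
  (bump!)
  (bump!)
  (let ([calls 100])             ; a new binding that shadows calls in here only
    (void calls))
  calls)                         ; the outer binding, mutated twice

(define total (count-calls))
```

The inner let never touches the outer calls, so the shadowing is visible directly in the program text.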
The ability to properly nest repeat-flaky makes a lot of the need for mutation (and thus complicated scope references) go away, though:
(repeat-flaky
 (thunk
  (define micro-time …)
  (define tiny-time …)
  (check … micro-time tiny-time …)
  (repeat-flaky
   (thunk
    (define small-time …)
    ;; we have access to micro, tiny here
    (check … micro-time tiny-time small-time …)))))
Each nesting is repeated, however, so with five attempts at each level the inner check of this singly nested flaky test could run up to 25 times (instead of the original’s 5). This could be a feature, though, and can be controlled with the optional repeat argument, n.
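To make the multiplication concrete, here is a sketch that counts runs of an inner test that always fails. It assumes the repeat-flaky definition above, which makes up to n + 1 attempts, so passing 4 gives five attempts per level; always-fails and inner-runs are names invented for this example.

```racket
#lang racket

;; repeat-flaky as defined in the post.
(define (repeat-flaky test [n 10])
  (match n
    [0 (test)]
    [(? positive? m)
     (with-handlers ([exn:fail? (λ (_exn) (repeat-flaky test (sub1 n)))])
       (test))]))

;; Hypothetical inner test that records each run and then fails.
(define inner-runs 0)
(define (always-fails)
  (set! inner-runs (add1 inner-runs))
  (error "flaky"))

;; Five attempts at each level; the final failure escapes both levels,
;; so it is caught here to let the count be inspected afterwards.
(with-handlers ([exn:fail? void])
  (repeat-flaky (thunk (repeat-flaky always-fails 4)) 4))
```

After this runs, inner-runs is 25: each of the five outer attempts exhausts all five inner attempts.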