On Mon, Feb 3, 2025 at 1:55 AM Patrick Steinhardt <ps@xxxxxx> wrote:
>
> Hi,
>
> due to a couple of performance regressions that we have hit over the
> last couple of Git releases at GitLab, we have started to set up an
> effort to implement continuous benchmarking for the Git project. The
> intent is to have regular (daily) benchmarking runs against Git's
> `master` and `next` branches to be able to spot any performance
> regressions before they make it into the next release.
>
> I have started with a relatively simple setup:
>
>   - I have started collecting benchmarks that I myself do regularly [1].
>     These benchmarks are built on hyperfine and are thus not part of
>     the Git repository itself.
>
>   - GitLab CI runs on a nightly basis, executing a subset of these
>     benchmarks [2].
>
>   - Results are uploaded with a hyperfine adaptor to Bencher and are
>     summarized in dashboards [3].
>
> This at least gives us some visibility into severe performance
> outliers, whether these are improvements or regressions. Some
> statistics are applied to this data to automatically generate alerts
> when things are significantly changing.
>
> The setup is of course not perfect. It's built on top of CI jobs,
> which by their very nature do not perform consistently. The scripts
> are hosted outside of Git. And I'm the only one running this.

For the CI "noisy neighbors" problem at least, it could be an option to
try to host in GCE (or some other compute that isn't shared). I asked
around a little inside Google and it seems like it's possible; I'll
keep pushing on it and see just how hard it would be. I'd even be happy
to trade on-push runs with noisy neighbors for nightly runs with no
neighbors, which would make it not really a CI thing - I guess I will
find out whether that's easier or harder for us to implement. :)

> So I wonder whether there is a wider interest in the Git community to
> make this infrastructure part of the Git project itself. This may
> include steps like the following:
>
>   - Extending the performance tests we have in "t/perf" to cover more
>     benchmarks.

Folks may be aware that our biggest (in terms of scale) internal
customer at Google is the Android project. They are the ones who
complain to me and my team the most about performance; they are also
open to setting up a nightly performance regression test. Would it be
appealing to get reports from such a test upstream?

I think it's more compelling to our customer team if we run it against
the closed-source Android repo, which means the Git project doesn't get
to see as much about the shape and content of the repos the performance
tests are running against, but we might be able to publish info about
the shape without the contents. Would that be useful? What would it
help to know: # of commits, size of the largest object, distribution of
object sizes, # of branches, size of the worktree, ...? (Rough sketch
below of how we could collect that kind of thing.)

If not having the specifics of the repo-under-test is a dealbreaker, we
could explore running performance tests in public with the Android Open
Source Project as the repo-under-test instead, though it's much more
manageable than full Android.

Maybe in the long term it would be even better to have some toy
repo-under-test, like "sample repo with massive object store", "sample
repo with massive history", etc., to help us pinpoint which ways we're
scaling well and which ways we aren't. But having a ready-made
repo-under-test, and a team who's got a very large stake in Git
performing well with it (so they can invest their time in setting up
tests), might be a good enough place to start.
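To make the "shape without contents" idea a bit more concrete, here is
roughly the kind of data I imagine we could gather and publish. This is
just a sketch using stock git commands against whichever repo-under-test
we pick; the exact list of numbers (and whether any of them is too
revealing) is entirely up for discussion:

    # number of commits reachable from any ref
    git rev-list --count --all

    # number of branches
    git for-each-ref refs/heads/ | wc -l

    # object counts and on-disk size of the object store
    git count-objects -v -H

    # size of the single largest object
    git cat-file --batch-all-objects --batch-check='%(objectsize) %(objectname)' |
        sort -n | tail -1

    # rough distribution of object sizes, bucketed by powers of two
    # (slow on huge repos, but this only has to run once in a while)
    git cat-file --batch-all-objects --batch-check='%(objectsize)' |
        awk '{ b = 1; while (b < $1) b *= 2; n[b]++ }
             END { for (b in n) print b, n[b] }' | sort -n

    # worktree size: number of tracked files and bytes on disk
    git ls-files | wc -l
    du -sh .

None of this needs to expose file names or contents, so it should be
publishable even for the closed-source repo.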
>
>   - Writing an adaptor that is able to upload the data generated from
>     our perf scripts to Bencher.
>
>   - Setting up proper infrastructure to do the benchmarking. We may
>     for now also continue to use GitLab CI, but as said, they are
>     quite noisy overall. Dedicated servers would help here.
>
>   - Sending alerts to the Git mailing list.

Yeah, I'd love to see reports coming to the Git mailing list, or at
least bad-news reports (maybe we don't need "everything ran great!"
every night, but I would appreciate "last night the performance suite
ran 50% slower than the last-6-months average"). That seems the easiest
to integrate with the way the project runs now, and I think we are used
to list noise :) (There's a rough sketch of the kind of check I have in
mind at the end of this mail.)

> I'm happy to hear your thoughts on this. Any ideas are welcome,
> including "we're not interested at all". In that case, we'd simply
> continue to maintain the setup ourselves at GitLab.

In general, though, yes! I am very interested! Google has had trouble
with performance regressions over the last 3 months or so, and I'd love
to see the community noticing them more. I think we generally have a
sense during code review that performance matters, but we aren't always
sure where it matters most, and a regular performance test whose
results anybody can see would help a lot.

> Thanks!
>
> Patrick
>
> [1]: https://gitlab.com/gitlab-org/data-access/git/benchmarks
> [2]: https://gitlab.com/gitlab-org/data-access/git/benchmarks/-/blob/main/.gitlab-ci.yml?ref_type=heads
> [3]: https://bencher.dev/console/projects/git/plots
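Coming back to the bad-news-only reports: the check itself doesn't have
to be fancy. Here's a rough sketch of what I mean, assuming we keep a
small JSON baseline (say, a rolling average of the last six months of
nightly runs) next to hyperfine's --export-json output; the file names
and the exact threshold are made up for illustration:

    # run one benchmark and export tonight's timings
    hyperfine --warmup 3 --export-json current.json \
        'git -C repo-under-test.git rev-list --count --all'

    # compare tonight's mean against the stored baseline mean
    current=$(jq '.results[0].mean' current.json)
    baseline=$(jq '.mean' baseline.json)  # hypothetical rolling average from earlier runs

    # only say something (and fail the job, so whatever sends the mail
    # notices) when we're more than 50% slower than the baseline
    awk -v c="$current" -v b="$baseline" 'BEGIN {
        if (c > 1.5 * b) {
            printf "perf regression: %.3fs vs %.3fs baseline\n", c, b
            exit 1
        }
    }'

How the resulting message actually gets onto the list (a bot posting
directly, or a human forwarding the CI failure) is a separate question,
but the compare-against-baseline part really is about this small.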