Re: [PATCH 0/3] rev-list: add support for commits in `--missing`

Karthik Nayak <karthik.188@xxxxxxxxx> · Thu, 12 Oct 2023 15:23:27 +0200

On Thu, Oct 12, 2023 at 1:04 PM Patrick Steinhardt <ps@xxxxxx> wrote:
> Wouldn't this have the potential to significantly regress performance
> for all those preexisting users of the `--missing` option? The commit
> graph is quite an important optimization nowadays, and especially in
> commands where we potentially walk a lot of commits (like we may do
> here) it can result in queries that are orders of magnitudes faster.
>
> If we were introducing a new option then I think this would be an
> acceptable tradeoff as we could still fix the performance issue in
> another iteration. But we don't and instead aim to change the meaning of
> `--missing` which may already have users out there. Seen in that light
> it does not seem sensible to me to just disable the graph for them.
>

Agreed, that there is a rather huge performance drop doing this. So either:
1. Add commit support for `--missing` and disable commit-graph. There is huge
performance drop here for commits.
2. Add commit support for `--missing` but disabling commit-graph is provided via
an additional `--disable-commit-graph` option. This expects the user to manually
disable commit-graph for `--missing` to work correctly with commits.

I was also thinking about this, but people who are using `--missing`
till date, also use
it with `--objects`. Using `--objects` is already slow and doesn't
benefit from the
commit-graph. So I don't know if there is much of a performance hit.

Some benchmarks on the Git repo:

With commit graphs:

$ hyperfine --warmup 2 "git rev-list --objects --missing=print HEAD > /dev/null"
Benchmark 1: git rev-list --objects --missing=print HEAD > /dev/null
  Time (mean ± σ):      1.336 s ±  0.013 s    [User: 1.278 s, System: 0.054 s]
  Range (min … max):    1.323 s …  1.365 s    10 runs

$ hyperfine --warmup 2 "git rev-list --missing=print HEAD > /dev/null"
Benchmark 1: git rev-list --missing=print HEAD > /dev/null
  Time (mean ± σ):      29.6 ms ±   1.5 ms    [User: 20.2 ms, System: 9.1 ms]
  Range (min … max):    27.5 ms …  35.1 ms    92 runs

Without commit graphs:

$ hyperfine --warmup 2 "git rev-list --objects --missing=print HEAD > /dev/null"
Benchmark 1: git rev-list --objects --missing=print HEAD > /dev/null
  Time (mean ± σ):      1.735 s ±  0.022 s    [User: 1.672 s, System: 0.057 s]
  Range (min … max):    1.703 s …  1.765 s    10 runs

$ hyperfine --warmup 2 "git rev-list --missing=print HEAD > /dev/null"
Benchmark 1: git rev-list --missing=print HEAD > /dev/null
  Time (mean ± σ):     426.6 ms ±  10.2 ms    [User: 410.1 ms, System: 15.1 ms]
  Range (min … max):   414.9 ms … 441.3 ms    10 runs

This shows that while there is definitely a performance loss when
disabling commit-graphs.
This loss is much lesser when used alongside the `--objects` flag.

Overall, I lean towards the option "1", but okay with either.

> Did you figure out what exactly the root cause of this functional
> regression is? If so, can we address that root cause instead?

Are you referring to `--missing` not working with commits? Looking at
the history
of `--missing` shows:

- caf3827e2f: This commits introduces the `--missing` option, it
basically states that
for the upcoming partial clone mechanism, we want to add support for identifying
missing objects.
- 7c0fe330d5: This commit adds support for trees in `--missing`. It
also goes on to
state that `--missing` assumed only blobs could be missing.

Both of these show that `--missing` started off support for only blobs
and eventually
added support to trees (similar to how we're adding support for
commits here). So
I guess the intention was to never work with commits. But the documentation and
the option seem generic enough. Considering that trees was an extension, commits
should be treated the same way.