On Thu, Oct 12, 2023 at 1:04 PM Patrick Steinhardt <ps@xxxxxx> wrote: > Wouldn't this have the potential to significantly regress performance > for all those preexisting users of the `--missing` option? The commit > graph is quite an important optimization nowadays, and especially in > commands where we potentially walk a lot of commits (like we may do > here) it can result in queries that are orders of magnitudes faster. > > If we were introducing a new option then I think this would be an > acceptable tradeoff as we could still fix the performance issue in > another iteration. But we don't and instead aim to change the meaning of > `--missing` which may already have users out there. Seen in that light > it does not seem sensible to me to just disable the graph for them. > Agreed, that there is a rather huge performance drop doing this. So either: 1. Add commit support for `--missing` and disable commit-graph. There is huge performance drop here for commits. 2. Add commit support for `--missing` but disabling commit-graph is provided via an additional `--disable-commit-graph` option. This expects the user to manually disable commit-graph for `--missing` to work correctly with commits. I was also thinking about this, but people who are using `--missing` till date, also use it with `--objects`. Using `--objects` is already slow and doesn't benefit from the commit-graph. So I don't know if there is much of a performance hit. Some benchmarks on the Git repo: With commit graphs: $ hyperfine --warmup 2 "git rev-list --objects --missing=print HEAD > /dev/null" Benchmark 1: git rev-list --objects --missing=print HEAD > /dev/null Time (mean ± σ): 1.336 s ± 0.013 s [User: 1.278 s, System: 0.054 s] Range (min … max): 1.323 s … 1.365 s 10 runs $ hyperfine --warmup 2 "git rev-list --missing=print HEAD > /dev/null" Benchmark 1: git rev-list --missing=print HEAD > /dev/null Time (mean ± σ): 29.6 ms ± 1.5 ms [User: 20.2 ms, System: 9.1 ms] Range (min … max): 27.5 ms … 35.1 ms 92 runs Without commit graphs: $ hyperfine --warmup 2 "git rev-list --objects --missing=print HEAD > /dev/null" Benchmark 1: git rev-list --objects --missing=print HEAD > /dev/null Time (mean ± σ): 1.735 s ± 0.022 s [User: 1.672 s, System: 0.057 s] Range (min … max): 1.703 s … 1.765 s 10 runs $ hyperfine --warmup 2 "git rev-list --missing=print HEAD > /dev/null" Benchmark 1: git rev-list --missing=print HEAD > /dev/null Time (mean ± σ): 426.6 ms ± 10.2 ms [User: 410.1 ms, System: 15.1 ms] Range (min … max): 414.9 ms … 441.3 ms 10 runs This shows that while there is definitely a performance loss when disabling commit-graphs. This loss is much lesser when used alongside the `--objects` flag. Overall, I lean towards the option "1", but okay with either. > Did you figure out what exactly the root cause of this functional > regression is? If so, can we address that root cause instead? Are you referring to `--missing` not working with commits? Looking at the history of `--missing` shows: - caf3827e2f: This commits introduces the `--missing` option, it basically states that for the upcoming partial clone mechanism, we want to add support for identifying missing objects. - 7c0fe330d5: This commit adds support for trees in `--missing`. It also goes on to state that `--missing` assumed only blobs could be missing. Both of these show that `--missing` started off support for only blobs and eventually added support to trees (similar to how we're adding support for commits here). So I guess the intention was to never work with commits. But the documentation and the option seem generic enough. Considering that trees was an extension, commits should be treated the same way.