On Wed, Sep 16, 2020 at 02:08:10PM -0400, Taylor Blau wrote: > Introduce a command-line flag to specify the maximum number of new Bloom > filters that a 'git commit-graph write' is willing to compute from > scratch. > > Prior to this patch, a commit-graph write with '--changed-paths' would > compute Bloom filters for all selected commits which haven't already > been computed (i.e., by a previous commit-graph write with '--split' > such that a roll-up or replacement is performed). > > This behavior can cause prohibitively-long commit-graph writes for a > variety of reasons: > > * There may be lots of filters whose diffs take a long time to > generate (for example, they have close to the maximum number of > changes, diffing itself takes a long time, etc). > > * Old-style commit-graphs (which encode filters with too many entries > as not having been computed at all) cause us to waste time > recomputing filters that appear to have not been computed only to > discover that they are too-large. > > This can make the upper-bound of the time it takes for 'git commit-graph > write --changed-paths' to be rather unpredictable. > > To make this command behave more predictably, introduce > '--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters > from scratch. This lets "computing" already-known filters proceed > quickly, while bounding the number of slow tasks that Git is willing to > do. > > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > --- > Documentation/git-commit-graph.txt | 5 +++ > bloom.c | 7 ++- > builtin/commit-graph.c | 27 +++++++++++- > commit-graph.c | 9 +++- > commit-graph.h | 1 + > t/t4216-log-bloom.sh | 70 ++++++++++++++++++++++++++++++ > 6 files changed, 111 insertions(+), 8 deletions(-) > > diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt > index 17405c73a9..8357846d30 100644 > --- a/Documentation/git-commit-graph.txt > +++ b/Documentation/git-commit-graph.txt > @@ -67,6 +67,11 @@ this option is given, future commit-graph writes will automatically assume > that this option was intended. Use `--no-changed-paths` to stop storing this > data. > + > +With the `--max-new-filters=<n>` option, generate at most `n` new Bloom > +filters (if `--changed-paths` is specified). If '--max-new-filters=42' is specified without '--changed-paths', then it is silently ignored instead of erroring out. Is that intentional? > If `n` is `-1`, no limit is > +enforced. Commits whose filters are not calculated are stored as a > +length zero Bloom filter. This last sentence goes into implementation details of the file format, so I don't think it should be included here. What should be documented here instead is what happens later with those commits for which no modified path Bloom filters were computed. Will they ever be computed? If yes, which commands will compute them (even implicitly), and thus will run slower? What command(s) should the users run if they want to compute the missing filters? Let's suppose that running git commit-graph write --reachable --changed-paths --max-new-filters=<L> --split writes a commit-graph layer containing N1 commits, and N1 > L, so N1-L commits won't have an associated modified path Bloom filter. After creating and fetching new commits, this command is executed again to add information about N2 commits to the commit-graph, and N2 < L. - If this second invocation writes a new commit-graph layer, then it will only compute Bloom filters for commits in this new layer, even though it still has some "Bloom-computing-capacity" left. (This will happen even if this second write were invoked without '--max-new-filters'.) - If the merge strategy decides that, instead of writing a new layer, the new N2 commits should be merged with the layer containing the previous N1 commits, then it will compute Bloom filters for some or all of those N1 commits without filters, depending on whether N2-L is smaller or larger than N1-L. Is this how it is supposed to work? We can't tell, because neither the commit messages nor the documentation updates talk about this. > ++ > With the `--split[=<strategy>]` option, write the commit-graph as a > chain of multiple commit-graph files stored in > `<dir>/info/commit-graphs`. Commit-graph layers are merged based on the