Hi, On Fri, Dec 20, 2019 at 10:05:11PM +0000, Garima Singh via GitGitGadget wrote: > This series is intended to start the conversation and many of the commit > messages include specific call outs for suggestions and thoughts. Since it's mostly in RFC stage, I'm holding off on line by line comments for now. I read the series (thanks for your patience) and I'll try to leave some review thoughts in the diffspec here. > Garima Singh (9): > [1/9] commit-graph: add --changed-paths option to write I wonder if this can be combined with 2; without 2 actually the documentation is wrong for this one, right? Although I suppose you also mentioned 2 perhaps being too long :) > [2/9] commit-graph: write changed paths bloom filters As I understand it, this one always regenerates the bloom filter pieces, and doesn't write it down in the commit-graph file. How much longer does that take now than before? I don't have a great feel for how often 'git commit-graph' is run, or whether I need to be invoking it manually. > [3/9] commit-graph: use MAX_NUM_CHUNKS > [4/9] commit-graph: document bloom filter format I suppose I might like to see this commit squashed with 5, but it's a nit. I'm thinking it'd be handy to say "git blame commit-graph" and see some nice doc about the format expected in the commit-graph file. > [5/9] commit-graph: write changed path bloom filters to commit-graph file. Ah, so here we finally write down the result from 2 to disk in the commit-graph file. And without 7, this gets recalculated every time we call 'git commit-graph' still. As for a technical doc around here, I'd really appreciate one. But I'm speaking selfishly - I'd also be happy if I could watch a talk about this design to make sure I understand it right :) > [6/9] commit-graph: test commit-graph write --changed-paths > [7/9] commit-graph: reuse existing bloom filters during write. I saw an option to give up if there wasn't an existing bloom filter, but I didn't see an option here to force recalculating. Is there a scenario when that would be useful? What's the mitigation path if: - I have a commit-graph with v0 of the bloom index piece, but update to Git which uses v1? - My commit-graph file is corrupted in a way that the bloom filter results are incorrect and I am missing a blob change (and therefore not finding it during the walk)? I think I understand that without this commit, 8 is not much speedup because we will be recalculating the filter for each commit, rather than using the written-down commit-graph file. > [8/9] revision.c: use bloom filters to speed up path based revision walks > [9/9] commit-graph: add GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS test flag Speaking of making sure I understand the change right, I will also summarize my own understanding in the hopes that I can be corrected and others can learn too ;) - The general idea is that we can write down a hint at the tree level saying "something I own did change this commit"; when we look for an object later, we can skip the commit where it looks like that path didn't change. - The change is in two pieces: first, to generate the hints per tree (which happens during commit-graph); second, to use those hints to optimize a rev walk (which happens in revision.c patch 8) - When we calculate the hints during commit-graph, we check the diff of each tree compared to its recent ancestor to see if there was a change; if so we calculate a hash for each path and use that as a key for a map from hash to path. After we look through everything changed in the diff, we can add it to a cumulative bloom filter list (one filter per commit) so we have a handy in-memory idea of which paths changed in each commit. - When it's time to do the rev walk, we ask for the bloom filter for each commit and check if that commit's map contains the path to the object we're worried about; if so, then it's OK to unpack the tree and check if the path we are interested in actually did get changed during that commit. Thanks. - Emily > > Documentation/git-commit-graph.txt | 5 + > .../technical/commit-graph-format.txt | 17 ++ > Makefile | 1 + > bloom.c | 257 +++++++++++++++++ > bloom.h | 51 ++++ > builtin/commit-graph.c | 9 +- > ci/run-build-and-tests.sh | 1 + > commit-graph.c | 116 +++++++- > commit-graph.h | 9 +- > revision.c | 67 ++++- > revision.h | 5 + > t/README | 3 + > t/helper/test-read-graph.c | 4 + > t/t4216-log-bloom.sh | 77 ++++++ > t/t5318-commit-graph.sh | 2 + > t/t5324-split-commit-graph.sh | 1 + > t/t5325-commit-graph-bloom.sh | 258 ++++++++++++++++++ > 17 files changed, 875 insertions(+), 8 deletions(-) > create mode 100644 bloom.c > create mode 100644 bloom.h > create mode 100755 t/t4216-log-bloom.sh > create mode 100755 t/t5325-commit-graph-bloom.sh > > > base-commit: b02fd2accad4d48078671adf38fe5b5976d77304 > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-497%2Fgarimasi514%2FcoreGit-bloomFilters-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-497/garimasi514/coreGit-bloomFilters-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/497 > -- > gitgitgadget