Re: [PATCH 0/9] [RFC] Changed Paths Bloom Filters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Fri, Dec 20, 2019 at 10:05:11PM +0000, Garima Singh via GitGitGadget wrote:
> This series is intended to start the conversation and many of the commit
> messages include specific call outs for suggestions and thoughts. 

Since it's mostly in RFC stage, I'm holding off on line by line comments
for now. I read the series (thanks for your patience) and I'll try to
leave some review thoughts in the diffspec here.

> Garima Singh (9):
>   [1/9] commit-graph: add --changed-paths option to write
I wonder if this can be combined with 2; without 2 actually the
documentation is wrong for this one, right? Although I suppose you also
mentioned 2 perhaps being too long :)

>   [2/9] commit-graph: write changed paths bloom filters
As I understand it, this one always regenerates the bloom filter pieces,
and doesn't write it down in the commit-graph file. How much longer does
that take now than before? I don't have a great feel for how often 'git
commit-graph' is run, or whether I need to be invoking it manually.

>   [3/9] commit-graph: use MAX_NUM_CHUNKS
>   [4/9] commit-graph: document bloom filter format
I suppose I might like to see this commit squashed with 5, but it's a
nit. I'm thinking it'd be handy to say "git blame commit-graph" and see
some nice doc about the format expected in the commit-graph file.

>   [5/9] commit-graph: write changed path bloom filters to commit-graph file.
Ah, so here we finally write down the result from 2 to disk in the
commit-graph file. And without 7, this gets recalculated every time we
call 'git commit-graph' still.

As for a technical doc around here, I'd really appreciate one. But I'm
speaking selfishly - I'd also be happy if I could watch a talk about
this design to make sure I understand it right :)

>   [6/9] commit-graph: test commit-graph write --changed-paths
>   [7/9] commit-graph: reuse existing bloom filters during write.
I saw an option to give up if there wasn't an existing bloom filter, but
I didn't see an option here to force recalculating. Is there a scenario
when that would be useful? What's the mitigation path if:
 - I have a commit-graph with v0 of the bloom index piece, but update to
   Git which uses v1?
 - My commit-graph file is corrupted in a way that the bloom filter
   results are incorrect and I am missing a blob change (and therefore
   not finding it during the walk)?
I think I understand that without this commit, 8 is not much speedup
because we will be recalculating the filter for each commit, rather than
using the written-down commit-graph file.

>   [8/9] revision.c: use bloom filters to speed up path based revision walks
>   [9/9] commit-graph: add GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS test flag

Speaking of making sure I understand the change right, I will also
summarize my own understanding in the hopes that I can be corrected and
others can learn too ;)

 - The general idea is that we can write down a hint at the tree level
   saying "something I own did change this commit"; when we look for an
   object later, we can skip the commit where it looks like that path
   didn't change.
 - The change is in two pieces: first, to generate the hints per tree
   (which happens during commit-graph); second, to use those hints to
   optimize a rev walk (which happens in revision.c patch 8)
 - When we calculate the hints during commit-graph, we check the diff of
   each tree compared to its recent ancestor to see if there was a
   change; if so we calculate a hash for each path and use that as a key
   for a map from hash to path. After we look through everything changed
   in the diff, we can add it to a cumulative bloom filter list (one
   filter per commit) so we have a handy in-memory idea of which paths
   changed in each commit.
 - When it's time to do the rev walk, we ask for the bloom filter for
   each commit and check if that commit's map contains the path to the
   object we're worried about; if so, then it's OK to unpack the tree
   and check if the path we are interested in actually did get changed
   during that commit.

Thanks.
 - Emily

> 
>  Documentation/git-commit-graph.txt            |   5 +
>  .../technical/commit-graph-format.txt         |  17 ++
>  Makefile                                      |   1 +
>  bloom.c                                       | 257 +++++++++++++++++
>  bloom.h                                       |  51 ++++
>  builtin/commit-graph.c                        |   9 +-
>  ci/run-build-and-tests.sh                     |   1 +
>  commit-graph.c                                | 116 +++++++-
>  commit-graph.h                                |   9 +-
>  revision.c                                    |  67 ++++-
>  revision.h                                    |   5 +
>  t/README                                      |   3 +
>  t/helper/test-read-graph.c                    |   4 +
>  t/t4216-log-bloom.sh                          |  77 ++++++
>  t/t5318-commit-graph.sh                       |   2 +
>  t/t5324-split-commit-graph.sh                 |   1 +
>  t/t5325-commit-graph-bloom.sh                 | 258 ++++++++++++++++++
>  17 files changed, 875 insertions(+), 8 deletions(-)
>  create mode 100644 bloom.c
>  create mode 100644 bloom.h
>  create mode 100755 t/t4216-log-bloom.sh
>  create mode 100755 t/t5325-commit-graph-bloom.sh
> 
> 
> base-commit: b02fd2accad4d48078671adf38fe5b5976d77304
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-497%2Fgarimasi514%2FcoreGit-bloomFilters-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-497/garimasi514/coreGit-bloomFilters-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/497
> -- 
> gitgitgadget



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux