I did my own experiments on top of what Szeder provided - the first patch is to have one fixed-size bloom filter per commit, and the second patch makes that bloom filter apply to only the first parent of each commit. The results are: Original (szeder's) $ GIT_USE_POC_BLOOM_FILTER=$((8*1024*1024*8)) time ./git commit-graph write 0:10.28 $ ls -l .git/objects/info/bloom 8388616 $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y time ./git -c \ core.commitgraph=true rev-list --count --full-history HEAD -- \ t/valgrind/valgrind.sh 886 bloom filter total queries: 66459 definitely not: 65276 maybe: 1183 false positives: 297 fp ratio: 0.004469 0:00.24 With patch 1 $ GIT_USE_POC_BLOOM_FILTER=256 time ./git commit-graph write 0:16.22 $ ls -l .git/objects/info/bloom 1832620 $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y time ./git -c \ core.commitgraph=true rev-list --count --full-history HEAD -- \ t/valgrind/valgrind.sh 886 bloom filter total queries: 66459 definitely not: 46637 maybe: 19822 false positives: 18936 fp ratio: 0.284928 0:01.53 With patch 2 $ GIT_USE_POC_BLOOM_FILTER=256 time ./git commit-graph write 0:06.70 $ ls -l .git/objects/info/bloom 1832620 $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y time ./git -c \ core.commitgraph=true rev-list --count --full-history HEAD -- \ t/valgrind/valgrind.sh 886 bloom filter total queries: 53096 definitely not: 52989 maybe: 107 false positives: 89 fp ratio: 0.001676 0:01.29 For comparison, a non-GIT_USE_POC_BLOOM_FILTER rev-list takes 3.517 seconds. I haven't investigated why patch 1 takes longer than the original to create the bloom filter. Using per-commit filters and restricting the bloom filter to a single parent increases the relative power of the filter in omitting tree inspections compared to the original (107/53096 vs 1183/66459), but the lack of coverage w.r.t. the non-first parents had a more significant effect than I thought (1.29s vs .24s). It might be best to have one filter for each (commit, parent) pair (or, at least, the first two parents of each commit - we probably don't need to care that much about octopus merges) - this would take up more disk space than if we only store filters for the first parent, but is still less than the original example of storing information for all commits in one filter. There are more possibilities like dynamic filter sizing, different hashing, and hashing to support wildcard matches, which I haven't looked into. Jonathan Tan (2): One filter per commit Only make bloom filter for first parent bloom-filter.c | 31 ++++++++++++++++++------------- bloom-filter.h | 12 ++++-------- commit-graph.c | 30 ++++++++++++++---------------- revision.c | 29 +++++++++++++++-------------- 4 files changed, 51 insertions(+), 51 deletions(-) -- 2.19.0.271.gfe8321ec05.dirty