Taylor Blau <me@xxxxxxxxxxxx> writes: > diff --git a/revision.c b/revision.c > index 2f4c53ea20..1d36df49e2 100644 > --- a/revision.c > +++ b/revision.c > @@ -837,14 +837,24 @@ static int rev_compare_tree(struct rev_info *revs, > static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit) > { > struct tree *t1 = repo_get_commit_tree(the_repository, commit); > + int bloom_ret = 1; > > if (!t1) > return 0; > > + if (revs->bloom_keys_nr) { > + bloom_ret = check_maybe_different_in_bloom_filter(revs, commit); > + if (!bloom_ret) > + return 1; > + } > + > tree_difference = REV_TREE_SAME; > revs->pruning.flags.has_changes = 0; > diff_tree_oid(NULL, &t1->object.oid, "", &revs->pruning); > > + if (bloom_ret == 1 && tree_difference == REV_TREE_SAME) > + count_bloom_filter_false_positive++; > + > return tree_difference == REV_TREE_SAME; > } I'll concentrate on getting this patch in, and will look at (and discuss) the other Bloom filter-related emails later. This looks good, possibly except a code path in try_to_simplify_commit() that calls this rev_same_tree_as_empty() function when rev_compare_tree() between a commit and its parent returns REV_TREE_NEW. So there are 2 issues: How can rev_compare_tree() ever return REV_TREE_NEW? And it doesn't seem right to check Bloom filters in this code path, since rev_same_tree_as_empty() was invoked here while we are enumerating through a commit's parents, which necessarily implies that the commit has parents, but here we're using the Bloom filter as if the commit is known to have no parents. As for the first issue, rev_compare_tree() returns REV_TREE_NEW when the parent's tree is NULL. I'm not sure how this can happen - the tree can be NULL if the parent commit is not parsed, but at this point I think that it has been parsed. And I think every commit has a tree. This goes back all the way to 3a5e860815 (revision: make tree comparison functions take commits rather than trees, 2008-11-03) and even beyond that (I didn't dig further). As for the second issue, we can probably solve this by being defensive in rev_same_tree_as_empty() by only using the Bloom filter when the commit has no parents. Not sure if this is being overly defensive, though. There is also the issue that count_bloom_filter_false_positive is incremented even when no Bloom filters are present, but I think this is fine (it matches the behavior of rev_compare_tree()).