[RFC PATCH] Not computing changed path filter for root commits

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is following the discussion about adding a new changed path filter
version due to the current implementation of murmur3 not matching the
algorithm. [1]

SZEDER Gábor suggested [2] that we change the revision walk to read
changed path filters also for root commits, but I don't think that's
possible - we have to tie reading changed path filters to when we read
trees, and right now, we don't seem to read trees when evaluating root
commits (rev_compare_tree() in revision.c is in the only code path that
uses changed path filters, and it itself is only called per-parent and
thus not called for root commits). The alternative is to not generate
changed path filters for root commits (or what I did in this patch,
which is to generate an all-1 filter), which seems reasonable to me.

Is this a good idea? If yes, there is the follow-up question of how to
report it in traces. I don't know off-hand if it's better to reuse the
"large" statistic, since what we output is the same (all 1s), or if we
should make a new one. Presumably what we need to weigh is the clarity
versus the migration costs, but I don't know how to weigh it.

If people are generally in agreement, I can send an updated patch that
does not goto into an "else" block :) and also with updated tests (t0095
needs to check a non-root commit, and t4216 needs to be updated with
whatever statistics we decide to use).

[1] https://lore.kernel.org/git/20230826150610.GA1928@xxxxxxxxxx/
[2] https://lore.kernel.org/git/20230830200218.GA5147@xxxxxxxxxx/
---
 bloom.c              |  3 ++-
 t/t0095-bloom.sh     |  2 +-
 t/t4216-log-bloom.sh | 12 +++---------
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/bloom.c b/bloom.c
index aef6b5fea2..b21130b236 100644
--- a/bloom.c
+++ b/bloom.c
@@ -226,7 +226,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 	if (c->parents)
 		diff_tree_oid(&c->parents->item->object.oid, &c->object.oid, "", &diffopt);
 	else
-		diff_tree_oid(NULL, &c->object.oid, "", &diffopt);
+		goto large;
 	diffcore_std(&diffopt);
 
 	if (diff_queued_diff.nr <= settings->max_changed_paths) {
@@ -292,6 +292,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 	} else {
 		for (i = 0; i < diff_queued_diff.nr; i++)
 			diff_free_filepair(diff_queued_diff.queue[i]);
+large:
 		init_truncated_large_filter(filter);
 
 		if (computed)
diff --git a/t/t0095-bloom.sh b/t/t0095-bloom.sh
index b567383eb8..02a0b41026 100755
--- a/t/t0095-bloom.sh
+++ b/t/t0095-bloom.sh
@@ -74,7 +74,7 @@ test_expect_success !SANITIZE_LEAK 'get bloom filters for commit with no changes
 	git commit --allow-empty -m "c0" &&
 	cat >expect <<-\EOF &&
 	Filter_Length:1
-	Filter_Data:00|
+	Filter_Data:ff|
 	EOF
 	test-tool bloom get_filter_for_commit "$(git rev-parse HEAD)" >actual &&
 	test_cmp expect actual
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index fa9d32facf..d14fe93fb1 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -283,7 +283,6 @@ test_expect_success 'correctly report changes over limit' '
 			git commit-graph write --reachable --changed-paths &&
 		test_max_changed_paths 11 trace-update &&
 		test_filter_computed 2 trace-update &&
-		test_filter_trunc_large 0 trace-update &&
 
 		for path in $(git ls-tree -r --name-only HEAD)
 		do
@@ -306,9 +305,7 @@ test_expect_success 'correctly report commits with no changed paths' '
 		GIT_TRACE2_EVENT="$(pwd)/trace.event" \
 			git commit-graph write --reachable --changed-paths &&
 		test_filter_computed 1 trace.event &&
-		test_filter_not_computed 0 trace.event &&
-		test_filter_trunc_empty 1 trace.event &&
-		test_filter_trunc_large 0 trace.event
+		test_filter_not_computed 0 trace.event
 	)
 '
 
@@ -363,8 +360,7 @@ test_expect_success '--max-new-filters overrides configuration' '
 				--max-new-filters=1 &&
 		test_filter_computed 1 trace.event &&
 		test_filter_not_computed 1 trace.event &&
-		test_filter_trunc_empty 0 trace.event &&
-		test_filter_trunc_large 0 trace.event
+		test_filter_trunc_empty 0 trace.event
 	)
 '
 
@@ -386,9 +382,7 @@ test_expect_success 'Bloom generation backfills empty commits' '
 				git commit-graph write --reachable \
 					--changed-paths --max-new-filters=2 &&
 			test_filter_computed 2 trace.event &&
-			test_filter_not_computed 4 trace.event &&
-			test_filter_trunc_empty 2 trace.event &&
-			test_filter_trunc_large 0 trace.event || return 1
+			test_filter_not_computed 4 trace.event || return 1
 		done &&
 
 		# Finally, make sure that once all commits have filters, that
-- 
2.42.0.283.g2d96d420d3-goog





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux