Re: [PATCH v5 1/4] gitformat-commit-graph: describe version 2 of BDAT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 13, 2023 at 02:42:08PM -0700, Jonathan Tan wrote:
> The code change to Git to support version 2 will be done in subsequent
> commits.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
> Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
> ---
>  Documentation/gitformat-commit-graph.txt | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/gitformat-commit-graph.txt b/Documentation/gitformat-commit-graph.txt
> index 31cad585e2..3e906e8030 100644
> --- a/Documentation/gitformat-commit-graph.txt
> +++ b/Documentation/gitformat-commit-graph.txt
> @@ -142,13 +142,16 @@ All multi-byte numbers are in network byte order.
>
>  ==== Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional]

This is a little beyond the scope of your series, but since we're
changing the on-disk format here a little bit, I think that it might be
worth it to consider whether there are any other changes that we'd like
to perform at the same time.

One that comes to mind is serializing the `max_changed_paths` value of
the Bloom filter settings, which is currently hard-coded as a constant,
c.f. 97ffa4fab50 (commit-graph.c: store maximum changed paths,
2020-09-17).

We always assume that the value there is 512, or the environment
variable GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS, if it is set. But it
might be nice to write it to disk, since it would allow us to do
something like:

--- 8< ---
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index fa9d32facfb..a42b0b03cfb 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -178,11 +178,12 @@ test_expect_success 'persist filter settings' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
 		GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=9 \
 		GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY=15 \
+		GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=513 \
 		git commit-graph write --reachable --changed-paths &&
-	grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15,\"max_changed_paths\":512" trace2.txt &&
+	grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15,\"max_changed_paths\":513" trace2.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2-auto.txt" \
 		git commit-graph write --reachable --changed-paths &&
-	grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15,\"max_changed_paths\":512" trace2-auto.txt
+	grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15,\"max_changed_paths\":513" trace2-auto.txt
 '

 test_max_changed_paths () {
--- >8 ---

Which is currently not possible (the second grep assertion will fail,
since Git has no way to remember what the value of max_changed_paths is
from the existing commit-graph).

> +	in Probabilistic Verification". Version 1 Bloom filters have a bug that appears
> +	when char is signed and the repository has path names that have characters >=
> +	0x80; Git supports reading and writing them, but this ability will be removed
> +	in a future version of Git.

Makes sense.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux