Re: [PATCH v4 1/4] gitformat-commit-graph: describe version 2 of BDAT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:

> The code change to Git to support version 2 will be done in subsequent
> commits.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
> ---
>  Documentation/gitformat-commit-graph.txt | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/gitformat-commit-graph.txt b/Documentation/gitformat-commit-graph.txt
> index 31cad585e2..112e6d36a6 100644
> --- a/Documentation/gitformat-commit-graph.txt
> +++ b/Documentation/gitformat-commit-graph.txt
> @@ -142,13 +142,16 @@ All multi-byte numbers are in network byte order.
>  
>  ==== Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional]
>      * It starts with header consisting of three unsigned 32-bit integers:
> -      - Version of the hash algorithm being used. We currently only support
> -	value 1 which corresponds to the 32-bit version of the murmur3 hash
> +      - Version of the hash algorithm being used. We currently support
> +	value 2 which corresponds to the 32-bit version of the murmur3 hash
>  	implemented exactly as described in
>  	https://en.wikipedia.org/wiki/MurmurHash#Algorithm and the double
>  	hashing technique using seed values 0x293ae76f and 0x7e646e2 as
>  	described in https://doi.org/10.1007/978-3-540-30494-4_26 "Bloom Filters
> -	in Probabilistic Verification"
> +	in Probabilistic Verification". Version 1 bloom filters have a bug that appears

"bloom" -> "Bloom", probably, as the name comes from the name of its
inventor (just like we spell "Boolean", not "boolean").

> +	when char is signed and the repository has path names that have characters >=
> +	0x80; Git supports reading and writing them, but this ability will be removed
> +	in a future version of Git.

Makes sense.

I wonder if we want to mention what the undesired misbehaviour the
"bug" causes and what we do to avoid getting affected by the bug
here.  If we can say something like "When querying for a pathname
with a byte with high-bit set, the buggy filter may produce false
negative, making the filter unusable, but asking for a pathname
without such a byte produces no false negatives (even though we may
get false positives).  When Git reads version 1 filter data, it
refrains from using it for processing paths with high-bit set to
avoid triggering the bug", then it would be ideal.  Or "When the
repository has even a single pathname with high-bit set anywhere in
its history, version 1 Bloom can give false negative when querying
any paths and becomes unusable.  You can use $THIS configuration
variable to disable use of Bloom filter data in such a case" would
also be fine.  The point is to give actionable piece of information
to the readers.

Thanks.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux