Re: [PATCH 03/12] Documentation: changed-path Bloom filters use byte words

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, May 01, 2020 at 03:30:20PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>
> In Documentation/technical/commit-graph-format.txt, the definition
> of the BIDX chunk specifies the length is a number of 8-byte words.
> During development we discovered that using 8-byte words in the
> Murmur3 hash algorithm causes issues with Big-Endian versus Little-
> Endian machines. Thus, the hash algorithm was adapted to work on a

I'm nit-picking here, but I believe that "Little Endian" is usually
spelled "little endian", right?

At least, 'git log --oneline -Gendian | wc -l' doesn't return many/any
results for me, but '-GEndian' does.

> byte-by-byte basis. However, this caused a change in the definition
> of a "word" in bloom.h. Now, a "word" is a single byte, which allows
> filters to be as small as two bytes. These length-two filters are
> demonstrated in t0095-bloom.sh, and a larger filter of length 25 is
> demonstrated as well.
>
> The original point of using 8-byte words was for alignment reasons.
> It also presented opportunities for extremely sparse Bloom filters
> when there were a small number of changes at a commit, creating a
> very low false-positive rate. However, modifying the format at this
> point is unlikely to be a valuable exercise. Also, this use of
> single-byte granularity does present opportunities to save space.
> It is unclear if 8-byte alignment of the filters would present any
> meaningful performance benefits.
>
> Modify the format document to reflect reality.
>
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>

The rest of this patch looks good, and indeed matches reality ;).

  Reviewed-by: Taylor Blau <me@xxxxxxxxxxxx>

> ---
>  Documentation/technical/commit-graph-format.txt | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
> index de56f9f1efd..1beef171822 100644
> --- a/Documentation/technical/commit-graph-format.txt
> +++ b/Documentation/technical/commit-graph-format.txt
> @@ -97,10 +97,10 @@ CHUNK DATA:
>        bit on. The other bits correspond to the position of the last parent.
>
>    Bloom Filter Index (ID: {'B', 'I', 'D', 'X'}) (N * 4 bytes) [Optional]
> -    * The ith entry, BIDX[i], stores the number of 8-byte word blocks in all
> -      Bloom filters from commit 0 to commit i (inclusive) in lexicographic
> -      order. The Bloom filter for the i-th commit spans from BIDX[i-1] to
> -      BIDX[i] (plus header length), where BIDX[-1] is 0.
> +    * The ith entry, BIDX[i], stores the number of bytes in all Bloom filters
> +      from commit 0 to commit i (inclusive) in lexicographic order. The Bloom
> +      filter for the i-th commit spans from BIDX[i-1] to BIDX[i] (plus header
> +      length), where BIDX[-1] is 0.
>      * The BIDX chunk is ignored if the BDAT chunk is not present.
>
>    Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional]
> --
> gitgitgadget

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux