On Fri, May 01, 2020 at 03:30:20PM +0000, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > In Documentation/technical/commit-graph-format.txt, the definition > of the BIDX chunk specifies the length is a number of 8-byte words. > During development we discovered that using 8-byte words in the > Murmur3 hash algorithm causes issues with Big-Endian versus Little- > Endian machines. Thus, the hash algorithm was adapted to work on a I'm nit-picking here, but I believe that "Little Endian" is usually spelled "little endian", right? At least, 'git log --oneline -Gendian | wc -l' doesn't return many/any results for me, but '-GEndian' does. > byte-by-byte basis. However, this caused a change in the definition > of a "word" in bloom.h. Now, a "word" is a single byte, which allows > filters to be as small as two bytes. These length-two filters are > demonstrated in t0095-bloom.sh, and a larger filter of length 25 is > demonstrated as well. > > The original point of using 8-byte words was for alignment reasons. > It also presented opportunities for extremely sparse Bloom filters > when there were a small number of changes at a commit, creating a > very low false-positive rate. However, modifying the format at this > point is unlikely to be a valuable exercise. Also, this use of > single-byte granularity does present opportunities to save space. > It is unclear if 8-byte alignment of the filters would present any > meaningful performance benefits. > > Modify the format document to reflect reality. > > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> The rest of this patch looks good, and indeed matches reality ;). Reviewed-by: Taylor Blau <me@xxxxxxxxxxxx> > --- > Documentation/technical/commit-graph-format.txt | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt > index de56f9f1efd..1beef171822 100644 > --- a/Documentation/technical/commit-graph-format.txt > +++ b/Documentation/technical/commit-graph-format.txt > @@ -97,10 +97,10 @@ CHUNK DATA: > bit on. The other bits correspond to the position of the last parent. > > Bloom Filter Index (ID: {'B', 'I', 'D', 'X'}) (N * 4 bytes) [Optional] > - * The ith entry, BIDX[i], stores the number of 8-byte word blocks in all > - Bloom filters from commit 0 to commit i (inclusive) in lexicographic > - order. The Bloom filter for the i-th commit spans from BIDX[i-1] to > - BIDX[i] (plus header length), where BIDX[-1] is 0. > + * The ith entry, BIDX[i], stores the number of bytes in all Bloom filters > + from commit 0 to commit i (inclusive) in lexicographic order. The Bloom > + filter for the i-th commit spans from BIDX[i-1] to BIDX[i] (plus header > + length), where BIDX[-1] is 0. > * The BIDX chunk is ignored if the BDAT chunk is not present. > > Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional] > -- > gitgitgadget Thanks, Taylor