On Fri, Feb 07, 2020 at 10:36:58AM -0500, Derrick Stolee wrote: > On 2/7/2020 10:09 AM, Garima Singh wrote: > > > > On 2/7/2020 8:52 AM, SZEDER Gábor wrote: > >>> * Added unit tests for the bloom filter computation layer > >> > >> This fails on big endian, e.g. in Travis CI's s390x build: > >> > >> https://travis-ci.org/szeder/git-cooking-topics-for-travis-ci/jobs/647253022#L2210 > >> > >> (The link highlights the failure, but I'm afraid your browser won't > >> jump there right away; you'll have to click on the print-test-failures > >> fold at the bottom, and scroll down a bit...) > >> > > > > Thank you so much for running this pipeline and pointing out the error! > > > > We will carefully review our interactions with the binary data and > > hopefully solve this in the next version. > > Szeder, > > Thanks so much for running this test. We don't have access to a big endian > machine right now, so could you please apply this patch and re-run your tests? Unfortunately, it still failed: https://travis-ci.org/szeder/git-cooking-topics-for-travis-ci/jobs/647395554#L2204 > The issue is described in the message below, and Garima is working to ensure > the handling of the filter data is clarified in the next version. > > This is an issue from WAY back in the original prototype, and it highlights > that we've never been writing the data in network-byte order. This is completely > my fault. > > Thanks, > -Stolee > > > -->8-- > > From c1067db5d618b2dae430dfe373a11c771517da9e Mon Sep 17 00:00:00 2001 > From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > Date: Fri, 7 Feb 2020 10:24:05 -0500 > Subject: [PATCH] fixup! bloom: core Bloom filter implementation for changed > paths > > The 'data' field of 'struct bloom_filter' can point to a memory location > (when computing one before writing to the commit-graph) or a memmap()'d > file location (when reading from the Bloom data chunk of the commit-graph > file). This means that the memory representation may be backwards in > Little Endian or Big Endian machines. > > Always write and read bits from 'filter->data' using network order. This > allows us to avoid loading the data streams from the file into memory > buffers. > > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > --- > bloom.c | 6 ++++-- > t/helper/test-bloom.c | 2 +- > 2 files changed, 5 insertions(+), 3 deletions(-) > > diff --git a/bloom.c b/bloom.c > index 90d84dc713..aa6896584b 100644 > --- a/bloom.c > +++ b/bloom.c > @@ -124,8 +124,9 @@ void add_key_to_filter(struct bloom_key *key, > for (i = 0; i < settings->num_hashes; i++) { > uint64_t hash_mod = key->hashes[i] % mod; > uint64_t block_pos = hash_mod / BITS_PER_WORD; > + uint64_t bit = get_bitmask(hash_mod); > > - filter->data[block_pos] |= get_bitmask(hash_mod); > + filter->data[block_pos] |= htonll(bit); > } > } > > @@ -269,7 +270,8 @@ int bloom_filter_contains(struct bloom_filter *filter, > for (i = 0; i < settings->num_hashes; i++) { > uint64_t hash_mod = key->hashes[i] % mod; > uint64_t block_pos = hash_mod / BITS_PER_WORD; > - if (!(filter->data[block_pos] & get_bitmask(hash_mod))) > + uint64_t bit = get_bitmask(hash_mod); > + if (!(filter->data[block_pos] & htonll(bit))) > return 0; > } > > diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c > index 9b4be97f75..09b2bb0a00 100644 > --- a/t/helper/test-bloom.c > +++ b/t/helper/test-bloom.c > @@ -23,7 +23,7 @@ static void print_bloom_filter(struct bloom_filter *filter) { > printf("Filter_Length:%d\n", filter->len); > printf("Filter_Data:"); > for (i = 0; i < filter->len; i++){ > - printf("%"PRIx64"|", filter->data[i]); > + printf("%"PRIx64"|", ntohll(filter->data[i])); > } > printf("\n"); > } > -- > 2.25.0.vfs.1.1.1.g9906319d24.dirty > > >