Re: [PATCH v2 00/11] Changed Paths Bloom Filters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/7/2020 10:09 AM, Garima Singh wrote:
> 
> On 2/7/2020 8:52 AM, SZEDER Gábor wrote:
>>>  * Added unit tests for the bloom filter computation layer
>>
>> This fails on big endian, e.g. in Travis CI's s390x build:
>>
>>   https://travis-ci.org/szeder/git-cooking-topics-for-travis-ci/jobs/647253022#L2210
>>
>> (The link highlights the failure, but I'm afraid your browser won't
>> jump there right away; you'll have to click on the print-test-failures
>> fold at the bottom, and scroll down a bit...)
>>
> 
> Thank you so much for running this pipeline and pointing out the error!
> 
> We will carefully review our interactions with the binary data and 
> hopefully solve this in the next version. 

Szeder,

Thanks so much for running this test. We don't have access to a big endian
machine right now, so could you please apply this patch and re-run your tests?

The issue is described in the message below, and Garima is working to ensure
the handling of the filter data is clarified in the next version.

This is an issue from WAY back in the original prototype, and it highlights
that we've never been writing the data in network-byte order. This is completely
my fault.

Thanks,
-Stolee


-->8--

>From c1067db5d618b2dae430dfe373a11c771517da9e Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
Date: Fri, 7 Feb 2020 10:24:05 -0500
Subject: [PATCH] fixup! bloom: core Bloom filter implementation for changed
 paths

The 'data' field of 'struct bloom_filter' can point to a memory location
(when computing one before writing to the commit-graph) or a memmap()'d
file location (when reading from the Bloom data chunk of the commit-graph
file). This means that the memory representation may be backwards in
Little Endian or Big Endian machines.

Always write and read bits from 'filter->data' using network order. This
allows us to avoid loading the data streams from the file into memory
buffers.

Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
---
 bloom.c               | 6 ++++--
 t/helper/test-bloom.c | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/bloom.c b/bloom.c
index 90d84dc713..aa6896584b 100644
--- a/bloom.c
+++ b/bloom.c
@@ -124,8 +124,9 @@ void add_key_to_filter(struct bloom_key *key,
 	for (i = 0; i < settings->num_hashes; i++) {
 		uint64_t hash_mod = key->hashes[i] % mod;
 		uint64_t block_pos = hash_mod / BITS_PER_WORD;
+		uint64_t bit = get_bitmask(hash_mod);
 
-		filter->data[block_pos] |= get_bitmask(hash_mod);
+		filter->data[block_pos] |= htonll(bit);
 	}
 }
 
@@ -269,7 +270,8 @@ int bloom_filter_contains(struct bloom_filter *filter,
 	for (i = 0; i < settings->num_hashes; i++) {
 		uint64_t hash_mod = key->hashes[i] % mod;
 		uint64_t block_pos = hash_mod / BITS_PER_WORD;
-		if (!(filter->data[block_pos] & get_bitmask(hash_mod)))
+		uint64_t bit = get_bitmask(hash_mod);
+		if (!(filter->data[block_pos] & htonll(bit)))
 			return 0;
 	}
 
diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c
index 9b4be97f75..09b2bb0a00 100644
--- a/t/helper/test-bloom.c
+++ b/t/helper/test-bloom.c
@@ -23,7 +23,7 @@ static void print_bloom_filter(struct bloom_filter *filter) {
 	printf("Filter_Length:%d\n", filter->len);
 	printf("Filter_Data:");
 	for (i = 0; i < filter->len; i++){
-		printf("%"PRIx64"|", filter->data[i]);
+		printf("%"PRIx64"|", ntohll(filter->data[i]));
 	}
 	printf("\n");
 }
-- 
2.25.0.vfs.1.1.1.g9906319d24.dirty






[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux