Re: Changed path filter hash differs from murmur3 if char is signed

Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:

> Yes - if the bloom filter contained junk data (in our example, created
> using a different hash function on filenames that have characters that
> exceed 0x7f), the bloom filter would report "no, this commit does not
> contain a change in such-and-such path" and then we would skip the
> commit, even if the commit did have a change in that path.

Just to help my understanding (read: I am not suggesting this as one
of the holes to exploit to help a smooth transition), does the above
mean that, as long as the path we are asking about does not have a
byte with the high-bit set, we would be OK, even if the Bloom filter
were constructed with a bad function and there were other paths that
had such a byte?
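
For illustration only, here is a minimal sketch of how a murmur3-style
step might pack four bytes into one 32-bit block. pack_block() and its
sign_extend flag are hypothetical names, not Git's code; the flag models
a platform where char is signed. Bytes at or below 0x7f pack the same
way in both modes, while a byte above 0x7f sign-extends and sets all the
higher bits of the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack 4 bytes into a little-endian 32-bit block.
 * With sign_extend != 0, each byte is widened the way a signed char
 * would be, so bytes above 0x7f gain ones in bits 8..31.
 */
static uint32_t pack_block(const unsigned char *p, int sign_extend)
{
	uint32_t k = 0;
	int i;

	assert(p != NULL);
	for (i = 0; i < 4; i++) {
		uint32_t b = p[i];

		/* Model sign extension portably, without casting to signed char. */
		if (sign_extend && p[i] >= 0x80)
			b |= 0xffffff00u;
		k |= b << (8 * i);
	}
	return k;
}
```

With this sketch, "abcd" gives the same block in both modes, while a
block starting with the byte 0xc3 does not.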

> I don't have statistics on this, but if the majority of repos have
> only <=0x7f filenames (which seems reasonable to me), this might save
> sufficient work that we can proceed with bumping the version number and
> ignoring old data.
>
>> Better yet, we should be able to reuse existing Bloom filter data for
>> paths that have all characters <=0xff, and only recompute them where

"ff" -> "7f" I presume?

>> necessary. That makes much more sense than the previous paragraph.
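
A sketch of the per-path check that such selective reuse would need,
assuming the decision depends only on the bytes of the path.
path_has_high_bit() is a hypothetical helper, not an existing Git
function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if any byte of the NUL-terminated path
 * is above 0x7f, 0 otherwise. Bytes are read as unsigned char so the
 * result does not depend on whether plain char is signed.
 */
static int path_has_high_bit(const char *path)
{
	const unsigned char *p = (const unsigned char *)path;

	assert(path != NULL);
	for (; *p; p++)
		if (*p > 0x7f)
			return 1;
	return 0;
}
```

Under that assumption, filter data would be recomputed only where a
changed path makes this return 1.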
