Re: Changed path filter hash differs from murmur3 if char is signed

Junio C Hamano <gitster@xxxxxxxxx> writes:
> Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:
> 
> > Yes - if the bloom filter contained junk data (in our example, created
> > using a different hash function on filenames that have characters that
> > exceed 0x7f), the bloom filter would report "no, this commit does not
> > contain a change in such-and-such path" and then we would skip the
> > commit, even if the commit did have a change in that path.
> 
> Just to help my understanding (read: I am not suggesting this as one
> of the holes to exploit to help a smooth transition), does the above
> mean that, as long as the path we are asking about does not have a
> byte with the high-bit set, we would be OK, even if the Bloom filter
> were constructed with a bad function and there were other paths that
> had such a byte?

Ah, thanks for asking. Yes, the false negative I describe above only
happens when the path we're querying for contains a byte >0x7f (so
if no byte in that path has the high bit set, it is still OK).
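
To illustrate why only high-bit bytes matter, here is a minimal sketch in C.
It is not the actual Git code: pack_signed(), pack_unsigned() and
has_high_bit() are hypothetical helpers, and the little-endian packing
merely models the step where a hash function folds four path bytes into one
32-bit word. A byte >0x7f read through a signed char is sign-extended when
widened, so the two readings disagree; for bytes <=0x7f they are identical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: pack 4 bytes, reading each one as a signed char
 * (models a platform where plain char is signed). */
uint32_t pack_signed(const char *data)
{
	const signed char *p = (const signed char *)data;
	uint32_t b0 = (uint32_t)p[0];          /* sign-extends if p[0] < 0 */
	uint32_t b1 = ((uint32_t)p[1]) << 8;
	uint32_t b2 = ((uint32_t)p[2]) << 16;
	uint32_t b3 = ((uint32_t)p[3]) << 24;
	return b0 | b1 | b2 | b3;
}

/* Hypothetical: same packing, reading each byte as unsigned char. */
uint32_t pack_unsigned(const char *data)
{
	const unsigned char *p = (const unsigned char *)data;
	uint32_t b0 = (uint32_t)p[0];
	uint32_t b1 = ((uint32_t)p[1]) << 8;
	uint32_t b2 = ((uint32_t)p[2]) << 16;
	uint32_t b3 = ((uint32_t)p[3]) << 24;
	return b0 | b1 | b2 | b3;
}

/* Hypothetical: returns 1 if any byte of the path has the high bit set,
 * i.e. if a query for this path could be affected by the mismatch. */
int has_high_bit(const char *path, size_t len)
{
	size_t i;
	for (i = 0; i < len; i++)
		if ((unsigned char)path[i] > 0x7f)
			return 1;
	return 0;
}

/* Small usage example. */
void demo(void)
{
	const char *ascii = "abcd";
	const char *utf8 = "\xc3\xa9" "ab";   /* "é" in UTF-8, then "ab" */

	/* ASCII-only input: both readings produce the same word. */
	assert(!has_high_bit(ascii, 4));
	assert(pack_signed(ascii) == pack_unsigned(ascii));

	/* High-bit input: sign extension changes the packed word. */
	assert(has_high_bit(utf8, 4));
	assert(pack_signed(utf8) != pack_unsigned(utf8));
}
```

Under these assumptions, a query path for which has_high_bit() returns 0
hashes the same way with either reading, which is why such queries stay
correct even against a filter built with the signed variant.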

> > I don't have statistics on this, but if the majority of repos have
> > only <=0x7f filenames (which seems reasonable to me), this might save
> > sufficient work that we can proceed with bumping the version number and
> > ignoring old data.
> >
> >> Better yet, we should be able to reuse existing Bloom filter data for
> >> paths that have all characters <=0xff, and only recompute them where
> 
> "ff" -> "7f" I presume?

That was my assumption too, but Taylor can clarify.


