Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:

> So...how do we proceed? I can see at least 2 ways:
>
>  - Decide that we're going to stick with the details of the existing
>    implementation and declare that "data" is always interpreted as signed.
>    In that case, I would put "signed" wherever necessary, rename the
>    function to something that is not "murmur3", and change the names of
>    byte1 etc. to indicate that they are not constrained to be less than or
>    equal to 0xff.
>
>  - Bump the version number to 2 and correct the implementation to
>    match murmur3 (so, "data" is unsigned). Then we would have to think of
>    a transition plan. One possible one might be to always reject version
>    1 bloom filters, which I'm personally OK with, but it may seem too
>    heavy a cost to some since in the perhaps typical case where a repo has
>    filenames restricted to 0x7f and below, the existing bloom filters are
>    still correct.

If path filter hashing were merely advisory, in the sense that if
matching data is found, great, the processing goes faster, but if
not, we would still get correct results, albeit not so quickly, then
a third option would be to just update the implementation without
updating the version number. But we may not be so lucky---you must
have seen a wrong result returned quickly, which is not what we want
to see.

But if I recall correctly, we made the file format in such a way that
bumping the version number is cheap, in that the transition can appear
seamless. An updated implementation can just be told to _ignore_
old and possibly incorrect Bloom filters until it gets told to
recompute, at which time it can write a correct one with a new
version number.

So I would prefer your "Bump the version number and ignore the old
and possibly wrong data".

Thanks.
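
To make the two points above concrete, here is a minimal sketch. It is not the actual Git code: pack_unsigned(), pack_signed(), filter_is_usable() and FILTER_VERSION_CURRENT are hypothetical names made up for illustration, and only the step that packs four input bytes into a 32-bit word is shown, not a full murmur3. It shows that reading "data" as signed agrees with the unsigned reading for bytes at or below 0x7f and diverges above that, and that a reader can simply skip filters whose version is not the one it writes.

```c
#include <stdint.h>

/* Hypothetical version constant: the version a corrected writer would emit. */
enum { FILTER_VERSION_CURRENT = 2 };

/*
 * Pack four bytes little-endian, reading each byte as unsigned
 * (every byte contributes a value in 0..0xff, as murmur3 expects).
 */
static uint32_t pack_unsigned(const void *p)
{
	const unsigned char *data = p;

	return (uint32_t)data[0] |
	       ((uint32_t)data[1] << 8) |
	       ((uint32_t)data[2] << 16) |
	       ((uint32_t)data[3] << 24);
}

/*
 * Same packing, but each byte is read as signed char. A byte >= 0x80
 * is negative, so converting it to uint32_t sets all the upper bits
 * and clobbers what the other bytes contribute.
 */
static uint32_t pack_signed(const void *p)
{
	const signed char *data = p;

	return (uint32_t)data[0] |
	       ((uint32_t)data[1] << 8) |
	       ((uint32_t)data[2] << 16) |
	       ((uint32_t)data[3] << 24);
}

/*
 * Transition rule sketched in this message: treat any filter that was
 * not written with the current version as absent, so the caller falls
 * back to the slow but correct path until filters are recomputed.
 */
static int filter_is_usable(uint32_t stored_version)
{
	return stored_version == FILTER_VERSION_CURRENT;
}
```

With the bytes "abcd" both functions return the same word, which is why existing filters stay correct for paths limited to 0x7f and below; with a UTF-8 sequence such as 0xc3 0xa9 at the front, the two results differ.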