On Thu, 19 Jul 2007, Junio C Hamano wrote:
>
> ...
>
> But the real problem of this approach of course is that this is
> not reliable and can get a false match. You can find your
> beginning NUL in the SHA-1 part of one entry, and terminating
> NUL later in the SHA-1 part of next entry, and you will never
> notice.

[ I didn't react to this in your first email, because I thought you
  were talking about your "use the rules for the ASCII part" approach,
  and thought you meant that *that* was not reliable and could get a
  false match. But it seems that you were actually talking about the
  NUL character test. ]

Nope, wrong.

Why? Because there must always be a NUL *between* different SHA1's.
There's *always* a NUL character that precedes a SHA1.

So when you have two NUL characters (with no other NUL's between
them), you *know* that they cannot be from two different SHA1's. If
the first one was from an earlier SHA1, then the second one is
*guaranteed* to be the one that comes *before* the next SHA1.

See? You really have two, and only two, cases:

 - NUL's that are within 20 bytes of each other: you don't know
   anything about them. It might be that they are both within the
   *same* SHA1, or the first one was the one that separated the ASCII
   part from the SHA1, or the first one was a NUL in the previous SHA1
   and the second one was the NUL after the ASCII part.

   So two NUL's in the same 21-byte region (ie less than 20 bytes in
   *between* them) are not reliable. They tell you nothing, and you
   must just ignore them.

 - NUL's that are more than 20 bytes apart: the second NUL is
   *guaranteed* to be the NUL that precedes the next SHA1. They cannot
   be part of the same "NUL + sha1", and thus the first NUL *must* be
   from a previous SHA1 (or the NUL that preceded it). And that means
   that the second NUL *must* be the NUL that precedes the next SHA1.

So there is *no* ambiguity what-so-ever. It's not about guessing, and
it's not about "luck".
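[ Editorial note: the two-case argument above can be sketched in code.
  This is a hypothetical helper, not anything from git itself; it only
  assumes the tree-entry layout being discussed: "<mode> <name>\0"
  followed by 20 raw SHA-1 bytes, with no NULs in the ASCII part. ]

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not git's actual code): given a position in the
 * middle of a tree object's buffer, return the offset of the next
 * entry boundary, or -1 if we run off the end of the buffer (in which
 * case the caller must fall back on the linear search).
 *
 * Every SHA-1 is preceded by exactly one NUL (the name terminator),
 * and modes/names never contain NUL.  So if two *consecutive* NULs are
 * more than 20 bytes apart, the second one cannot lie inside a SHA-1:
 * it must be a name terminator, and the following entry starts 21
 * bytes after it (the NUL itself plus 20 SHA-1 bytes).
 */
static long next_entry_offset(const char *buf, size_t pos, size_t len)
{
	const char *prev = memchr(buf + pos, '\0', len - pos);

	while (prev) {
		const char *next = memchr(prev + 1, '\0',
					  buf + len - (prev + 1));
		if (!next)
			return -1;	/* ran out: do the linear search */
		if (next - prev > 20)
			/* 'next' terminates a name: skip NUL + SHA-1 */
			return (next - buf) + 21;
		/* ambiguous pair: ignore and keep scanning */
		prev = next;
	}
	return -1;
}
```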
If you don't find two NUL bytes separated by more than 20 bytes, you
start the linear search.

> In other words, if you are really really *really* unlucky, not
> only you might end up being fooled by random byte sequences in
> SHA-1 part of the tree object, you would not even notice that
> you have to fall back on the linear search.

Wrong. Either you find a guaranteed right place, or you run out of the
buffer and know you have to fall back on the linear search. Nobody
gets fooled.

> I've long time ago concluded that if we care about reliability
> (and we do very much), a bisectable tree without breaking
> backward compatibility is impossible.

No. You concluded incorrectly. I'm pretty damn sure the current tree
format is perfectly fine. It's dense, it's nice and linear, and it's
easily bisectable.

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
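[ Editorial note: the linear-search fallback mentioned above is also
  easy to sketch. Again this is hypothetical illustration code, not
  git's implementation; it assumes the same entry layout, "<mode>
  <name>\0" plus 20 raw SHA-1 bytes, and walks entry by entry from the
  start of the buffer. ]

```c
#include <stddef.h>
#include <string.h>

/*
 * Linear fallback sketch: walk entries from the beginning of the tree
 * buffer until we reach the first entry boundary at or after 'pos'.
 * Each step skips the ASCII "<mode> <name>" up to its NUL terminator,
 * then the 20 raw SHA-1 bytes that follow it.
 */
static size_t linear_entry_at_or_after(const char *buf, size_t len,
				       size_t pos)
{
	size_t cur = 0;

	while (cur < pos && cur < len) {
		const char *nul = memchr(buf + cur, '\0', len - cur);
		if (!nul)
			break;	/* malformed tree: no terminator left */
		cur = (size_t)(nul - buf) + 21;	/* NUL + 20-byte SHA-1 */
	}
	return cur;	/* first entry boundary >= pos */
}
```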