On Wed, 18 Jul 2007, Junio C Hamano wrote:
>
> Another anchoring clue you seem not to be exploiting fully is
> that the ASCII part must match "^[1-7][0-7]{4,5} " (mode bytes).

I did that on purpose. The SHA1 *can* contain those characters too, so
that's not really useful to us. The only really special character is
the NUL character, which is the only one that cannot exist in the ASCII
part (old-style trees can contain '/' too, although that's going away).

Also, the mode bytes may not be visible. If we start in a long filename,
we'll never have looked at the mode bytes. But if we see a NUL character
after having seen 20 non-NUL characters (long filename), we already know
we have found the end of a filename, because a NUL inside the SHA1 can
have at most 19 SHA1 bytes before it. So I don't think we can even
usefully use the other knowledge of the format of the ASCII part (other
than to know it doesn't contain NULs).

Of course, we can (and should) verify that the tree entry we find is
valid, and *then* it makes sense to check the rules for the ASCII part.
But that's only after we have already found the place.

> I was suggesting to have a specialized parser only to read such
> tree objects that are "abused" to represent notes. You can
> cheaply validate that these trees are of expected shape.

Sure. That said, I'm less interested in the notes than I am in the cost
of "git blame", and that could be optimized by having some special code
in "tree_entry_interesting()" to find the tree entries using binary
search. The special code would trigger only for:

 - large trees
 - "opt->nr_paths == 1"

but the latter case is the one that matters for blame in the first
place, so...

		Linus
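The message describes two ideas without code: finding an entry boundary by looking for a NUL that follows at least 20 non-NUL bytes, and using that to binary-search a large tree for a single path. A minimal C sketch of both follows, under stated assumptions. resync_entry(), parse_entry(), tree_name_cmp() and tree_bsearch() are hypothetical names, not git functions, and this is not how tree_entry_interesting() works. The caller is assumed to know whether the path it looks up is a directory, because git sorts directory names as if they ended in '/'. Old-style trees with '/' in names are ignored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not git's code. Raw entry: "<mode> <name>\0<SHA-1>". */
#define RAW_SHA1_LEN 20
#define IS_DIR(mode) (((mode) & 0170000) == 0040000)

/*
 * From offset 'pos', find a NUL preceded by >= 20 non-NUL bytes: a NUL inside
 * a SHA-1 has at most 19 SHA-1 bytes (then the name's NUL) before it, so this
 * NUL must end a name. Returns the offset past that entry's SHA-1, or -1.
 */
static long resync_entry(const unsigned char *buf, size_t size, size_t pos)
{
	size_t run = 0;

	for (; pos < size; pos++) {
		if (buf[pos]) {
			run++;
		} else if (run >= RAW_SHA1_LEN) {
			size_t next = pos + 1 + RAW_SHA1_LEN;
			return next <= size ? (long)next : -1;
		} else {
			run = 0;
		}
	}
	return -1;
}

/* Validate the entry at 'off': mode must match [1-7][0-7]{4,5} then ' '. */
static long parse_entry(const unsigned char *buf, size_t size, size_t off,
			const char **name, unsigned *mode)
{
	size_t p = off;
	unsigned m = 0;

	while (p < size && buf[p] >= '0' && buf[p] <= '7')
		m = (m << 3) | (unsigned)(buf[p++] - '0');
	if (p - off < 5 || p - off > 6 || buf[off] == '0' ||
	    p >= size || buf[p] != ' ')
		return -1;
	*name = (const char *)buf + p + 1;
	for (p++; p < size && buf[p]; p++)
		; /* skip the name up to its NUL */
	if (p + 1 + RAW_SHA1_LEN > size)
		return -1; /* no NUL, or truncated SHA-1 */
	*mode = m;
	return (long)(p + 1 + RAW_SHA1_LEN);
}

/* git tree order: a directory name compares as if it ended in '/'. */
static int tree_name_cmp(const char *a, unsigned amode,
			 const char *b, unsigned bmode)
{
	size_t alen = strlen(a), blen = strlen(b);
	size_t len = alen < blen ? alen : blen;
	int cmp = memcmp(a, b, len);
	unsigned char ca, cb;

	if (cmp)
		return cmp;
	ca = a[len] ? (unsigned char)a[len] : (IS_DIR(amode) ? '/' : 0);
	cb = b[len] ? (unsigned char)b[len] : (IS_DIR(bmode) ? '/' : 0);
	return (ca > cb) - (ca < cb);
}

/*
 * Find one name ('mode' is assumed known; it only tells directories apart).
 * 'lo' and 'hi' stay on entry boundaries; the midpoint snaps to the next
 * boundary, or to 'lo' when no anchor exists before 'hi'.
 */
static long tree_bsearch(const unsigned char *buf, size_t size,
			 const char *path, unsigned mode)
{
	size_t lo = 0, hi = size;

	while (lo < hi) {
		long e = resync_entry(buf, hi, lo + (hi - lo) / 2), next;
		const char *name;
		unsigned emode;
		int cmp;

		if (e < 0 || (size_t)e >= hi)
			e = (long)lo;
		next = parse_entry(buf, hi, (size_t)e, &name, &emode);
		if (next < 0)
			return -1; /* malformed tree */
		cmp = tree_name_cmp(name, emode, path, mode);
		if (cmp == 0)
			return e;
		if (cmp < 0)
			lo = (size_t)next;
		else
			hi = (size_t)e;
	}
	return -1;
}
```

When the midpoint lands among short names and no anchor appears before hi, the sketch falls back to the entry at lo, so it degrades toward a linear scan instead of failing. The mode-byte check only runs once a boundary is already known, which is the order the message argues for.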