Junio C Hamano <gitster@xxxxxxxxx> writes: > Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes: > ... >> So if you're really really *really* unlucky, you might end up having to >> fall back on the linear search. But it still works! >> >> Can anybody see anything wrong in my thinking above? > ... > But the real problem of this approach of course is that this is > not reliable and can get a false match. You can find your > beginning NUL in the SHA-1 part of one entry, and terminating > NUL later in the SHA-1 part of next entry, and you will never > notice. In other words, if you are really really *really* unlucky, not only you might end up being fooled by random byte sequences in SHA-1 part of the tree object, you would not even notice that you have to fall back on the linear search. I've long time ago concluded that if we care about reliability (and we do very much), a bisectable tree without breaking backward compatibility is impossible. I was hoping to find a "hole" in tree object format so that I can place an extended section that is invisible to older versions of git, and place a table that records offsets of each tree entries to help bisection and/or perhaps a hash table to help look-up, but I do not think it is possible. In the case of index file, the original file format had a hole after the cache-entry array where we can later squeeze an extension section that is invisible to older versions of git. But the tree object format is designed so tight that I do not see there is any place to put an extension section. Side note: I also think adding "extension section" to tree object is not a good idea to begin with. The data nor length of such a section cannot participate in hash computation to derive the tree's object name so that we can still compare two tree objects (with and without such extension) that have the same contents by only looking at their object names. But having contents that are not counted as parts of the object's name goes against the reliability and safety of git. > ... > I was suggesting to have a specialized parser only to read such > tree objects that are "abused" to represent notes. You can > cheaply validate that these trees are of expected shape. > ... > For an added safety, a "notes" writer could even throw in > signature bytes (say, a symlink whose name is " !" in the > top-level tree, and another symlink " !{37}" in the second-level > tree) to protect the reader. Of course, even with the above trick with relatively cheap validation based on size, entry format, and "signature entries", the way I outlined to speed up "notes" access really relies on the tree objects used in "notes" to be well formed. If somebody throws in a tree that is not really a "note" to refs/notes/, and if I am really really *really* unlucky, not only I might end up being fooled by random byte sequences in SHA-1 part of the tree object, I would not even notice that I am reading garbage and end up giving garbage as "note" to the object back to the user. - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html