Re: [REVISED PATCH 2/6] Introduce commit notes

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 19 Jul 2007 02:34:00 -0700

Junio C Hamano <gitster@xxxxxxxxx> writes:

> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
> ...
>> So if you're really really *really* unlucky, you might end up having to 
>> fall back on the linear search. But it still works!
>>
>> Can anybody see anything wrong in my thinking above?
> ...
> But the real problem of this approach of course is that this is
> not reliable and can get a false match.  You can find your
> beginning NUL in the SHA-1 part of one entry, and terminating
> NUL later in the SHA-1 part of next entry, and you will never
> notice.

In other words, if you are really really *really* unlucky, not
only you might end up being fooled by random byte sequences in
SHA-1 part of the tree object, you would not even notice that
you have to fall back on the linear search.

I've long time ago concluded that if we care about reliability
(and we do very much), a bisectable tree without breaking
backward compatibility is impossible.  I was hoping to find a
"hole" in tree object format so that I can place an extended
section that is invisible to older versions of git, and place a
table that records offsets of each tree entries to help
bisection and/or perhaps a hash table to help look-up, but I do
not think it is possible.  In the case of index file, the
original file format had a hole after the cache-entry array
where we can later squeeze an extension section that is
invisible to older versions of git.  But the tree object format
is designed so tight that I do not see there is any place to put
an extension section.

Side note: I also think adding "extension section" to tree
object is not a good idea to begin with.  The data nor length of
such a section cannot participate in hash computation to derive
the tree's object name so that we can still compare two tree
objects (with and without such extension) that have the same
contents by only looking at their object names.  But having
contents that are not counted as parts of the object's name goes
against the reliability and safety of git.

> ...
> I was suggesting to have a specialized parser only to read such
> tree objects that are "abused" to represent notes.  You can
> cheaply validate that these trees are of expected shape.
> ...
> For an added safety, a "notes" writer could even throw in
> signature bytes (say, a symlink whose name is " !" in the
> top-level tree, and another symlink " !{37}" in the second-level
> tree) to protect the reader.

Of course, even with the above trick with relatively cheap
validation based on size, entry format, and "signature entries",
the way I outlined to speed up "notes" access really relies on
the tree objects used in "notes" to be well formed.  If somebody
throws in a tree that is not really a "note" to refs/notes/, and
if I am really really *really* unlucky, not only I might end up
being fooled by random byte sequences in SHA-1 part of the tree
object, I would not even notice that I am reading garbage and
end up giving garbage as "note" to the object back to the user.

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html