Re: [REVISED PATCH 2/6] Introduce commit notes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> And yes, the "search for zero bytes" is not *guaranteed* to find any 
> beginning at all, if you have lots of short names, *and* lots of zero 
> bytes in the SHA1's. But while short names may be common, zero bytes in 
> SHA1's are not so much (since you should expect to see a very even 
> distribution of bytes, and as such most SHA1's by far should have no zero 
> bytes at all!)
>
> So if you're really really *really* unlucky, you might end up having to 
> fall back on the linear search. But it still works!
>
> Can anybody see anything wrong in my thinking above?

Another anchoring clue you seem not to be exploiting fully is
that the ASCII part must match "^[1-7][0-7]{4,5} " (mode bytes).
But the real problem of this approach of course is that this is
not reliable and can get a false match.  You can find your
beginning NUL in the SHA-1 part of one entry, and terminating
NUL later in the SHA-1 part of next entry, and you will never
notice.

However, in the case of Dscho's "notes" code, I do not think (1)
you do not have to guess like the above, and (2) the problem is
much simpler.

Dcsho's "note" looks like a tree full of two-byte [0-9a-f]{2}
names, each of them points at another tree, with the second
level tree being full of 32-byte [0-9a-f]{38} names, each of
them points at a blob.  So it is a much more regular, strict
shape.  And in order to look for a note for an object whose name
is ([0-9a-f]{2})([0-9a-f]{38}), you will find the blob that is
at "$1/$2" in a "note".

I was suggesting to have a specialized parser only to read such
tree objects that are "abused" to represent notes.  You can
cheaply validate that these trees are of expected shape.

 (1) Validate that size of the toplevel tree is multiple of 29 =
     (5 + 1 + 2 + 1 + 20); the second level should be multiple
     of 66 = (6 + 1 + 38 + 1 + 20).  These two levels of trees
     are of fixed-entry-length that allows easy binary search.

 (2) While binary searching trees of either level, you can
     validate that the entry looks like from a note (for the
     toplevel, "40000 [0-9a-f]{2}\0", for the second level,
     "100644 [0-9a-f]{38}\0").

For an added safety, a "notes" writer could even throw in
signature bytes (say, a symlink whose name is " !" in the
top-level tree, and another symlink " !{37}" in the second-level
tree) to protect the reader.


-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux