Re: Why is the name of a blob SHA1("$type $size\0$data") and not SHA1("$data")?

"Shawn O. Pearce" <spearce@xxxxxxxxxxx> · Thu, 30 Apr 2009 13:02:17 -0700

David Srbecky <dsrbecky@xxxxxxxxx> wrote:
>
> I started digging into the details and there is one thing that is really  
> bugging me - why is the name of a blob SHA1("$type $size\0$data") and  
> not SHA1("$data")?  I mean, wouldn't it be beautiful if the name of the  
> blob would really just be the SHA1 of the uncompressed file content? :-)

Well, a commit is stored in the same namespace as a blob (file
content).  So including the type in the SHA1 computation helps
to tell them apart and say "this is really a commit" vs. "this
is a file that just happens to have the same content as a commit".
It does help consistency checkers like `git fsck` to know that the
object is used in the right context.
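To make the point concrete, here is a minimal sketch of how an object name is formed from `"$type $size\0$data"`. The function name `git_object_id` is hypothetical (it is not a Git API). For blobs the result should match what `git hash-object` prints, and it shows that identical bytes get different names depending on the type in the header:

```python
import hashlib


def git_object_id(obj_type: str, data: bytes) -> str:
    # Hypothetical helper: hash the header "<type> <size>" plus a NUL byte,
    # followed by the raw content, which is how a Git object name is computed.
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


# The same (empty) payload yields different names for different types.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_id("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Because the type is folded into the name, a blob whose bytes happen to look like a commit can never collide with that commit's name.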

I can't guess what Linus had in mind when he wrote Git, but I would
wager it was something along the lines that storing everything in
a single directory structure was simpler/more elegant than having
a different directory structure per object type.  Today I would
probably have made the same design decision, but I'm biased by
Git already, so who knows whether I'm just mimicking Linus' brilliance
or would have arrived at the same result myself.

Including the length is overkill, yes, but it's in the header of the
data so that git can immediately allocate a properly sized memory
buffer before it inflates the rest of the object content.  It's a
performance improvement.  It's probably a historical accident that
it got included in the SHA1 computation; notice its position
between the type and the data... it was likely just easier to
include it in the SHA1 than to exclude it.
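A rough sketch of that allocation trick, assuming a loose object is simply the zlib-deflated bytes of `"$type $size\0$data"`: inflate only a short prefix, parse the size out of the header, allocate the buffer once, then inflate the rest. The name `read_loose_object` and the 64-byte prefix are illustrative assumptions, not Git's actual code, and Python cannot inflate directly into a preallocated buffer the way C can, so this only mimics the idea:

```python
import zlib


def read_loose_object(compressed: bytes) -> tuple[str, bytes]:
    """Hypothetical reader: returns (type, content) of a deflated object."""
    d = zlib.decompressobj()
    # Inflate just a small prefix; "type size\0" is far shorter than 64 bytes.
    head = d.decompress(compressed, 64)
    nul = head.find(b"\x00")
    if nul < 0:
        raise ValueError("no header terminator in first 64 inflated bytes")
    obj_type, size_field = head[:nul].decode("ascii").split(" ")
    size = int(size_field)
    # The header told us the exact size, so allocate the buffer once, up front.
    body = bytearray(size)
    filled = 0
    # Copy what was already inflated, then inflate the remaining input.
    for chunk in (head[nul + 1:], d.decompress(d.unconsumed_tail), d.flush()):
        if filled + len(chunk) > size:
            raise ValueError("object longer than its header claims")
        body[filled:filled + len(chunk)] = chunk
        filled += len(chunk)
    if filled != size:
        raise ValueError("object shorter than its header claims")
    return obj_type, bytes(body)


raw = zlib.compress(b"blob 12\x00hello world\n")
print(read_loose_object(raw))  # ('blob', b'hello world\n')
```

The size check at the end also doubles as a cheap sanity check that the header and the content agree.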

> I would really appreciate some comments on the design decisions so that
> I can sleep well at night :-)

Then I won't mention pack files... which aren't as simple to read
as just inflating a file on disk.  :-)

-- 
Shawn.