David Srbecky <dsrbecky@xxxxxxxxx> wrote:
>
> I started digging into the details and there is one thing that is really
> bugging me - why is the name of a blob SHA1("$type $size\0$data") and
> not SHA1("$data")? I mean, wouldn't it be beautiful if the name of the
> blob would really just be the SHA1 of the uncompressed file content? :-)

Well, a commit is stored in the same namespace as a blob (file
content). So including the type in the SHA1 computation helps to tell
them apart and say "this is really a commit" vs. "this is a file that
just happens to have the same content as a commit". It also helps
consistency checkers like `git fsck` to know that the object is used
in the right context.

I can't guess what Linus had in mind when he wrote Git, but I would
wager it was something along the lines that storing everything in a
single directory structure was simpler/more elegant than having a
different directory structure per object type. Today I would probably
make the same design decision, but I'm biased by Git already, so who
knows if I'm just mimicking Linus' brilliance or would have arrived at
the same result myself.

Including the length is overkill, yes, but it's in the header of the
data so that git can immediately allocate a properly sized memory
buffer before it inflates the rest of the object content. It's a
performance improvement. It's probably a historical accident that it
got included in the SHA1 computation. Note its position between the
type and the data... it likely was just easier to include it in the
SHA1 than to exclude it.

> I would really appreciate some comments on the design decisions so that
> I can sleep well at night :-)

Then I won't mention pack files... which aren't as simple to read as
just inflating a file on disk. :-)

--
Shawn.
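
For anyone who wants to check the naming scheme discussed above, here is a minimal Python sketch. It assumes the layout quoted in the question, SHA1("$type $size\0$data"), and a zlib-deflated loose object. The helper names (git_object_id, loose_object_bytes, parse_loose_object) are made up for illustration and are not part of Git itself.

```python
import hashlib
import zlib


def git_object_id(obj_type, data):
    """Return the hex SHA-1 name for an object of the given type.

    Assumes the header layout "<type> <size>\\0" followed by the raw data,
    where <size> is the decimal length of the data in bytes.
    """
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def loose_object_bytes(obj_type, data):
    """Build the deflated bytes of a loose object (header + data)."""
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return zlib.compress(header + data)


def parse_loose_object(compressed):
    """Inflate a loose object and split it into (type, size, data).

    A real reader could inflate only the first few bytes, read the size,
    and allocate the output buffer up front; this sketch inflates it all.
    """
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.split(b" ")
    return obj_type.decode("ascii"), int(size), body


if __name__ == "__main__":
    content = b"hello world\n"

    # Same bytes, different type: the names differ, so a blob can never
    # be mistaken for a commit that happens to have identical content.
    print(git_object_id("blob", content))
    print(git_object_id("commit", content))

    # The plain SHA-1 of the content is yet another value.
    print(hashlib.sha1(content).hexdigest())

    # Round trip through the loose-object encoding.
    print(parse_loose_object(loose_object_bytes("blob", content)))
```

The blob name printed first should match what `git hash-object` reports for a file with the same content.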