ZheNing Hu <adlternative@xxxxxxxxx> writes:

> Oh, you are right, this could be to prevent conflicts between Git objects
> with identical content but different types. However, I always associate
> Git with the file system, where metadata such as file type and size is
> stored in the inode, while the file data is stored in separate chunks.

I am afraid the presentation order Peff used caused a bit of
confusion.

The true reason is what Peff brought up as "Or worse".  We need to
be able to tell, given only the name of an object, everything that
we need to know about the object, and for that, we need the type
information when we ask for an object by its name.

Having size embedded in the data that comes back to us when we
consult the object database with an object name helps the
implementation to pre-allocate a buffer and then inflate into
it--there is no fundamental reason why it should be there.

It is a secondary problem created by the design choice that we store
type together with contents, that the object type recorded in a tree
entry may contradict the actual type of the object recorded in the
tree entry.  We could have declared that the object type found in a
tree entry is to be trusted, if we didn't record the type in the
object database together with the object contents.

I think your original question was not "why do we store type and
size together with the contents?", but was "why do we include them
in the hash computation?", and all of the above discusses a related
tangent without touching the original question.

The need to have type or size available when we ask the object
database for data associated with the object does not necessarily
mean they must be hashed together with the contents.  It was done
merely because "why not? that way, we do not have to worry about
catching corrupt values for type and size information we want to
store together with the contents".

IOW, we could have checksummed these two pieces of information
separately, but why bother?
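
To make the above concrete, here is a minimal Python sketch (not Git's
actual C code) of how an object name is derived and how type and size
come back when a loose object is inflated.  It assumes a SHA-1
repository, where the name is the hash of "<type> <size>\0" followed
by the contents.  The function names are made up for illustration.

```python
import hashlib
import zlib


def object_header(obj_type: str, contents: bytes) -> bytes:
    # "<type> <size in decimal>" followed by a NUL byte.
    return f"{obj_type} {len(contents)}".encode("ascii") + b"\0"


def object_name(obj_type: str, contents: bytes) -> str:
    # Type and size are hashed together with the contents, so the
    # same bytes stored under a different type get a different name.
    return hashlib.sha1(object_header(obj_type, contents) + contents).hexdigest()


def deflate_loose(obj_type: str, contents: bytes) -> bytes:
    # A loose object is the header plus contents, zlib-deflated.
    return zlib.compress(object_header(obj_type, contents) + contents)


def inflate_loose(data: bytes):
    # Given only the stored bytes, recover type, size and contents.
    raw = zlib.decompress(data)
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.split(b" ")
    return obj_type.decode("ascii"), int(size), body


if __name__ == "__main__":
    print(object_name("blob", b""))  # the empty blob
    print(object_name("tree", b""))  # the empty tree, a different name
    print(inflate_loose(deflate_loose("blob", b"hello\n")))
```

The empty blob and the empty tree have byte-for-byte identical
contents, yet they get different names only because the header is
part of what is hashed.  A design that checksummed the header
separately would be possible, as said above; this sketch just shows
what the existing choice looks like.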