Re: [Question] Can git cat-file have a type filtering option?

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 14 Apr 2023 08:58:39 -0700

ZheNing Hu <adlternative@xxxxxxxxx> writes:

> Oh, you are right, this could be to prevent conflicts between Git objects
> with identical content but different types. However, I always associate
> Git with the file system, where metadata such as file type and size is
> stored in the inode, while the file data is stored in separate chunks.

I am afraid the presentation order Peff used caused a bit of
confusion.  The true reason is what Peff brought up as "Or worse".
We need to be able to tell, given only the name of an object,
everything that we need to know about the object, and for that, we
need the type information when we ask for an object by its name.
Having size embedded in the data that comes back to us when we
consult object database with an object name helps the implementation
to pre-allocate a buffer and then inflate into it--there is no
fundamental reason why it should be there.

It is a secondary problem created by the design choice that we store
type together with contents, that the object type recorded in a tree
entry may contradict the actual type of the object recorded in the
tree entry.  We could have declared that the object type found in a
tree entry is to be trusted, if we didn't record the type in the
object database together with the object contents.

I think your original question was not "why do we store type and
size together with the contents?", but was "why do we include in the
hash computation?", and all of the above discuss related tangent
without touching the original question.

The need to have type or size available when we ask the object
database for data associated with the object does not necessarily
mean they must be hashed together with the contents.  It was done
merely because "why not? that way, we do not have to worry about
catching corrupt values for type and size information we want to
store together with the contents".  IOW, we could have checksummed
these two pieces of information separately, but why bother?