Re: [silly] loose, pack, and another thing?

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Thu, 28 Sep 2023 14:40:10 -0700

Junio C Hamano <gitster@xxxxxxxxx> writes:
> Just wondering if it would help to have the third kind of object
> representation in the object database, sitting next to loose objects
> and packed objects, say .git/objects/verbatim/<hex-object-name> for
> the contents and .git/objects/verbatim/<hex-object-name>.type that
> records "blob", "tree", "commit", or "tag" (in practice, I would
> expect huge "blob" objects would be the only ones that use this
> mechanism).
> 
> The contents will be stored verbatim without compression and without
> any object header (i.e., the usual "<type> <length>\0") and the file
> could be "ln"ed (or "cow"ed if the underlying filesystem allows it)
> to materialize it in the working tree if needed.

This sounds like a useful feature. We would probably want to use "ln"
or "cow" everywhere we currently use streaming (stream_blob_to_fd() in
streaming.h), so hopefully we won't need to increase the number of ways
in which we can write an object to the worktree (just change the
streaming code to write to a filename instead of an fd).
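
To make the "write to a filename" idea concrete, here is a rough sketch
in Python (not Git's C code). The function name materialize() and the
fallback policy are made up for illustration. It tries a hard link
first and falls back to a plain copy. A real "cow" path would need a
filesystem-specific reflink call, which is left out here.

```python
import os
import shutil

def materialize(verbatim_path, worktree_path):
    """Hypothetical helper: place a verbatim object in the working tree.

    Tries a hard link ("ln") first; if that fails (for example across
    filesystems, or because the destination exists), falls back to a copy.
    """
    try:
        os.link(verbatim_path, worktree_path)
        return "linked"
    except OSError:
        # Assumption: a byte-for-byte copy is an acceptable fallback.
        shutil.copyfile(verbatim_path, worktree_path)
        return "copied"
```

One caveat with the hard-link variant: an editor that modifies the
worktree file in place would also modify the stored object, since both
names refer to the same inode.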

> "fsck" needs to be told about how to verify them.  Create the object
> header in-core and hash that, followed by the contents of that file,
> and make sure the result matches the <hex-object-name> part of the
> filename, or something like that.

Yeah, this sounds like what index-pack is doing - the hash algo can take
the contents of one buffer (a header that we synthesize ourselves), and
then take the contents of another buffer (the file contents).
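
As a rough illustration of the two-buffer hashing (Python rather than
Git's C, and assuming SHA-1 object names), the check could look like
the sketch below. The function names and the idea that the file's
basename is the hex object name follow the proposed, not yet existing,
.git/objects/verbatim/ layout, so treat them as hypothetical.

```python
import hashlib
import os

def verbatim_object_name(path, obj_type="blob", algo="sha1"):
    """Hash a synthesized "<type> <length>\\0" header, then the file contents."""
    size = os.path.getsize(path)
    h = hashlib.new(algo)
    # First buffer: the object header we synthesize in-core.
    h.update(f"{obj_type} {size}".encode("ascii") + b"\0")
    # Second buffer(s): the verbatim file contents, read in chunks.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def check_verbatim(path, obj_type="blob"):
    """Hypothetical fsck-style check: the filename must match the hash."""
    return verbatim_object_name(path, obj_type) == os.path.basename(path)
```

Reading in chunks means a huge blob never has to be held in memory in
full, which is the case this mechanism is aimed at.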