On 3/26/2018 5:00 PM, Jonathan Nieder wrote:
> Jeff Hostetler wrote:
> [long quote snipped]
>> While we are converting to a new hash function, it would be nice
>> if we could add a couple of fields to the end of the OID: the object
>> type and the raw uncompressed object size.
>>
>> That is, extend the OID to include 6 bytes of data (4 or 8 bits for
>> the type and the rest for the raw object size) and just say that an
>> OID is a {hash, type, size} tuple.
>>
>> There are lots of places where we open an object just to see what
>> type it is or how big it is. This requires uncompressing/un-deltifying
>> the object (or at least decoding enough to get the header). In the
>> case of missing objects (partial clone or a gvfs-like projection) it
>> requires either dynamically fetching the object or asking an
>> object-size server for the data.
>>
>> All of these cases could be eliminated if the type/size were
>> available in the OID.
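
To make that concrete, a rough sketch of the layout (the struct and
field names are hypothetical, not actual Git code):

    #include <stdint.h>

    #define HASH_RAWSZ 32                   /* e.g. SHA-256 */

    /*
     * Hypothetical extended OID: the raw hash plus 6 trailer bytes,
     * split as 8 bits of object type and 40 bits of size.
     */
    struct oid_ext {
            unsigned char hash[HASH_RAWSZ]; /* raw hash bytes */
            uint8_t type;      /* OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG */
            uint8_t size[5];   /* raw uncompressed size, big-endian */
    };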

> This implies a limit on the object size (e.g. 5 bytes in your
> example). What happens when someone wants to encode an object larger
> than that limit?

We could add a full uint64 to the tail end of the hash, but we don't
handle blobs/objects larger than 4GB today anyway, right?

5 bytes for the size is just a compromise -- 1TB blobs would be
terrible to think about...
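
For reference: 40 bits of size caps an object at 2^40 - 1 bytes, just
under 1 TiB. A minimal sketch of pack/unpack helpers for that 5-byte
field (hypothetical names, not actual Git code):

    #include <stdint.h>

    #define OID_SIZE_BYTES 5
    /* 2^40 - 1 bytes, just under 1 TiB */
    #define OID_SIZE_MAX ((1ULL << (8 * OID_SIZE_BYTES)) - 1)

    /* Store a size in the 5-byte big-endian trailer; -1 if too large. */
    static int oidext_set_size(uint8_t out[OID_SIZE_BYTES], uint64_t size)
    {
            int i;

            if (size > OID_SIZE_MAX)
                    return -1;
            for (i = OID_SIZE_BYTES - 1; i >= 0; i--) {
                    out[i] = size & 0xff;
                    size >>= 8;
            }
            return 0;
    }

    /* Read the size back without touching the object itself. */
    static uint64_t oidext_get_size(const uint8_t in[OID_SIZE_BYTES])
    {
            uint64_t size = 0;
            int i;

            for (i = 0; i < OID_SIZE_BYTES; i++)
                    size = (size << 8) | in[i];
            return size;
    }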

> This also decreases the number of bits available for the hash, but
> that shouldn't be a big issue.

I was suggesting extending the OIDs by 6 bytes while we are changing
the hash function.

> Aside from those two, I don't see any downsides. It would mean that
> tree objects contain information about the sizes of blobs contained
> there, which helps with virtual file systems. It's also possible to
> do that without putting the size in the object id, but maybe having
> it in the object id is simpler.
>
> Will think more about this.
>
> Thanks for the idea,
> Jonathan
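
For example, a virtual filesystem could answer a stat() for a
projected file straight from the tree entry, without fetching or
inflating the blob at all (a sketch reusing the hypothetical helpers
above):

    #include <sys/stat.h>

    /*
     * Sketch of a virtual-filesystem stat handler for a projected
     * file, using only the tree entry's extended OID.  No object
     * lookup, no inflation, no dynamic fetch.
     */
    static void fill_stat_from_tree_entry(const struct oid_ext *oid,
                                          struct stat *st)
    {
            st->st_size = (off_t)oidext_get_size(oid->size);
            /* oid->type gives blob vs. tree without opening the object */
    }
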
Thanks
Jeff