(administrivia: please omit parts of the text you are replying to that
are not relevant to the reply. This makes it easier to see what you're
replying to, especially in mail readers that don't hide quoted text by
default)

Hi Jeff,

Jeff Hostetler wrote:

[long quote snipped]

> While we are converting to a new hash function, it would be nice
> if we could add a couple of fields to the end of the OID: the object
> type and the raw uncompressed object size.
>
> It would be nice if we could extend the OID to include 6 bytes of data
> (4 or 8 bits for the type and the rest for the raw object size), and
> just say that an OID is a {hash,type,size} tuple.
>
> There are lots of places where we open an object to see what type it is
> or how big it is. This requires uncompressing/undeltafying the object
> (or at least decoding enough to get the header). In the case of missing
> objects (partial clone or a gvfs-like projection) it requires either
> dynamically fetching the object or asking an object-size-server for the
> data.
>
> All of these cases could be eliminated if the type/size were available
> in the OID.

This implies a limit on the object size (e.g. 5 bytes in your example).
What happens when someone wants to encode an object larger than that
limit?

This also decreases the number of bits available for the hash, but that
shouldn't be a big issue.

Aside from those two, I don't see any downsides. It would mean that
tree objects contain information about the sizes of the blobs they
reference, which helps with virtual file systems. It's also possible to
do that without putting the size in the object id, but maybe having it
in the object id is simpler.

Will think more about this.

Thanks for the idea,
Jonathan
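
P.S. To make the size-limit question concrete, here is a rough sketch of
a {hash,type,size} layout. Everything in it is an assumption for
illustration: SHA-256 stands in for whatever hash gets chosen, the split
into 1 byte of type plus 5 bytes of big-endian size is just one reading
of the 6-byte proposal, and the type codes and the names encode_suffix,
make_oid and split_oid are made up.

```python
import hashlib

# Assumptions for illustration only (none of this is an agreed format):
HASH = hashlib.sha256  # stand-in for the new hash function
TYPE_CODES = {"commit": 1, "tree": 2, "blob": 3, "tag": 4}  # hypothetical codes
TYPE_NAMES = {code: name for name, code in TYPE_CODES.items()}
SIZE_BYTES = 5  # 6 extra bytes = 1 for the type + 5 for the size
MAX_SIZE = (1 << (8 * SIZE_BYTES)) - 1  # largest size that fits


def encode_suffix(obj_type, size):
    """Pack the type and raw size into the 6 trailing bytes."""
    if not 0 <= size <= MAX_SIZE:
        raise OverflowError(
            f"object size {size} does not fit in {SIZE_BYTES} bytes")
    return bytes([TYPE_CODES[obj_type]]) + size.to_bytes(SIZE_BYTES, "big")


def make_oid(obj_type, data):
    """Hash a "<type> <size>NUL" header plus the data, then append the suffix."""
    header = f"{obj_type} {len(data)}\0".encode()
    digest = HASH(header + data).digest()
    return digest + encode_suffix(obj_type, len(data))


def split_oid(oid):
    """Recover (hash, type, size) without ever opening the object."""
    cut = len(oid) - (1 + SIZE_BYTES)
    digest, suffix = oid[:cut], oid[cut:]
    return digest, TYPE_NAMES[suffix[0]], int.from_bytes(suffix[1:], "big")


oid = make_oid("blob", b"hello\n")
digest, obj_type, size = split_oid(oid)
print(obj_type, size, len(oid))  # blob 6 38
```

With 5 bytes for the size, the largest representable object is 2^40 - 1
bytes (just under 1 TiB). Anything bigger makes encode_suffix raise, and
the sketch does not attempt an escape hatch for that case.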