"Eric W. Biederman" <ebiederm@xxxxxxxxx> writes:

> As the code is written today index_bulk_checkin only accepts blobs.
> Remove the enum object_type parameter and rename index_bulk_checkin to
> index_blob_bulk_checkin, index_stream to index_blob_stream,
> deflate_to_pack to deflate_blob_to_pack, stream_to_pack to
> stream_blob_to_pack, to make this explicit.
>
> Not supporting commits, tags, or trees has no downside as it is not
> currently supported, and commits, tags, and trees, being smaller by
> design, do not have the problem that index_bulk_checkin was built to
> solve.
>
> Before we start adding code to support the hash function transition,
> supporting additional object types in index_bulk_checkin has no real
> additional cost, just an extra function parameter to know what the
> object type is.  Once we begin the hash function transition this is not
> the case.
>
> The hash function transition document specifies that a repository with
> compatObjectFormat enabled will compute and store both the SHA-1 and
> SHA-256 hash of every object in the repository.
>
> What makes this a challenge is that it is not just an additional hash
> over the same object.  Instead the hash function transition document
> specifies that the compatibility hash (specified with
> compatObjectFormat) be computed over the equivalent object that another
> git repository, whose storage hash (specified with objectFormat) is
> that compatibility hash, would store.  When comparing equivalent
> repositories built with different storage hash functions, the oids
> embedded in objects used to refer to other objects differ and the
> location of signatures within objects differs.
>
> As blob objects have neither oids referring to other objects nor stored
> signatures, their storage hash and their compatibility hash are computed
> over the same object.
>
> The other kinds of objects: trees, commits, and tags, all store oids
> referring to other objects.  Signatures are stored in commit and tag
> objects.  As oids and the tags used to store signatures are not the same
> size in repositories built with different storage hashes, the sizes of
> the equivalent objects are also different.
>
> A version of index_bulk_checkin that supports more than just blobs when
> computing both the SHA-1 and the SHA-256 of every object added would
> need a different, and more expensive structure.  The structure is more
> expensive because it would be required to temporarily buffer the
> equivalent object the compatibility hash needs to be computed over.
>
> A temporary object is needed, because before a hash over an object can
> be computed its object header needs to be computed.  One of the members
> of the object header is the entire size of the object.  To know the size
> of an equivalent object an entire pass over the original object needs to
> be made, as trees, commits, and tags are composed of a variable number
> of variable sized pieces.  Unfortunately there is no formula to compute
> the size of an equivalent object from just the size of the original
> object.
>
> Avoid all of those future complications by limiting index_bulk_checkin
> to only work on blobs.

Thanks.  Will queue.
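
For readers following along, the quoted point that the size of an
equivalent object cannot be derived from the original size alone can be
illustrated with tree objects.  The following is a minimal sketch and not
code from the patch; equivalent_tree_size() and format_header() are
hypothetical helper names.  It assumes only the tree entry layout
"<mode> <name>\0<raw oid>" and the "<type> <size>\0" object header, and
it computes sizes only: it does not map the oids themselves.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SHA1_RAWSZ   20	/* raw oid bytes per tree entry with SHA-1 */
#define SHA256_RAWSZ 32	/* raw oid bytes per tree entry with SHA-256 */

/*
 * Hypothetical helper: walk a tree object body whose entries carry
 * from_rawsz-byte oids and compute the size the equivalent tree would
 * have with to_rawsz-byte oids.  Each entry has to be visited, because
 * the result depends on how many entries there are.
 * Returns 0 on success, -1 if the buffer is malformed.
 */
static int equivalent_tree_size(const unsigned char *buf, size_t len,
				size_t from_rawsz, size_t to_rawsz,
				size_t *out)
{
	size_t pos = 0, total = 0;

	while (pos < len) {
		const unsigned char *nul = memchr(buf + pos, '\0', len - pos);
		size_t prefix;

		if (!nul)
			return -1;
		/* length of "<mode> <name>\0" */
		prefix = (size_t)(nul - (buf + pos)) + 1;
		if (len - pos - prefix < from_rawsz)
			return -1;
		total += prefix + to_rawsz;
		pos += prefix + from_rawsz;
	}
	*out = total;
	return 0;
}

/*
 * Hypothetical helper: format the "<type> <size>" header that is hashed
 * before the body.  Returns the header length including the trailing NUL.
 */
static int format_header(char *hdr, size_t hdrsz, const char *type,
			 size_t size)
{
	return snprintf(hdr, hdrsz, "%s %zu", type, size) + 1;
}
```

With this sketch, two SHA-1 trees that are both 58 bytes long, one
holding a single entry with a 30-character name and the other holding two
entries with one-character names, map to SHA-256 sizes of 70 and 82 bytes
respectively.  The header for the equivalent object can only be written
once that pass is complete.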