I'm currently working on the next step of the SHA-256 transition code,
which is SHA-256/SHA-1 interoperability.

Essentially, when we write a loose object into the store, or when we
index a pack, we take one form of the object, usually the SHA-256 form,
rewrite it into its SHA-1 form, and then hash that to determine its
SHA-1 name. We then write this correspondence either into the loose
object index (for loose objects) or a v3 index (for packs).

Blobs are simply hashed with both algorithms, but trees, commits, and
tags need to be rewritten to use the SHA-1 names of the objects they
refer to. In most situations we already have this data, since it will
exist in the loose object index, in some pack index, or elsewhere in
the pack we're indexing.

However, for submodules, we have a problem. By definition, the object
exists in a different repository. If we have the submodule locally on
the system, then this works fine, but if we're performing a fetch or
clone and the submodule is not present, then we cannot rewrite the tree
that refers to it, or anything that refers to that tree, directly or
indirectly.

So there are some possible courses of action:

* Disallow compatibility algorithms when using submodules. This is
  simple, but inconvenient.
* Force users to always clone submodules and fetch them before fetching
  the main repository. This is also relatively simple, but
  inconvenient.
* Have the remote server keep a list of correspondences and send them
  in a protocol extension.
* Just skip rewriting objects until the data is filled in later and
  admit the data will be incomplete. This means that pushing to or
  pulling from a repository using an incompatible algorithm will be
  impossible.
* Something else I haven't thought of.

The third option is where I'm leaning, but it has some potential
downsides. First, the server must support both hash algorithms and have
this data. Second, it essentially requires all submodule updates to be
pushed from a compatible client. Third, we need to trust that the
server hasn't tampered with the data, which we should be able to verify
by doing an fsck on both forms (I think). Fourth, we need to store this
somewhere, and the only place we have right now is the loose object
index, which would potentially grow to inefficient sizes.

A variation would be to ask the submodule's server for the list of
correspondences instead, via a new protocol extension. That has the
same downsides except for the second one, and additionally means that
we'd need to make multiple connections.

So I'm seeking some ideas on which approach we want to use here before
I start sinking a lot of work into this.

-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US
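
For illustration only, the rewriting step described at the top of this
message can be sketched in Python. This is not Git's implementation:
the function names and the in-memory `sha256_to_sha1` dictionary are
hypothetical stand-ins for the loose object index or a v3 pack index.
The sketch relies only on the object format itself: an object name is
the hash of "<type> <size>" plus a NUL byte plus the body, and a tree
entry is "<mode> <name>", a NUL byte, and the raw object name.

```python
import hashlib

SHA256_LEN = 32  # raw object name length in a SHA-256 tree entry


def object_name(algo, obj_type, body):
    # Hash "<type> <size>", a NUL byte, then the body, as Git does.
    header = obj_type.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.new(algo, header + body).digest()


def rewrite_tree(body, sha256_to_sha1):
    # Convert a SHA-256 tree body to its SHA-1 form.
    # sha256_to_sha1 is a hypothetical mapping of raw object names.
    # Entry names do not change, so entry order is preserved.
    out = bytearray()
    pos = 0
    while pos < len(body):
        nul = body.index(b"\0", pos)
        mode_and_name = body[pos:nul]
        oid = body[nul + 1:nul + 1 + SHA256_LEN]
        pos = nul + 1 + SHA256_LEN
        if oid not in sha256_to_sha1:
            # This is the submodule case: a gitlink entry (mode 160000)
            # names a commit that lives in another repository, so no
            # correspondence is known and the tree cannot be rewritten.
            raise KeyError(mode_and_name.decode(errors="replace"))
        out += mode_and_name + b"\0" + sha256_to_sha1[oid]
    return bytes(out)


# A blob is hashed with both algorithms and the pair is recorded.
blob = b"hello\n"
mapping = {object_name("sha256", "blob", blob):
           object_name("sha1", "blob", blob)}

# A tree that refers only to known objects can be rewritten and named.
tree_sha256 = b"100644 hello.txt\0" + object_name("sha256", "blob", blob)
tree_sha1 = rewrite_tree(tree_sha256, mapping)
print(object_name("sha1", "tree", tree_sha1).hex())
```

Appending a gitlink entry such as b"160000 sub\0" followed by a 32-byte
commit name that is missing from the mapping makes rewrite_tree() raise
KeyError. That is the failure the options above are trying to address,
and it propagates to every tree and commit that refers to that tree.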