Hi brian,

On Sun, 14 Feb 2021, brian m. carlson wrote:

> I'm currently working on the next step of the SHA-256 transition code,
> which is SHA-256/SHA-1 interoperability.  Essentially, when we write a
> loose object into the store, or when we index a pack, we take one form
> of the object, usually the SHA-256 form, and rewrite it so that it is in
> its SHA-1 form, and then hash it to determine its SHA-1 name.  We then
> write this correspondence either into the loose object index (for loose
> objects) or a v3 index (for packs).
>
> Blobs are simply hashed with both algorithms, but trees, commits, and
> tags need to be rewritten to use the SHA-1 names of the objects they
> refer to.  For most situations, we already have this data, since it will
> exist in the loose object index, in some pack index, or elsewhere in the
> pack we're indexing.
>
> However, for submodules, we have a problem.  By definition, the object
> exists in a different repository.  If we have the submodule locally on
> the system, then this works fine, but if we're performing a fetch or
> clone and the submodule is not present, then we cannot rewrite the tree
> or anything that refers to it, directly or indirectly.
>
> So there are some possible courses of action:
>
> * Disallow compatibility algorithms when using submodules.  This is
>   simple, but inconvenient.
> * Force users to always clone submodules and fetch them before fetching
>   the main repository.  This is also relatively simple, but
>   inconvenient.
> * Have the remote server keep a list of correspondences and send them in
>   a protocol extension.
> * Just skip rewriting objects until the data is filled in later and
>   admit the data will be incomplete.  This means that pushing to or
>   pulling from a repository using an incompatible algorithm will be
>   impossible.
> * Something else I haven't thought of.
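
To make the problem described above concrete, here is a minimal Python
sketch. It is only an illustration, not code from Git: the function names
and the in-memory `sha256_to_sha1` mapping are assumptions standing in for
the loose object index or a v3 pack index. It hashes a blob with both
algorithms, then rewrites a SHA-256 tree into its SHA-1 form, which fails
as soon as an entry (such as a submodule gitlink) has no known SHA-1 name.

```python
import hashlib


def object_id(obj_type: str, body: bytes, algo: str) -> str:
    """Hash a Git object: the header "<type> <size>" plus NUL, then the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).hexdigest()


def blob_ids(content: bytes) -> dict:
    """Blobs need no rewriting: hash the same bytes with both algorithms."""
    return {algo: object_id("blob", content, algo) for algo in ("sha1", "sha256")}


def rewrite_tree(tree_body: bytes, sha256_to_sha1: dict) -> bytes:
    """Convert a SHA-256 tree body into its SHA-1 form.

    Each tree entry is b"<mode> <name>" plus NUL, followed by the raw object
    name (32 bytes for SHA-256, 20 bytes for SHA-1).  `sha256_to_sha1` is a
    hypothetical mapping from raw SHA-256 names to raw SHA-1 names.
    Raises KeyError when an entry has no known SHA-1 name.
    """
    out = bytearray()
    pos = 0
    while pos < len(tree_body):
        nul = tree_body.index(b"\0", pos)
        entry_header = tree_body[pos:nul + 1]   # mode, name, and the NUL
        oid256 = tree_body[nul + 1:nul + 33]    # raw 32-byte SHA-256 name
        out += entry_header + sha256_to_sha1[oid256]
        pos = nul + 33
    return bytes(out)


if __name__ == "__main__":
    ids = blob_ids(b"hello\n")
    mapping = {bytes.fromhex(ids["sha256"]): bytes.fromhex(ids["sha1"])}

    blob_entry = b"100644 a.txt\0" + bytes.fromhex(ids["sha256"])
    # Mode 160000 marks a submodule; the name below is a made-up placeholder
    # for a commit that lives in another repository.
    gitlink_entry = b"160000 sub\0" + bytes(32)

    sha1_tree = rewrite_tree(blob_entry, mapping)
    print(object_id("tree", sha1_tree, "sha1"))

    try:
        rewrite_tree(blob_entry + gitlink_entry, mapping)
    except KeyError:
        print("no SHA-1 name known for the submodule commit")
```

The blob entry converts because its correspondence is available locally;
the gitlink entry does not, and that gap is what each of the options above
tries to close in a different way.
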

While my strong urge is to add "Remove support for submodules" (which BTW
would also plug so many attack vectors that have led to many a
vulnerability in the past), I understand that this would be impractical:
the figurative barn door has been open for way too long to do that.

But I'd like to put another idea into the fray: store the mapping in
`.gitmodules`. That is, each time `git submodule add <...>` is called, it
would update `.gitmodules` to list SHA-1 *and* SHA-256 for the given path.

That would relieve us of the problem where we rely on a server's ability
to give us that mapping.

Ciao,
Dscho

> The third option is where I'm leaning, but it has some potential
> downsides.  First, the server must support both hash algorithms and have
> this data.  Second, it essentially requires all submodule updates to be
> pushed from a compatible client.  Third, we need to trust that the
> server hasn't tampered with the data, which should be possible by doing
> an fsck on both forms (I think).  Fourth, we need to store this
> somewhere, and the only place we have right now is the loose object
> index, which would potentially grow to inefficient sizes.
>
> We could potentially change this to be slightly different by asking the
> submodule server for a list of correspondences instead via a new
> protocol extension, but it has the same downsides except for the second
> one, and additionally means that we'd need to make multiple connections.
>
> So I'm seeking some ideas on which approach we want to use here before
> I start sinking a lot of work into this.
> --
> brian m. carlson (he/him or they/them)
> Houston, Texas, US
>