"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes: > On 2023-09-08 at 23:10:19, Eric W. Biederman wrote: >> Ir makes a lot of sense for the hash algorithm that determines how all > > Minor nit: "It". > >> diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt >> index 4b937480848a..10572c5794f9 100644 >> --- a/Documentation/technical/hash-function-transition.txt >> +++ b/Documentation/technical/hash-function-transition.txt >> @@ -148,14 +148,14 @@ Detailed Design >> Repository format extension >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> A SHA-256 repository uses repository format version `1` (see >> -Documentation/technical/repository-version.txt) with extensions >> -`objectFormat` and `compatObjectFormat`: >> +Documentation/technical/repository-version.txt) with the extension >> +`objectFormat`, and an optional core.compatMap configuration. >> >> [core] >> repositoryFormatVersion = 1 >> + compatMap = on >> [extensions] >> objectFormat = sha256 >> - compatObjectFormat = sha1 > > While I'm in favour of an approach that uses the compat map, the > situation we've implemented here doesn't specify the extra hash > algorithm. We want this approach to work just as well for moving from > SHA-1 to SHA-256 as it might for a future transition from SHA-256 to, > say, SHA-3-512, if that becomes necessary. > > Making a future transition easier has been a goal of my SHA-256 work > (because who wants to write several hundred patches in such a case?), so > my hope is we can keep that here as well by explicitly naming the > algorithm we're using. > > I also wonder if an approach that doesn't use an extension is going to > be helpful. Say, that I have a repository that is using Git 3.x, which > supports interop, but I also need to use Git 2.x, which does not. While > it's true that Git 2.x can read my SHA-256 repository, it won't write > the appropriate objects into the map, and thus it will be practically > very difficult to actually use Git 3.x to push data to a repository of a > different hash function. We might well prefer to have Git 2.x not work > with the repository at all rather than have incomplete data preventing > us from, well, interoperating. First it is my hope that we can get a command such as "git gc" to scan the repository and fill in all of the missing compatibility hashes. Not so much for day to day work, but for people able to enable compatibility hashes on an existing repository. Enabling compatibility hashes on a sha1 repository is going to be necessary to create a sha256 repository from it. A depth first walk, or a topological sort of the objects pretty much has to happen as a separate pass. So it makes sense just to require all of the objects have their compatibility hash computed before attempting to generate a pack in the compatibility format. I say all of that and I feel silly. The core and optimized path is what whatever receive pack does to deal with a pack in the repositories compatibility format. Once that is built we can create a sha256 repository from a sha1 repository just by cloning it, and letting receive-pack figure out the details. Before we can generate a sha256 pack from a sha1 pack we still need to compute the sha256 hash of every object, but that can be very optimized and local to the case of receiving a non-native pack. So a repository that generates a compatibility hash for all of it's objects is not necessary to transition to another hash algorithm. All we need is another repository in the other format. That said there is value in being able to add compatibility hashes to an existing repository. The upstream repository can just convert to the new hash function and all of the downstream repositories can compute their compatibility hashes and convert when they are ready. Basically once a git with transition support exists any repository can convert at any time without creating a problem for other repositories. In my head it seems cheaper/safer to compute the compatibility hash of every object in an existing repository than it does to convert a repository. Is it? I think that if the first pull from a repository in another format can trigger the initial computation of the compatibility hash (like the first use of a reverse index triggers the creation of the reverse index), then it will definitely be easier to just enable compatibility hashes in an existing repository. The additional hash computation step every pull from upstream (even when well optimized) should be an incentive for people to fully convert their repositories after the upstream has converted. That is when things get tricky and the transition plan has not talked about. There are references to existing oid's in email, bug trackers, and commit comments. Digging through the history and dealing with those references is something that developers are going to need to do for the rest of the life of a project. Which means eventually we will need to support a mode where we have some packs with a ``.compat'' index but we no longer compute or generate the old hash for new objects. In summary. I agree that compatMap is likely insufficient. So far I think it is too cheap/easy to generate the missing mappings to make it a mandatory requirement that all operations always generate them. I also agree that making the configuration resilient foreseeable future demands is a good idea. So I will push this change farther out in the patch series. Eric