Ian Jackson <ijackson@xxxxxxxxxxxxxxxxxxxxxx> writes:

> I have been thinking about how to do a transition from SHA1 to another
> hash function.

Good.  I think many of us have been, too, not necessarily just in the
past few days in response to SHAttered, but over the last 10 years,
yet without coming to a consensus design ;-)

> I have concluded that:
>
>  * We can should avoid expecting everyone to rewrite all their
>    history.

Yes.

>  * Unfortunately, because the data formats (particularly, the commit
>    header) are not in practice extensible (because of the way existing
>    code parses them), it is not useful to try generate new data (new
>    commits etc.) containing both new hashes and old hashes: old
>    clients will mishandle the new data.

Yes.

>  * Therefore the transition needs to be done by giving every object
>    two names (old and new hash function).  Objects may refer to each
>    other by either name, but must pick one.  The usual shape of

I do not think it is necessarily so.  Existing code may not be able
to read anything new, but you can make the new code understand
object names in both formats, and for a smooth transition, I think
the new code needs to.

For example, a new commit that records a merge of an old and a new
commit, whose resulting tree happens to be the same as the tree of
the old commit, may begin like so:

    tree 21b97d4c4f968d1335f16292f954dfdbb91353f0
    parent 20769079d22a9f8010232bdf6131918c33a1bf6910232bdf6131918c33a1bf69
    parent 22af6fef9b6538c9e87e147a920be9509acf1ddd

naming the only object that was named with the new hash by its new,
longer name, while recording the names of the other existing objects
with SHA-1.

We would need to extend the object format for tag (which would be
trivial, as the object reference is textual and similar to a commit)
and tree (much harder), of course.

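To make the idea concrete, here is a minimal sketch in Python of how a
new reader might sort the object names in such a header by era.  It is
not Git code: the names hash_era() and parse_header() are made up for
this illustration, and it assumes the toy rule that a 40-character hex
name is SHA-1 and a 64-character one is the new hash.

```python
import re

# The example header from the message: one new-style (64 hex digit)
# parent among old-style (40 hex digit) SHA-1 names.
EXAMPLE = (
    "tree 21b97d4c4f968d1335f16292f954dfdbb91353f0\n"
    "parent 20769079d22a9f8010232bdf6131918c33a1bf69"
    "10232bdf6131918c33a1bf69\n"
    "parent 22af6fef9b6538c9e87e147a920be9509acf1ddd\n"
)


def hash_era(name):
    """Classify an object name by length (assumed rule, see lead-in)."""
    if re.fullmatch(r"[0-9a-f]+", name) is None:
        raise ValueError("not a lowercase hex object name: %r" % name)
    if len(name) == 40:
        return "sha1"
    if len(name) == 64:
        return "new"
    raise ValueError("unexpected object name length: %d" % len(name))


def parse_header(text):
    """Return (field, era, name) for each tree/parent header line."""
    refs = []
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the header
        field, _, value = line.partition(" ")
        if field in ("tree", "parent"):
            refs.append((field, hash_era(value), value))
    return refs


for field, era, name in parse_header(EXAMPLE):
    print(field, era, name[:12])
```

An old reader would of course still choke on the 64-character parent;
the sketch only shows that a new reader has enough information to
follow references of both kinds.
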
As long as the reader can tell, from the format of the object names
stored in a "new object format" object, which era each referenced
object comes from in some way [*1*], we can name new objects with
only the new hash, I would think.

A "new refers only to new" rule that stratifies objects into older
and newer may make things simpler, but I am not convinced yet that
it would give our users a smooth enough transition path (but I am
open to be educated and persuaded the other way).


[Footnote]

*1* In the above toy example, the length being 40 vs 64 is used as
    the sign that tells SHA-1 from the new hash, and careful readers
    may wonder if we should use sha-3,20769079d22... or something
    like that, which more explicitly identifies what hash is used,
    so that we can pick a hash whose length is 64 when we transition
    again.

    I personally do not think such a prefix is necessary during the
    first transition; we will likely adopt a new hash again, and at
    that point that third one can have a prefix to differentiate it
    from the second one.
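
The footnote's scheme, where unprefixed names are told apart by length
and only a later hash carries an explicit label, could be sketched in
Python as below.  This is an illustration under assumptions, not a
proposed format: parse_object_name() and the labels "sha-1",
"second-hash" and "third-hash" are invented here, and the comma
separator simply follows the "sha-3,2076..." spelling above.

```python
import re

# Assumed mapping for unprefixed names: length alone picks the hash.
UNPREFIXED_BY_LENGTH = {40: "sha-1", 64: "second-hash"}


def parse_object_name(field):
    """Split "<label>,<hex>" or bare "<hex>" into (label, hex)."""
    label, sep, digest = field.partition(",")
    if not sep:
        # No comma: the whole field is the hex name.
        label, digest = None, field
    elif not label:
        raise ValueError("empty hash label in %r" % field)
    if re.fullmatch(r"[0-9a-f]+", digest) is None:
        raise ValueError("not a lowercase hex object name: %r" % digest)
    if label is None:
        try:
            label = UNPREFIXED_BY_LENGTH[len(digest)]
        except KeyError:
            raise ValueError("cannot infer hash from length %d"
                             % len(digest))
    return label, digest


# A bare 40-digit name is taken to be SHA-1 ...
print(parse_object_name("22af6fef9b6538c9e87e147a920be9509acf1ddd"))
# ... while a labelled 64-digit name can coexist with the bare
# 64-digit names of the second hash.
print(parse_object_name("third-hash," + "ab" * 32))
```

The point of the design is that nothing needs a label until two
hashes share a length, which is why the prefix can wait until the
second transition.
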