Junio C Hamano <gitster@xxxxxxxxx> writes:

> One thing I still do not know how I feel about after re-reading the
> thread, and I didn't find the above doc, is Linus's suggestion to
> use the objects themselves as NewHash-to-SHA-1 mapper [*1*].
> ...
> [Reference]
>
> *1* <CA+55aFxj7Vtwac64RfAz_u=U4tob4Xg+2pDBDFNpJdmgaTCmxA@xxxxxxxxxxxxxx>

I think this falls into the same category as the often-talked-about addition of the "generation number" field. It is very tempting to add these "mechanically derivable but expensive to compute" pieces of information to the sha3-content, both while converting from sha1-content and when creating new objects.

Because the "sha1-name" or the "generation number" can be computed mechanically, as long as everybody agrees to _always_ place them in the sha3-content, the same sha1-content will be converted into exactly the same sha3-content without ambiguity. Converting it back to sha1-content while pushing to an older repository will also correctly produce the original sha1-content, as it would simply be a matter of stripping these extra pieces of information.

The reason why I still feel a bit uneasy about adding these things (aside from the fact that the sha1-name thing will be baggage we would need to carry forever, even after we have completely weaned ourselves off the old hash) is that I am not sure what we should do when we encounter sha3-content in the wild that has these things _wrong_.

Suppose an object that exists today in the SHA-1 world is fetched into the new repository and converted to SHA-3 contents, and Linus's extra "original SHA-1 name" field is added to the object's header while recording the SHA-3 content. But for whatever reason, the original SHA-1 name is recorded incorrectly in the resulting SHA-3 object.

The same thing could happen if we decide to bake a "generation number" into the SHA-3 commit objects.
One possible definition would be that a root commit gets gen #0, and a commit with one or more parents gets max(parents' gen numbers) + 1 as its gen number. But somebody may botch the counting and record sum(parents' gen numbers) as its gen number.

In these cases, not just the SHA3-content but also the resulting SHA-3 object name would differ from the name of the object that would have recorded the same contents correctly. So converting these botched SHA-3 contents back to the SHA-1 world may produce the original contents, but we may end up with multiple "plausible-looking" sets of SHA-3 objects that (claim to) correspond to a single SHA-1 object, only one of which is valid.

Our "git fsck" already treats certain kinds of brokenness (like a tree whose entry has a mode that is 0-padded on the left) as broken, but still tolerates them. I am not sure if it is sufficient to diagnose such objects and declare them broken and invalid when we see sha3-content that records these "mechanically derivable but expensive to compute" pieces of information incorrectly.

I am leaning towards saying "yes, catching them in fsck is enough". I would then suggest that we add the generation number to the sha3-content of commit objects, and even the "original sha1 name" thing if we find a good use for it. But I cannot shake off this nagging feeling that I am missing some huge problems with adding these fields and opening ourselves up to more classes of broken objects.

Thoughts?
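To make the generation-number scenario concrete, here is a minimal Python sketch. Several assumptions in it do not come from this thread:

- a hypothetical "generation <n>" header line is appended to the commit header during conversion;
- hashlib.sha3_256 stands in for NewHash;
- tree and parent names are left unrewritten, although a real converter would also map them to new-hash names;
- the parents' generation numbers are simply passed in.

All helper names (to_new_content, fsck_generation, and so on) are made up for illustration. The sketch shows that a correct and a botched conversion strip back to the same SHA-1 content, and hence the same SHA-1 name, yet get different SHA-3 names. It also shows that a recompute-and-compare check of the kind fsck could perform tells the two apart.

```python
import hashlib

def object_name(hash_ctor, content: bytes) -> str:
    # Hash "commit <size>\0" + content, the way Git names a commit object.
    return hash_ctor(b"commit %d\0" % len(content) + content).hexdigest()

def generation(parent_gens):
    # Definition from the mail: root commit gets 0, others max(parents) + 1.
    return max(parent_gens) + 1 if parent_gens else 0

def botched_generation(parent_gens):
    # The buggy variant from the mail: sum of the parents' generations.
    return sum(parent_gens)

def to_new_content(old: bytes, gen: int) -> bytes:
    # Hypothetical conversion: append a derived "generation" header line.
    # Assumption: tree/parent names are NOT rewritten in this sketch.
    header, sep, body = old.partition(b"\n\n")
    return header + b"\ngeneration %d" % gen + sep + body

def to_old_content(new: bytes) -> bytes:
    # Back-conversion for pushing to a SHA-1 repository: strip the derived header.
    header, sep, body = new.partition(b"\n\n")
    kept = [l for l in header.split(b"\n") if not l.startswith(b"generation ")]
    return b"\n".join(kept) + sep + body

def recorded_generation(new: bytes) -> int:
    # Read back whatever value the converter recorded in the header.
    for line in new.partition(b"\n\n")[0].split(b"\n"):
        if line.startswith(b"generation "):
            return int(line.split(b" ", 1)[1])
    raise ValueError("no generation header")

def fsck_generation(new: bytes, parent_gens) -> bool:
    # fsck-style check: is the recorded value the one we would have derived?
    return recorded_generation(new) == generation(parent_gens)

# A merge commit whose parents have generations 3 and 5 (made-up data).
old = b"\n".join([
    b"tree " + b"a" * 40,
    b"parent " + b"b" * 40,
    b"parent " + b"c" * 40,
    b"author A U Thor <author@example.com> 0 +0000",
    b"committer A U Thor <author@example.com> 0 +0000",
    b"",
    b"Merge two branches",
]) + b"\n"
parents = [3, 5]

good = to_new_content(old, generation(parents))          # records 6
bad = to_new_content(old, botched_generation(parents))   # records 8

print(object_name(hashlib.sha1, to_old_content(good)))   # same SHA-1 name...
print(object_name(hashlib.sha1, to_old_content(bad)))
print(object_name(hashlib.sha3_256, good))               # ...but two SHA-3 names
print(object_name(hashlib.sha3_256, bad))
print(fsck_generation(good, parents), fsck_generation(bad, parents))
```

The same shape would apply to an "original sha1 name" header. Stripping it on the way back is trivial, but only recomputing the derived value distinguishes a correct record from a wrong one.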