On 06/15, Johannes Schindelin wrote:
> Hi,
>
> I thought it better to revive this old thread rather than start a new
> thread, so as to automatically reach everybody who chimed in originally.
>
> On Mon, 6 Mar 2017, Brandon Williams wrote:
>
> > On 03/06, brian m. carlson wrote:
> >
> > > On Sat, Mar 04, 2017 at 06:35:38PM -0800, Linus Torvalds wrote:
> > >
> > > > Btw, I do think the particular choice of hash should still be on the
> > > > table. sha-256 may be the obvious first choice, but there are
> > > > definitely a few reasons to consider alternatives, especially if
> > > > it's a complete switch-over like this.
> > > >
> > > > One is large-file behavior - a parallel (or tree) mode could improve
> > > > on that noticeably. BLAKE2 does have special support for that, for
> > > > example. And SHA-256 does have known attacks compared to SHA-3-256
> > > > or BLAKE2 - whether that is due to age or due to more effort, I
> > > > can't really judge. But if we're switching away from SHA1 due to
> > > > known attacks, it does feel like we should be careful.
> > >
> > > I agree with Linus on this. SHA-256 is the slowest option, and it's
> > > the one with the most advanced cryptanalysis. SHA-3-256 is faster on
> > > 64-bit machines (which, as we've seen on the list, is the overwhelming
> > > majority of machines using Git), and even BLAKE2b-256 is stronger.
> > >
> > > Doing this all over again in another couple years should also be a
> > > non-goal.
> >
> > I agree that when we decide to move to a new algorithm that we should
> > select one which we plan on using for as long as possible (much longer
> > than a couple years). While writing the document we simply used
> > "sha256" because it was more tangible and easier to reference.
>
> The SHA-1 transition *requires* a knob telling Git that the current
> repository uses a hash function different from SHA-1.
>
> It would make *a whole of a lot of sense* to make that knob *not* Boolean,
> but to specify *which* hash function is in use.

100% agree on this point. I believe the current plan is to have the
hashing function used for a repository be a repository format extension,
whose value (most likely a string like 'sha1', 'sha256', 'blake2', etc.)
would be stored in the repository's .git/config. This way, upon startup,
git will die on (or ignore) a repository that uses a hashing function it
does not recognize or was not compiled to handle.

I hope (and expect) that the end product of this transition is a nice,
clean hashing API and interface with sufficient abstractions, such that
if I wanted to switch to a different hashing function I would just need
to implement the interface with the new hashing function and ensure
that 'verify_repository_format' allows the new function.

>
> That way, it will be easier to switch another time when it becomes
> necessary.
>
> And it will also make it easier for interested parties to use a different
> hash function in their infrastructure if they want.
>
> And it lifts part of that burden that we have to consider *very carefully*
> which function to pick. We still should be more careful than in 2005, when
> Git was born, and when, incidentally, when the first attacks on SHA-1
> became known, of course. We were just lucky for almost 12 years.
>
> Now, with Dunning-Kruger in mind, I feel that my degree in mathematics
> equips me with *just enough* competence to know just how little *even I*
> know about cryptography.
>
> The smart thing to do, hence, was to get involved in this discussion and
> act as Lt Tawney Madison between us Git developers and experts in
> cryptography.
>
> It just so happens that I work at a company with access to excellent
> cryptographers, and as we own the largest Git repository on the planet, we
> have a vested interest in ensuring Git's continued success.
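To make the repository-format-extension idea from the reply above a little
more concrete, here is a rough sketch. It is written in Python purely for
illustration (Git itself is C), and every name in it is hypothetical: the
config key 'extensions.hash', 'HashAlgo', 'KNOWN_ALGOS', 'select_hash_algo'
and 'object_id' are not Git's API, and the actual key name and check were
still undecided at this point in the thread. It shows a registry of known
hash functions, a startup check that refuses unknown ones, and object IDs
computed through whichever function the repository names.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class HashAlgo:
    """Hypothetical descriptor for one object hash function."""
    name: str                 # string stored in the repository config
    digest_size: int          # bytes per object ID
    new: Callable[[], Any]    # factory returning a fresh hashlib object


# Hash functions this hypothetical build "was compiled to handle".
# Supporting another function means adding one entry here.
KNOWN_ALGOS: Dict[str, HashAlgo] = {
    "sha1": HashAlgo("sha1", 20, hashlib.sha1),
    "sha256": HashAlgo("sha256", 32, hashlib.sha256),
    "blake2b-256": HashAlgo(
        "blake2b-256", 32, lambda: hashlib.blake2b(digest_size=32)
    ),
}


class UnknownHashError(Exception):
    """Raised instead of opening a repository with an unrecognized hash."""


def select_hash_algo(config: Dict[str, str]) -> HashAlgo:
    """Pick the repository's hash function from its config at startup."""
    # Repositories without the (hypothetical) extension key stay on SHA-1.
    name = config.get("extensions.hash", "sha1")
    algo = KNOWN_ALGOS.get(name)
    if algo is None:
        # Analogous to git dying rather than misreading the repository.
        raise UnknownHashError(f"unknown repository hash function '{name}'")
    return algo


def object_id(algo: HashAlgo, obj_type: str, body: bytes) -> str:
    """Hash an object as '<type> <size>\\0<body>' with the chosen function."""
    h = algo.new()
    h.update(f"{obj_type} {len(body)}\0".encode() + body)
    return h.hexdigest()
```

For example, select_hash_algo({}) falls back to SHA-1, and hashing the blob
"hello\n" that way gives ce013625030ba8dba906f756967f9e9ca394464a, the same
ID 'git hash-object' reports for that content. A config naming a function
missing from the registry raises UnknownHashError instead. Callers only see
the HashAlgo interface, so switching functions touches the registry and
the check, not every call site.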
>
> After a couple of conversations with a couple of experts who I cannot
> thank enough for their time and patience, let alone their knowledge about
> this matter, it would appear that we may not have had a complete enough
> picture yet to even start to make the decision on the hash function to
> use.
>

--
Brandon Williams