Hi Johannes,

Thanks for the response.  Sorry for the delay.  Had a large deadline for
$dayjob.

On Wed, Sep 27, 2017 at 12:11:14AM +0200, Johannes Schindelin wrote:
> On Tue, 26 Sep 2017, Jason Cooper wrote:
> > On Thu, Sep 14, 2017 at 08:45:35PM +0200, Johannes Schindelin wrote:
> > > On Wed, 13 Sep 2017, Linus Torvalds wrote:
> > > > On Wed, Sep 13, 2017 at 6:43 AM, demerphq <demerphq@xxxxxxxxx> wrote:
> > > > > SHA3 however uses a completely different design where it mixes a 1088
> > > > > bit block into a 1600 bit state, for a leverage of 2:3, and the excess
> > > > > is *preserved between each block*.
> > > >
> > > > Yes. And considering that the SHA1 attack was actually predicated on
> > > > the fact that each block was independent (no extra state between), I
> > > > do think SHA3 is a better model.
> > > >
> > > > So I'd rather see SHA3-256 than SHA256.
> >
> > Well, for what it's worth, we need to be aware that SHA3 is *different*.
> > In crypto, "different" = "bugs haven't been found yet". :-P
> >
> > And SHA2 is *known*.  So we have a pretty good handle on how it'll
> > weaken over time.
>
> Here, you seem to agree with me.

Yep.

> > > SHA-256 got much more cryptanalysis than SHA3-256, and apart from the
> > > length-extension problem that does not affect Git's usage, there are no
> > > known weaknesses so far.
> >
> > While I think that statement is true on its face (particularly when
> > including post-competition analysis), I don't think it's sufficient
> > justification to choose one over the other.
>
> And here you don't.
>
> I find that very confusing.

What I'm saying is that there is more to selecting a hash function for
git than just the cryptographic assessment.  In fact I would argue that
the primary cryptographic concern for git is "What is the likelihood
that we'll wake up one day to full collisions with no warning?"

To that, I'd argue that SHA-256's time in the field and SHA3-256's
competition give them both passing marks in that regard.
fwiw, I'd also put Blake and Skein in there as well.

The chance that any of those will suffer sudden, catastrophic failure is
minimal.  IOW, we'll have warnings, and time to migrate to the next
function.  None of us can predict the future, but having a significant
amount of vetting reduces the chances of catastrophic failure.

> > > It would seem that the experts I talked to were much more concerned about
> > > that amount of attention than the particulars of the algorithm. My
> > > impression was that the new features of SHA3 were less studied than the
> > > well-known features of SHA2, and that the new-ness of SHA3 is not
> > > necessarily a good thing.
> >
> > The only thing I really object to here is the abstract "experts".  We're
> > talking about cryptography and integrity here.  It's no longer
> > sufficient to cite anonymous experts.  Either they can put their
> > thoughts, opinions and analysis on record here, or it shouldn't be
> > considered.  Sorry.
>
> Sorry, you are asking cryptography experts to spend their time on the Git
> mailing list. I tried to get them to speak out on the Git mailing list.
> They respectfully declined.

Ok, fair enough.  Just please understand that it's difficult to place
much weight on statements that we can't discuss with the person who made
them.

> > However, whether we choose SHA2 or SHA3 doesn't matter.
>
> To you, it does not matter.

Well, I'd say it does not matter for *most* users.

> To me, it matters. To the several thousand developers working on Windows,
> probably the largest Git repository in active use, it matters. It matters
> because the speed difference that has little impact on you has a lot more
> impact on us.

Ahhh, so if I understand you correctly, you'd prefer SHA-256 over
SHA3-256 because it's more performant for your use case?  Well, that's a
completely different animal than cryptographic suitability.

Have you been able to crunch numbers yet?  Will you be able to share
some empirical data?
I'd love to see some comparisons between SHA1, SHA-256, SHA512-256, and
SHA3-256 for different git operations under your workload.

> > If SHA3 is chosen as the successor, it's going to get a *lot* more
> > adoption, and thus, a lot more analysis.  If cracks start to show, the
> > hard work of making git flexible is already done.  We can migrate to
> > SHA4/5/whatever in an orderly fashion with far less effort than the
> > transition away from SHA1.
>
> Sure. And if XYZ789 is chosen, it's going to get a *lot* more adoption,
> too.
>
> We think.
>
> Let's be realistic. Git is pretty important to us, but it is not important
> enough to sway, say, Intel into announcing hardware support for SHA3.
> And if you try to force through *any* hash function only so that it gets
> more adoption and hence more support,

That's quite a jump from what I was saying.  I would never advise using
code in a production setting just to increase adoption.

What I /was/ saying: Let's say you don't get what you want, and SHA3-256
is chosen.  It's not the end of the world from a cryptographic PoV.  The
hard work of making the git (and libgit2) codebases hash-flexible is
already done.  So, if you're correct, and SHA3 was too immature, the
increased visibility will help us discover that more quickly.  And, the
code will already be in a position to conduct an orderly migration.

Will it still be costly?  Yes.  But I would argue that it's naive to
think that we will be using git/sha3-256 or git/sha-256 10 to 15 years
from now.  It might be git, it might not.  But there *will* be another
migration of existing data (code, history, etc.) from one object storage
model to another.  It might be git/SHA4-512, or hg/sha4-384.

So, we aren't trying to find the perfect hash function in the naive hope
that we'll never have to change again.  Rather, we're choosing the next
hash function so that we can hold off another migration for as long as
possible.  After all, SHA4-512 doesn't exist yet.
;-)

> in the short run you will make life
> harder for developers on more obscure platforms, who may not easily get
> high-quality, high-speed implementations of anything but the very
> mainstream (which is, let's face it, MD5, SHA-1 and SHA-256). I know I
> would have cursed you for such a decision back when I had to work on AIX
> and IRIX.

I think you're assuming that all developers on obscure platforms have a
similar git use case to your current one.  I've not heard of that being
the case.

> > For my use cases, as a user of git, I have a plan to maintain provable
> > integrity of existing objects stored in git under sha1 while migrating
> > away from sha1.  The same plan works for migrating away from SHA2 or
> > SHA3 when the time comes.
>
> Please do not make the mistake of taking your use case to be a template
> for everybody's use case.

I wasn't.  But I will argue that my use case is valid.  Just as yours is.

> Migrating a large team away from any hash function to another one *will*
> be painful, and costly.

Assuming that it will never happen again would make that doubly costly.

> Migrating will be very costly for hosting companies like GitHub, Microsoft
> and BitBucket, too.

<with_my_business_hat_on>
GitHub and BitBucket have git as the core of their business model.  If
they aren't keeping an eye on the future path of git and maintaining
migration plans, shame on them.
</with_my_business_hat_on>

Thanks,

Jason.
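
P.S. As a rough illustration of the kind of comparison asked for above,
here is a minimal Python sketch.  Everything in it is an assumption made
for illustration, not something from git itself: it uses Python's hashlib
rather than git's own hash implementations, the helper names
(git_blob_id, throughput_mb_s, compare) are hypothetical, "sha512_256" is
only measured if the local OpenSSL build provides it, and raw digest
throughput on one in-memory buffer says very little about real git
operations on a large repository.  The only git-specific part is the
"blob <size>\0" header that git prepends to blob content before hashing
with SHA1; applying the same header to the other hashes is an assumption
of this sketch.

```python
import hashlib
import time

# hashlib names for the candidates discussed in this thread.
# "sha512_256" depends on the OpenSSL build and may be missing.
CANDIDATES = ["sha1", "sha256", "sha512_256", "sha3_256"]


def git_blob_id(data: bytes, algo: str) -> str:
    """Hash data the way git hashes a blob: 'blob <len>\\0' + content."""
    h = hashlib.new(algo)
    h.update(b"blob %d\x00" % len(data))
    h.update(data)
    return h.hexdigest()


def throughput_mb_s(algo: str, data: bytes, rounds: int) -> float:
    """Return hashed megabytes per second over `rounds` repetitions."""
    start = time.perf_counter()
    for _ in range(rounds):
        git_blob_id(data, algo)
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (len(data) * rounds) / elapsed / 1e6


def compare(size: int = 1 << 20, rounds: int = 20) -> dict:
    """Measure every candidate that this Python build can provide."""
    data = bytes(size)  # zero-filled buffer of `size` bytes
    results = {}
    for algo in CANDIDATES:
        try:
            results[algo] = throughput_mb_s(algo, data, rounds)
        except ValueError:
            # Algorithm not available in this build; skip it.
            continue
    return results


if __name__ == "__main__":
    for algo, mbs in compare().items():
        print(f"{algo:12s} {mbs:10.1f} MB/s")
```

Running the file prints one MB/s line per available algorithm.  Numbers
that would actually settle the performance question have to come from
real git operations (status, commit, fsck, repack, ...) on the
repository in question.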