Junio C Hamano wrote:
> Junio C Hamano <gitster@xxxxxxxxx> writes:
>> Adam Langley <agl@xxxxxxxxxx> writes:

>>> However, as I'm not a git developer, I've no opinion on whether the
>>> cost of carrying implementations of these functions is worth the speed
>>> vs using SHA-256, which can be assumed to be supported everywhere
>>> already.
>>
>> Thanks.
>>
>> My impression from this thread is that even though fast may be
>> better than slow, ubiquity trumps it for our use case, as long as
>> the thing is not absurdly and unusably slow, of course. Which makes
>> me lean towards something older/more established like SHA-256, and
>> it would be a very nice bonus if it gets hardware acceleration more
>> widely than others ;-)
>
> Ah, I recall one thing that was mentioned but not discussed much in
> the thread: possible use of tree-hashing to exploit multiple cores
> hashing a large-ish payload. As long as it is OK to pick a sound
> tree hash coding on top of any (secure) underlying hash function,
> I do not think the use of tree-hashing should not affect which exact
> underlying hash function is to be used, and I also am not convinced
> if we really want tree hashing (some codepaths that deal with a large
> payload wants to stream the data in single pass from head to tail)
> in the context of Git, but I am not a crypto person, so ...

Tree hashing also affects single-core performance because of the
availability of SIMD instructions. That is how software
implementations of e.g. blake2bp-256 and SHA-256x16[1] are able to
have competitive performance with (slightly better performance than,
at least in some cases) hardware implementations of SHA-256.

It is also satisfying that we have options like these that are faster
than SHA-1.

All that said, SHA-256 seems like a fine choice, despite its worse
performance. The wide availability of reasonable-quality
implementations (e.g. in Java you can use
'MessageDigest.getInstance("SHA-256")') makes it a very tempting one.
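
To make the "tree hash coding on top of any underlying hash" idea a bit
more concrete, here is a minimal Java sketch of a two-level tree mode
layered over the stock SHA-256 provider. It is only an illustration
under stated assumptions: the class name, the 64 KiB leaf size, and the
0x00/0x01 domain-separation prefixes are all made up for this example,
and this is neither blake2bp nor the SHA-256x16 construction from [1],
nor a vetted tree-hash mode. It only shows that the leaves can be
hashed independently (here via a parallel stream) while the underlying
hash function stays a black box.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.stream.IntStream;

public class TreeHashSketch {
    // Hypothetical leaf size chosen for illustration only.
    static final int LEAF_SIZE = 1 << 16;

    // Fresh SHA-256 instance; MessageDigest objects are not thread-safe.
    static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Ordinary single-pass SHA-256, for comparison.
    static byte[] flatHash(byte[] data) {
        return sha256().digest(data);
    }

    // Two-level tree: hash each leaf independently, then hash the
    // concatenation of the leaf digests to obtain the root.
    static byte[] treeHash(byte[] data) {
        int leaves = Math.max(1,
            data.length / LEAF_SIZE + (data.length % LEAF_SIZE == 0 ? 0 : 1));
        byte[][] leafDigests = IntStream.range(0, leaves).parallel()
            .mapToObj(i -> {
                int from = i * LEAF_SIZE;
                int to = Math.min(data.length, from + LEAF_SIZE);
                MessageDigest md = sha256();
                md.update((byte) 0x00);          // assumed leaf prefix
                md.update(data, from, to - from);
                return md.digest();
            })
            .toArray(byte[][]::new);

        MessageDigest root = sha256();
        root.update((byte) 0x01);                // assumed root prefix
        for (byte[] d : leafDigests) {
            root.update(d);
        }
        return root.digest();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[1 << 20];
        System.out.println("flat digest bytes: " + flatHash(payload).length);
        System.out.println("tree digest bytes: " + treeHash(payload).length);
    }
}
```

Note that the tree digest is a different value from plain SHA-256 of the
same bytes, and that treeHash() as written needs the whole payload in
memory, which is exactly the tension with single-pass streaming
codepaths raised in the quoted text.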

Part of the reason I suggested previously that it would be helpful to
try to benchmark Git with various hash functions (which didn't go over
well, for some reason) is that it makes these comparisons more
concrete. Without measuring, it is hard to get a sense of the
distribution of input sizes and how much practical effect the
differences we are talking about have.

Thanks,
Jonathan

[1] https://eprint.iacr.org/2012/476.pdf