Hi Folks,

I have become interested in git's NewHash migration and have been investigating alternate hash algorithm support in git. It seems that the implementation of NewHash is still some way off, so it is still possible to revise the choice of SHA256 in favor of a better alternative.

Many of you may already be aware of hash length extension attacks [1]. While I am not aware of any git vulnerability based on length extension, it seems to me that choosing a hash algorithm that is not vulnerable to this attack could prevent a future exploit arising from some future protocol or other bug. A git hash should be final in its usage, so length extension resistance is a desirable property.

SHA3 is not vulnerable to length extension because it uses a sponge construction internally: the capacity portion of the state is never output, so the internal state cannot be reconstructed from a hash value. Truncated SHA2 hashes such as SHA224 and SHA512/256 are also resistant to length extension; however, SHA224 discards only 32 bits of state, so an attacker need only guess 32 bits to mount an extension — a success probability of 1/2^32 per attempt, which, while not trivial, is a small margin. SHA512/256 or SHA3-256 seem to me the most attractive choices for safety against length extension.

With this in mind, I decided to investigate the performance of the alternative SHA hash algorithms. Of particular interest to me are SHA3-256 and SHA512/256 (SHA512 with a distinct initial value, truncated to 256 bits), as both produce the same size digest as SHA256 and are thus good NewHash contenders. I was quite surprised by the benchmark results, with SHA512/256 about 1.5X faster than SHA256. SHA512/256 seems a very good alternative to SHA256:

- SHA256: 8 x 32-bit state (256 bits), 64-byte block, 64 rounds
- SHA512: 8 x 64-bit state (512 bits), 128-byte block, 80 rounds

SHA512/256's good performance on 64-bit systems is understandable: it performs only slightly more rounds per block (80 instead of 64) while processing 128 bytes per block instead of 64. In practice, this means SHA512/256 is about 1.5X faster than SHA256 on 64-bit systems.
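As an aside on the length extension point above, the structural weakness is easy to demonstrate with a toy hash. The sketch below is my own illustration (not SHA-256's real internals): it builds a simplified Merkle-Damgard hash whose digest is its entire final state, then shows an attacker extending H(secret||msg) without ever seeing the secret. Truncated and sponge constructions close this hole precisely because the published digest is not the full resumable state.

```python
import hashlib
import struct

BLOCK = 64  # block size in bytes, as in SHA-256

def md_pad(msg_len):
    """Merkle-Damgard strengthening: 0x80, zeros, 64-bit bit count."""
    zeros = (BLOCK - 9 - msg_len) % BLOCK
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", msg_len * 8)

def compress(state, block):
    # Toy compression function -- NOT SHA-256's real internals, just
    # something one-way that maps (state, block) -> new state.
    return hashlib.sha256(state + block).digest()

def toy_hash(msg):
    """Toy Merkle-Damgard hash that, like SHA-256, publishes its
    entire final state as the digest."""
    data = msg + md_pad(len(msg))
    state = b"\x00" * 32
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

# The attacker sees only the digest and knows len(secret) + len(msg);
# the secret itself never leaves the victim.
secret = b"key-the-attacker-never-sees"
msg = b"original data"
digest = toy_hash(secret + msg)

# Forge H(secret || msg || glue || ext) by resuming from the digest.
ext = b"malicious extension"
glue = md_pad(len(secret) + len(msg))
total = len(secret) + len(msg) + len(glue) + len(ext)

state = digest  # the published digest IS the resumable state
tail = ext + md_pad(total)
for i in range(0, len(tail), BLOCK):
    state = compress(state, tail[i:i + BLOCK])

# The forgery matches a legitimate hash of the extended message.
assert state == toy_hash(secret + msg + glue + ext)
```

With SHA512/256 the attacker is missing 256 bits of internal state, and with SHA3 the sponge's capacity bits are never output at all, so this resume-from-digest trick does not apply.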
SHA3, while competitive, is still significantly slower than SHA512, although the AVX512 version of SHA3 is actually faster than SHA256. SHA256 is not the best performing NewHash alternative by a long shot.

SHA Benchmarks on Intel Core i7-7980XE @ 4.6GHz
===============================================

(MiB/sec)      ext      16     128    1024    4096   65536 1048576
----------  ------  ------  ------  ------  ------  ------  ------
sha1          avx2    37.5   239.6   817.0  1097.5  1231.2  1244.8
sha224        avx2    30.8   170.5   439.3   524.3   560.2   564.4
sha256        avx2    31.2   171.2   438.7   525.0   560.9   564.8
sha512        avx2    27.1   167.2   561.5   747.1   835.7   842.8
sha512-224    avx2    26.8   164.7   558.8   747.9   836.5   845.1
sha512-256    avx2    26.6   163.9   557.9   751.4   835.0   843.0
sha3-224      avx2    38.9   313.8   411.5   474.6   491.1   490.7
sha3-256      avx2    39.7   317.3   414.5   445.3   464.9   464.3
sha3-384      avx2    39.8   184.6   338.1   348.4   356.5   358.0
sha3-512      avx2    40.0   185.0   228.4   245.9   248.0   249.0

(MiB/sec)      ext      16     128    1024    4096   65536 1048576
----------  ------  ------  ------  ------  ------  ------  ------
sha3-224    avx512    24.3   192.6   506.5   674.3   736.8   740.9
sha3-256    avx512    25.0   198.4   511.5   635.4   698.8   698.8
sha3-384    avx512    24.8   152.3   429.6   503.9   535.7   540.2
sha3-512    avx512    24.8   152.4   307.3   361.2   375.6   374.4

Recommendations
===============

After performing multiple benchmarks and thinking about this for several weeks, I have come to the conclusion that SHA512/256 is a very good alternative to SHA256, both for its length extension resistance and for its performance. It seems that the NewHash implementation is some way off, so it is not too late to change.

- Consider SHA512/256 for NewHash

SHA algorithm patches
=====================

I will be sending a patch series adding these SHA algorithms; it is in the sha-algorithms branch of my git tree:

- https://github.com/michaeljclark/git.git
- https://github.com/michaeljclark/git/tree/sha-algorithms/sha

This is a precursor to asking some questions about the proposed implementation of multiple or compat hashes (N-hash or 2-hash?).
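To ground the questions I want to ask about compat hashes, here is a minimal sketch of the shape I have in mind: a loose-object store keyed by a single primary hash, with a compat-hash to primary-hash map on the side. All the names and the on-disk layout below are my own invention for illustration, not git's actual API or object format.

```python
import hashlib
import os
import tempfile
import zlib

class ObjectStore:
    """Minimal sketch of a loose-object store keyed by one primary
    hash, with a compat->primary oid map on the side. All names and
    the on-disk layout here are illustrative, not git's actual API."""

    def __init__(self, root, primary="sha256", compat="sha1"):
        self.root = root
        self.primary = primary
        self.compat = compat
        self.compat_map = {}  # compat oid -> primary oid

    def _path(self, oid):
        # git-style fan-out: first two hex digits name the directory
        return os.path.join(self.root, oid[:2], oid[2:])

    def write(self, data):
        oid = hashlib.new(self.primary, data).hexdigest()
        compat_oid = hashlib.new(self.compat, data).hexdigest()
        self.compat_map[compat_oid] = oid
        path = self._path(oid)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(data))
        return oid

    def read(self, oid):
        # Callers may pass either a primary or a compat oid; only
        # primary oids ever name files on disk.
        oid = self.compat_map.get(oid, oid)
        with open(self._path(oid), "rb") as f:
            return zlib.decompress(f.read())

store = ObjectStore(tempfile.mkdtemp())
data = b"blob 5\x00hello"
oid = store.write(data)
compat_oid = hashlib.sha1(data).hexdigest()
assert store.read(oid) == store.read(compat_oid) == data
```

The point of the sketch is that only one hash names files on disk, and every other hash resolves through the map before touching storage — which is roughly the abstraction boundary I think the code base currently lacks.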
I am curious about the pack format and directory structure for loose objects. It seems there needs to be a single hash chosen for loose object filenames, and auxiliary hashes will need a hash-to-hash map to go from a compat hash to the new hash.

I have been reading the code, and it seems like it is going to be a challenge to support more than one hash concurrently. My observation is that the content-addressable storage layer could stand to be a little more abstract: there is a lot of code that computes hashes by accessing the hash algorithms directly, rather than going through an abstract content-addressable storage layer, analogous to the VFS in Linux.

In any case, I hope you find this benchmark data useful.

Regards,
Michael

[1] https://en.wikipedia.org/wiki/Length_extension_attack