On Fri, Jun 24 2022, Jeff King wrote:

> On Wed, Jun 22, 2022 at 12:29:59AM +0000, brian m. carlson wrote:
>
>> > We've since migrated our default hash function from SHA-1 to SHA-1DC
>> > (except on vanilla OSX, see [2]). It's a variant SHA-1 that detects the
>> > SHAttered attack implemented by the same researchers. I'm not aware of a
>> > current viable SHA-1 collision against the variant of SHA-1 that we
>> > actually use these days.
>>
>> That's true, but that still doesn't let you store the data. There is
>> some data that you can't store in a SHA-1 repository, and SHA-1DC is
>> extremely slow. Using SHA-256 can make things like indexing packs
>> substantially faster.
>
> I'm curious if you have numbers on this. I naively converted linux.git
> to sha256 by doing "fast-export | fast-import" (the latter in a sha256
> repo, of course, and then both repacked with "-f --window=250" to get
> reasonable apples-to-apples packs).
>
> Running "index-pack --verify" on the result takes about the same time
> (this is on an 8-core system, hence the real/user differences):
>
>   [sha1dc]
>   real    2m43.754s
>   user    10m52.452s
>   sys     0m36.745s
>
>   [sha256]
>   real    2m41.884s
>   user    12m23.344s
>   sys     0m35.222s
>
> The sha256 repo actually has about 10% fewer objects (I didn't
> investigate, but this is perhaps due to cutting out tags and a few other
> things to convince fast-export to finish running). I'm not sure about
> the extra user time (multicore timings here are funny because of
> frequency scaling, so I think the "real" line is more interesting). So
> sha256 actually comes out a bit worse here. On the other hand, this is
> just using our blk_SHA256 implementation. There may be faster
> alternatives (including ones with hardware support).
>
> I wouldn't be at all surprised if the difference isn't substantial in
> the long run, though. The repo is on the order of 100GB of object data.
> That's a lot to hash, but it's also just a lot to deal with at all (zlib
> inflating, applying deltas, etc).
>
> Anyway, this is a pretty rough cut at an experiment. I was mostly
> curious if you had done something more advanced, and/or gotten different
> results.

I haven't checked or verified this, but
https://www.marc-stevens.nl/research/#software claims:

    Counter-cryptanalysis: New improved release SHA-1 collision
    detection library, which protects against twice as many SHA-1 attack
    classes (disturbance vectors), but is 9 times faster than previous
    version. Speed is now 1.87 times normal SHA-1. It is currently used
    among others by Git, GitHub, GMail, Google Drive and Microsoft
    OneDrive.

And looking at the OID you initially imported for sha1dc (and my later
submodule import) we've always had what seems to have been that
performance improvement, which I think (but I didn't have time to
benchmark) is:
https://github.com/cr-marcstevens/sha1collisiondetection/pull/20

*But* there was also this later performance work:
https://github.com/cr-marcstevens/sha1collisiondetection/pull/30; see
also this comment:
https://github.com/cr-marcstevens/sha1collisiondetection/commit/33a694a9ee1b79c24be45f9eab5ac0e1aeeaf271

And then if you look at the sha1collisiondetection repo the latest tag
is stable-v1.0.3, which pre-dates that (but not the original perf work),
and was tagged in 2017. There were a lot of commits since then.

I wasn't able to find any third party package using DC_SHA1_EXTERNAL,
but I wonder if any performance tests with sha1dc in the wild are using
some older version, which from the looks of it might have had a
performance regression on x86...
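
To separate raw hashing cost from everything else index-pack does (zlib inflating, applying deltas), something like the following rough sketch could serve as a baseline. It is only an illustration under stated assumptions: it uses Python's hashlib (typically backed by OpenSSL), so it measures plain SHA-1 and SHA-256, not sha1dc and not git's blk_SHA256, and the helper name throughput_mib_per_s is made up for this example. Numbers from it say nothing directly about the timings quoted above.

```python
import hashlib
import time


def throughput_mib_per_s(algorithm: str, payload: bytes, rounds: int = 20) -> float:
    """Return raw hashing throughput in MiB/s for one hashlib algorithm.

    Hypothetical helper: this times hashlib's implementation only, which is
    an assumption-laden stand-in and not the code git itself runs.
    """
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(algorithm, payload).digest()
    elapsed = time.perf_counter() - start
    return (len(payload) * rounds) / (1024 * 1024) / elapsed


if __name__ == "__main__":
    # 16 MiB of zero bytes; for these hashes the running time depends on
    # input length, not on the content being hashed.
    payload = bytes(16 * 1024 * 1024)
    for name in ("sha1", "sha256"):
        print(f"{name}: {throughput_mib_per_s(name, payload):.0f} MiB/s")
```

Comparing the plain SHA-1 figure against a build of sha1collisiondetection on the same machine would be one way to check the "1.87 times normal SHA-1" claim, but that comparison is not done here.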