Re: SHA-256 transition

On Fri, Jun 24 2022, Jeff King wrote:

> On Wed, Jun 22, 2022 at 12:29:59AM +0000, brian m. carlson wrote:
>
>> > We've since migrated our default hash function from SHA-1 to SHA-1DC
>> > (except on vanilla OSX, see [2]). It's a variant SHA-1 that detects the
>> > SHAttered attack implemented by the same researchers. I'm not aware of a
>> > current viable SHA-1 collision against the variant of SHA-1 that we
>> > actually use these days.
>> 
>> That's true, but that still doesn't let you store the data.  There is
>> some data that you can't store in a SHA-1 repository, and SHA-1DC is
>> extremely slow.  Using SHA-256 can make things like indexing packs
>> substantially faster.
>
> I'm curious if you have numbers on this. I naively converted linux.git
> to sha256 by doing "fast-export | fast-import" (the latter in a sha256
> repo, of course, and then both repacked with "-f --window=250" to get
> reasonable apples-to-apples packs).
>
> Running "index-pack --verify" on the result takes about the same time
> (this is on an 8-core system, hence the real/user differences):
>
>   [sha1dc]
>   real	2m43.754s
>   user	10m52.452s
>   sys	0m36.745s
>
>   [sha256]
>   real	2m41.884s
>   user	12m23.344s
>   sys	0m35.222s
>
> The sha256 repo actually has about 10% fewer objects (I didn't
> investigate, but this is perhaps due to cutting out tags and a few other
> things to convince fast-export to finish running). I'm not sure about
> the extra user time (multicore timings here are funny because of
> frequency scaling, so I think the "real" line is more interesting). So
> sha256 actually comes out a bit worse here. On the other hand, this is
> just using our blk_SHA256 implementation. There may be faster
> alternatives (including ones with hardware support).
>
> I wouldn't be at all surprised if the difference isn't substantial in
> the long run, though. The repo is on the order of 100GB of object data.
> That's a lot to hash, but it's also just a lot to deal with at all (zlib
> inflating, applying deltas, etc).
>
> Anyway, this is a pretty rough cut at an experiment. I was mostly
> curious if you had done something more advanced, and/or gotten different
> results.
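
To make the comparison above easier to read, here is a minimal sketch that
converts the quoted `time` output to seconds and prints the sha256/sha1dc
ratio for each line. The numbers are copied from the quoted message; the
helper name `parse_time` is made up for this example.

```python
import re


def parse_time(value: str) -> float:
    """Convert a bash `time` value such as '2m43.754s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    minutes, secs = match.groups()
    return int(minutes) * 60 + float(secs)


# Timings copied verbatim from the quoted "index-pack --verify" runs.
timings = {
    "sha1dc": {"real": "2m43.754s", "user": "10m52.452s", "sys": "0m36.745s"},
    "sha256": {"real": "2m41.884s", "user": "12m23.344s", "sys": "0m35.222s"},
}

seconds = {
    name: {kind: parse_time(value) for kind, value in run.items()}
    for name, run in timings.items()
}

# A ratio above 1.0 means the sha256 run took longer on that line.
ratios = {
    kind: seconds["sha256"][kind] / seconds["sha1dc"][kind]
    for kind in ("real", "user", "sys")
}

for kind, ratio in ratios.items():
    print(f"{kind}: sha256/sha1dc = {ratio:.3f}")
```

By this arithmetic the wall-clock times are within about 1% of each other,
while the sha256 run used roughly 14% more user time. Keep in mind that the
two repositories did not contain the same number of objects.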

I haven't checked or verified this, but
https://www.marc-stevens.nl/research/#software claims:

    Counter-cryptanalysis: New improved release SHA-1 collision
    detection library, which protects against twice as many SHA-1 attack
    classes (disturbance vectors), but is 9 times faster than previous
    version. Speed is now 1.87 times normal SHA-1. It is currently used
    among others by Git, GitHub, GMail, Google Drive and Microsoft
    OneDrive.

Looking at the OID you initially imported for sha1dc (and at my later
submodule import), we seem to have always had that performance
improvement, which I think (I didn't have time to benchmark it) is:
https://github.com/cr-marcstevens/sha1collisiondetection/pull/20

*But* there was also this later performance work:
https://github.com/cr-marcstevens/sha1collisiondetection/pull/30; see
also this comment:
https://github.com/cr-marcstevens/sha1collisiondetection/commit/33a694a9ee1b79c24be45f9eab5ac0e1aeeaf271

And if you look at the sha1collisiondetection repo, the latest tag is
stable-v1.0.3, which was tagged in 2017 and pre-dates that later work
(but not the original perf work). There have been a lot of commits since
then.

I wasn't able to find any third-party package using DC_SHA1_EXTERNAL,
but I wonder if any performance tests with sha1dc in the wild are using
some older version, which from the looks of it might have had a
performance regression on x86...
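
For anyone who wants a rough local sanity check of relative hash speed,
here is a minimal sketch using Python's hashlib. Big caveat: hashlib's
"sha1" is plain SHA-1 (typically backed by OpenSSL), not sha1dc, and its
"sha256" is not git's blk_SHA256, so this says nothing direct about git's
own implementations or about any particular sha1collisiondetection
version. The function name `throughput_mb_s` and the buffer sizes are
arbitrary choices for this example.

```python
import hashlib
import time


def throughput_mb_s(algorithm: str, payload: bytes, rounds: int = 5) -> float:
    """Hash `payload` `rounds` times and return throughput in MB/s."""
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(algorithm, payload).digest()
    elapsed = time.perf_counter() - start
    return (len(payload) * rounds) / elapsed / 1e6


# 4 MiB of zero bytes; the content does not affect hashing speed.
payload = bytes(4 * 1024 * 1024)

results = {name: throughput_mb_s(name, payload) for name in ("sha1", "sha256")}

for name, rate in results.items():
    print(f"{name}: {rate:.0f} MB/s")
```

The results depend heavily on the CPU and on whether the backing library
uses hardware SHA extensions, so treat them as a ballpark only.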




