Re: SHA1 collisions found

Jeff King <peff@xxxxxxxx> · Wed, 1 Mar 2017 14:05:24 -0500

On Tue, Feb 28, 2017 at 03:11:32PM -0800, Linus Torvalds wrote:

> > Of course for dedicated code this can be simplified, and some parts
> > could be further optimized.
> 
> So I'd be worried about changing your tested code too much, since the
> only test-cases we have are the two pdf files. If we screw up too
> much, those will no longer show as collisions, but we could get tons
> of false positives that we wouldn't see, so..

I can probably help with collecting data for that part on GitHub.

I don't have an exact count of how many sha1 computations we do in a
day, but it's...a lot. Obviously every pushed object gets its sha1
computed, but read operations also cover every commit and tree via
parse_object() (though I think most of the blob reads do not).

So it would be trivial to start by swapping out the "die()" on collision
with something that writes to a log. This is the slow path that we don't
expect to trigger at all, so log volume shouldn't be a problem.

I've been waiting to see how speedups develop before deploying it in
production.

-Peff