On Tue, Feb 28, 2017 at 03:11:32PM -0800, Linus Torvalds wrote:

> > Of course for dedicated code this can be simplified, and some parts
> > could be further optimized.
>
> So I'd be worried about changing your tested code too much, since the
> only test-cases we have are the two pdf files. If we screw up too
> much, those will no longer show as collisions, but we could get tons
> of false positives that we wouldn't see, so..

I can probably help with collecting data for that part on GitHub.

I don't have an exact count of how many sha1 computations we do in a
day, but it's...a lot. Obviously every pushed object gets its sha1
computed, but read operations also cover every commit and tree via
parse_object() (though I think most of the blob reads do not).

So it would be trivial to start by swapping out the "die()" on collision
with something that writes to a log. This is the slow path that we don't
expect to trigger at all, so log volume shouldn't be a problem.

I've been waiting to see how speedups develop before deploying it in
production.

-Peff