On Tue, Feb 28, 2017 at 11:07:37AM -0800, Junio C Hamano wrote: > Junio C Hamano <gitster@xxxxxxxxx> writes: > > >> [1/3]: add collision-detecting sha1 implementation > >> [2/3]: sha1dc: adjust header includes for git > >> [3/3]: Makefile: add USE_SHA1DC knob > > > > I was lazy so I fetched the above and then added this on top before > > I start to play with it. > > > > -- >8 -- > > From: Junio C Hamano <gitster@xxxxxxxxx> > > Date: Tue, 28 Feb 2017 10:39:25 -0800 > > Subject: [PATCH] sha1dc: resurrect LICENSE file > > In a way similar to 8415558f55 ("sha1dc: avoid c99 > declaration-after-statement", 2017-02-24), we would want this on > top. > > -- >8 -- > Subject: sha1dc: avoid 'for' loop initial decl Yeah, thanks, I had tweaked both that and the license thing locally but had not pushed it out yet. Both are obvious improvements. FWIW, I've been in touch with Dan Shumow, one of the authors, who has been looking at whether we can speed up the sha1dc implementation, or integrate the checks into the block-sha1 implementation. Here are a few notes on the earlier timings I posted that came out in our conversation: - the timings I showed earlier were actually openssl versus sha1dc. The block-sha1 timings fall somewhere in the middle: [running test-sha1 on a fresh linux.git packfile] 1.347s openssl 2.079s block-sha1 4.983s sha1dc [index-pack --verify on a fresh git.git packfile] 6.919s openssl 9.003s block-sha1 17.955s sha1dc Those are the operations that show off sha1 performance the most. The first one is really not even that interesting; it's raw sha1 performance. The second one is an actual operation users might notice (though not as --verify exactly, but as "index-pack --stdin" when receiving a fetch or a push). So there's room for improvement, but the gap between block-sha1 and sha1dc is not quite as big as I showed earlier. - Dan timed the sha1dc implementation with and without the collision detection enabled. The sha1 implementation is only 1.33x slower than block-sha1 (for raw sha1 time). Adding in the detection makes it 2.6x slower. So there's some potential gain from optimizing the sha1 implementation, but ultimately we may be looking at a 2x slowdown to add in the collision detection. It doesn't need to happen for _every_ sha1 we compute, but the index-pack case is the one that almost certainly _does_ want it, because that's when we're admitting remote objects into the repository (ditto you'd probably want it for write_sha1_file(), since you could be applying a patch from an untrusted source). -Peff