On Thu, Feb 23, 2017 at 02:38:29PM -0800, Linus Torvalds wrote: > > Thanks, I hadn't seen that yet. That doesn't look like it should be hard > > to integrate into Git. > > Here's a *very* ugly patch that is absolutely disgusting and should not be > used. But it does kind of work (I tested it with a faked-up extra patch > that made git accept the broken pdf as a loose object). > > What do I mean by "kind of work"? It uses that ugly and slow checking > SHA1 routine from the collision detection project for the SHA1 object > verification, and it means that "git fsck" ends up being about twice as > slow as it used to be. Heh. I was just putting the finishing touches on a similar patch. Mine is much less gross, in that it actually just adds a new USE_SHA1DC knob (instead of, say, BLK_SHA1). Here are the timings I came up with: - compute sha1 over whole packfile before: 1.349s after: 5.067s change: +275% - rev-list --all before: 5.742s after: 5.730s change: -0.2% - rev-list --all --objects before: 33.257s after: 33.392s change: +0.4% - index-pack --verify before: 2m20s after: 5m43s change: +145% - git log --no-merges -10000 -p before: 9.532s after: 9.683s change: +1.5% So overall the sha1 computation is about 3-4x slower. But of course most operations do more than just sha1. Accessing commits and trees isn't slowed at all (both the +/- changes there are well within the run-to-run noise). Accessing the blobs is a little slower, but mostly drowned out by the cost of things like actually generating patches. The most-affected operation is `index-pack --verify`, which is essentially just computing the sha1 on every object. It's a bit worse than twice as slow, which means every push and every fetch is going to experience that. > For example, I suspect we could use our (much cleaner) block-sha1 > implementation and include just the ubc_check.c code with that, instead of > the truly ugly C sha1 implementation that the sha1collisiondetection > project uses. > > But to do that, somebody would have to really know how the unavoidable > bit conditions check works with the intermediate hashes. I have only a > "big picture" mental model of it (read: I'm not competent to do that). Yeah. I started looking at that, but the ubc check happens after the initial expansion. But AFAICT, block-sha1 mixes that expansion in with the rest of the steps for efficiency. So perhaps somebody who really understands sha1 and the new checks could figure it out, but I'm not at all certain that adding it in wouldn't lose some of block-sha1's efficiency (on top of the time to actually do the ubc check). -Peff