On Tue, Oct 16, 2012 at 12:15:21PM +0700, Nguyen Thai Ngoc Duy wrote:

> On Tue, Oct 16, 2012 at 11:51 AM, Jeff King <peff@xxxxxxxx> wrote:
> >> It's worth noting that a SHA-1 collision can be identified at the
> >> server because the server performs a byte-for-byte compare of both
> >> copies of the object to make sure they match exactly in every way. It's
> >> not fast, but it's safe. :-)
> >
> > Do we? I thought early versions of git did that, but we did not
> > double-check collisions any more for performance reasons. You don't
> > happen to remember where that code is, do you (not that it really
> > matters, but I am just curious)?
>
> We do. I touched that sha-1 collision code last time I updated
> index-pack, to support large blobs. We only do that when we receive an
> object that we already have, which should not happen often unless
> you're under attack, so little performance impact normally. Search
> "collision" in index-pack.c

Ah, thanks, I remember this now. I think that I was thinking of the very
early code to check every sha1 file write. E.g., the code killed off by
aac1794 (Improve sha1 object file writing., 2005-05-03). But that is
ancient history that is not really relevant.

Interesting that we check only in index-pack. If the pushed content is
small enough, we will call unpack-objects. That follows the usual code
path for writing the object, which will prefer the existing copy.

I suspect a site that is heavy on alternates is invoking the index-pack
code path more frequently than necessary (e.g., history gets pushed to
one forked repo, then when it goes to the next one, we may not share the
ref that tells the client we already have the object and receive it a
second time).

-Peff
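
To make the difference between the two code paths concrete, here is a
rough Python sketch. It is not git's C code and does not mirror the
layout of index-pack.c; the dict-based store and the names
receive_checked, receive_prefer_existing and CollisionError are all made
up for illustration. The object id is passed in separately from the
content so that a collision can be simulated, whereas git computes the
id from the content itself.

```python
import hashlib


class CollisionError(Exception):
    """Raised when two different objects claim the same object id."""


def git_object_id(obj_type: str, data: bytes) -> str:
    # Git hashes a "<type> <size>\0" header followed by the content.
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def receive_checked(store: dict, obj_id: str, obj_type: str, data: bytes) -> str:
    """Hypothetical stand-in for the index-pack behavior described above:
    compare against the existing copy only when the id is already present."""
    existing = store.get(obj_id)
    if existing is None:
        store[obj_id] = (obj_type, data)
        return "stored"
    # Byte-for-byte comparison of type and content with the copy we have.
    if existing != (obj_type, data):
        raise CollisionError(f"SHA-1 collision detected for {obj_id}")
    return "duplicate"


def receive_prefer_existing(store: dict, obj_id: str, obj_type: str, data: bytes) -> str:
    """Hypothetical stand-in for the unpack-objects path: the existing
    copy wins and the incoming bytes are never compared."""
    if obj_id in store:
        return "kept existing"
    store[obj_id] = (obj_type, data)
    return "stored"
```

With a store that already holds an object under some id, handing
receive_checked different bytes under that same id raises
CollisionError, while receive_prefer_existing silently keeps the old
copy. The comparison only costs anything when the id is already present,
which matches the "little performance impact normally" remark above.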