> On 31 Mar 2017, at 19:45, Jeff King <peff@xxxxxxxx> wrote: > > On Fri, Mar 31, 2017 at 10:35:06AM -0700, Junio C Hamano wrote: > >> Lars Schneider <larsxschneider@xxxxxxxxx> writes: >> >>> Hi, >>> >>> I just got a report with the following output after a "git fetch" operation >>> using Git 2.11.0.windows.3 [1]: >>> >>> remote: Counting objects: 5922, done. >>> remote: Compressing objects: 100% (14/14), done. >>> error: inflate: data stream error (unknown compression method) >>> error: unable to unpack 6acd8f279a8b20311665f41134579b7380970446 header >>> fatal: SHA1 COLLISION FOUND WITH 6acd8f279a8b20311665f41134579b7380970446 ! >>> fatal: index-pack failed >>> >>> I would be really surprised if we discovered a SHA1 collision in a production >>> repo. My guess is that this is somehow triggered by a network issue (see data >>> stream error). Any tips how to debug this? >> >> Perhaps the first thing to do is to tweak the messages in builtin/index-pack.c >> to help you identify which one of identical 5 messages is firing. >> >> My guess would be that the code saw an object that came over the >> wire, hashed it to learn that its object name is >> 6acd8f279a8b20311665f41134579b7380970446, noticed that it locally >> already has the object with the same name, and tried to make sure >> they have identical contents (otherwise, what came over the wire is >> a successful second preimage attack). But your loose object on disk >> you already had was corrupt and didn't inflate correctly when >> builtin/index-pack.c::compare_objects() or check_collision() tried >> to. The code saw no data, or truncated data, or >> whatever---something different from what the other data that hashed >> down to 6acd8..., and reported a collision when there is no >> collision. > > My guess is the one in compare_objects(). The "unable to unpack" error > comes from sha1_loose_object_info(). We'd normally then follow up with > read_sha1_file(), which would generate its own set of errors. > > But if open_istream() got a bogus value for the object size (but didn't > correctly report an error), then it would probably quietly return 0 > bytes from read_istream() later. > > I suspect this may improve things, but I haven't dug deeper to see if > there are unwanted side effects, or if there are other spots that need > similar treatment. > > diff --git a/sha1_file.c b/sha1_file.c > index 43990dec7..38411f90b 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -2952,7 +2952,7 @@ static int sha1_loose_object_info(const unsigned char *sha1, > if (status && oi->typep) > *oi->typep = status; > strbuf_release(&hdrbuf); > - return 0; > + return status; > } > > int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags) Hi Peff, we are seeing this now in Git 2.14.1: ... error: inflate: data stream error (unknown compression method) error: unable to unpack 7b513f98a66ef9488e516e7abbc246438597c6d5 header error: inflate: data stream error (unknown compression method) error: unable to unpack 7b513f98a66ef9488e516e7abbc246438597c6d5 header fatal: loose object 7b513f98a66ef9488e516e7abbc246438597c6d5 (stored in .git/objects/7b/513f98a66ef9488e516e7abbc246438597c6d5) is corrupt fatal: The remote end hung up unexpectedly I guess this means your fix [1] works properly :-) At some point I will try to explore a retry mechanism for these cases. Cheers, Lars [1] https://github.com/git/git/commit/93cff9a978e1c177ac3e889867004a56773301b2