Re: SHA1 collision in production repo?! (probably not)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 31, 2017 at 10:35:06AM -0700, Junio C Hamano wrote:

> Lars Schneider <larsxschneider@xxxxxxxxx> writes:
> 
> > Hi,
> >
> > I just got a report with the following output after a "git fetch" operation
> > using Git 2.11.0.windows.3 [1]:
> >
> > remote: Counting objects: 5922, done.
> > remote: Compressing objects: 100% (14/14), done.
> > error: inflate: data stream error (unknown compression method)
> > error: unable to unpack 6acd8f279a8b20311665f41134579b7380970446 header
> > fatal: SHA1 COLLISION FOUND WITH 6acd8f279a8b20311665f41134579b7380970446 !
> > fatal: index-pack failed
> >
> > I would be really surprised if we discovered a SHA1 collision in a production
> > repo. My guess is that this is somehow triggered by a network issue (see data
> > stream error). Any tips how to debug this?
> 
> Perhaps the first thing to do is to tweak the messages in builtin/index-pack.c
> to help you identify which one of identical 5 messages is firing.
> 
> My guess would be that the code saw an object that came over the
> wire, hashed it to learn that its object name is
> 6acd8f279a8b20311665f41134579b7380970446, noticed that it locally
> already has the object with the same name, and tried to make sure
> they have identical contents (otherwise, what came over the wire is
> a successful second preimage attack).  But your loose object on disk
> you already had was corrupt and didn't inflate correctly when
> builtin/index-pack.c::compare_objects() or check_collision() tried
> to.  The code saw no data, or truncated data, or
> whatever---something different from what the other data that hashed
> down to 6acd8..., and reported a collision when there is no
> collision.

My guess is the one in compare_objects(). The "unable to unpack" error
comes from sha1_loose_object_info(). We'd normally then follow up with
read_sha1_file(), which would generate its own set of errors.

But if open_istream() got a bogus value for the object size (but didn't
correctly report an error), then it would probably quietly return 0
bytes from read_istream() later.

I suspect this may improve things, but I haven't dug deeper to see if
there are unwanted side effects, or if there are other spots that need
similar treatment.

diff --git a/sha1_file.c b/sha1_file.c
index 43990dec7..38411f90b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2952,7 +2952,7 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	if (status && oi->typep)
 		*oi->typep = status;
 	strbuf_release(&hdrbuf);
-	return 0;
+	return status;
 }
 
 int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux