Re: SHA1 collision in production repo?! (probably not)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On 31 Mar 2017, at 19:45, Jeff King <peff@xxxxxxxx> wrote:
> 
> On Fri, Mar 31, 2017 at 10:35:06AM -0700, Junio C Hamano wrote:
> 
>> Lars Schneider <larsxschneider@xxxxxxxxx> writes:
>> 
>>> Hi,
>>> 
>>> I just got a report with the following output after a "git fetch" operation
>>> using Git 2.11.0.windows.3 [1]:
>>> 
>>> remote: Counting objects: 5922, done.
>>> remote: Compressing objects: 100% (14/14), done.
>>> error: inflate: data stream error (unknown compression method)
>>> error: unable to unpack 6acd8f279a8b20311665f41134579b7380970446 header
>>> fatal: SHA1 COLLISION FOUND WITH 6acd8f279a8b20311665f41134579b7380970446 !
>>> fatal: index-pack failed
>>> 
>>> I would be really surprised if we discovered a SHA1 collision in a production
>>> repo. My guess is that this is somehow triggered by a network issue (see data
>>> stream error). Any tips how to debug this?
>> 
>> Perhaps the first thing to do is to tweak the messages in builtin/index-pack.c
>> to help you identify which one of identical 5 messages is firing.
>> 
>> My guess would be that the code saw an object that came over the
>> wire, hashed it to learn that its object name is
>> 6acd8f279a8b20311665f41134579b7380970446, noticed that it locally
>> already has the object with the same name, and tried to make sure
>> they have identical contents (otherwise, what came over the wire is
>> a successful second preimage attack).  But your loose object on disk
>> you already had was corrupt and didn't inflate correctly when
>> builtin/index-pack.c::compare_objects() or check_collision() tried
>> to.  The code saw no data, or truncated data, or
>> whatever---something different from what the other data that hashed
>> down to 6acd8..., and reported a collision when there is no
>> collision.
> 
> My guess is the one in compare_objects(). The "unable to unpack" error
> comes from sha1_loose_object_info(). We'd normally then follow up with
> read_sha1_file(), which would generate its own set of errors.
> 
> But if open_istream() got a bogus value for the object size (but didn't
> correctly report an error), then it would probably quietly return 0
> bytes from read_istream() later.
> 
> I suspect this may improve things, but I haven't dug deeper to see if
> there are unwanted side effects, or if there are other spots that need
> similar treatment.
> 
> diff --git a/sha1_file.c b/sha1_file.c
> index 43990dec7..38411f90b 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2952,7 +2952,7 @@ static int sha1_loose_object_info(const unsigned char *sha1,
> 	if (status && oi->typep)
> 		*oi->typep = status;
> 	strbuf_release(&hdrbuf);
> -	return 0;
> +	return status;
> }
> 
> int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)

Hi Peff,

we are seeing this now in Git 2.14.1:

...
error: inflate: data stream error (unknown compression method)
error: unable to unpack 7b513f98a66ef9488e516e7abbc246438597c6d5 header
error: inflate: data stream error (unknown compression method)
error: unable to unpack 7b513f98a66ef9488e516e7abbc246438597c6d5 header
fatal: loose object 7b513f98a66ef9488e516e7abbc246438597c6d5 (stored in .git/objects/7b/513f98a66ef9488e516e7abbc246438597c6d5) is corrupt
fatal: The remote end hung up unexpectedly

I guess this means your fix [1] works properly :-)

At some point I will try to explore a retry mechanism for these cases.

Cheers,
Lars

[1] https://github.com/git/git/commit/93cff9a978e1c177ac3e889867004a56773301b2



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux