Re: SHA1 collisions found

Lars Schneider <larsxschneider@xxxxxxxxx> · Sat, 25 Feb 2017 23:35:27 +0100

> On 25 Feb 2017, at 00:06, Jeff King <peff@xxxxxxxx> wrote:
> 
> On Fri, Feb 24, 2017 at 11:47:46PM +0100, Jakub Narębski wrote:
> 
>> I have just read on ArsTechnica[1] that while Git repository could be
>> corrupted (though this would require attackers to spend great amount
>> of resources creating their own collision, while as said elsewhere
>> in this thread allegedly easy to detect), putting two proof-of-concept
>> different PDFs with same size and SHA-1 actually *breaks* Subversion.
>> Repository can become corrupt, and stop accepting new commits.  
>> 
>> From what I understand people tried this, and Git doesn't exhibit
>> such problem.  I wonder what assumptions SVN made that were broken...
> 
> To be clear, nobody has generated a sha1 collision in Git yet, and you
> cannot blindly use the shattered PDFs to do so. Git's notion of the
> SHA-1 of an object include the header, so somebody would have to do a
> shattered-level collision search for something that starts with the
> correct "blob 1234\0" header.
> 
> So we don't actually know how Git would behave in the face of a SHA-1
> collision. It would be pretty easy to simulate it with something like:
> 
> ---
> diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
> index 22b125cf8..1be5b5ba3 100644
> --- a/block-sha1/sha1.c
> +++ b/block-sha1/sha1.c
> @@ -231,6 +231,16 @@ void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, unsigned long len)
> 		memcpy(ctx->W, data, len);
> }
> 
> +/* sha1 of blobs containing "foo\n" and "bar\n" */
> +static const unsigned char foo_sha1[] = {
> +	0x25, 0x7c, 0xc5, 0x64, 0x2c, 0xb1, 0xa0, 0x54, 0xf0, 0x8c,
> +	0xc8, 0x3f, 0x2d, 0x94, 0x3e, 0x56, 0xfd, 0x3e, 0xbe, 0x99
> +};
> +static const unsigned char bar_sha1[] = {
> +	0x57, 0x16, 0xca, 0x59, 0x87, 0xcb, 0xf9, 0x7d, 0x6b, 0xb5,
> +	0x49, 0x20, 0xbe, 0xa6, 0xad, 0xde, 0x24, 0x2d, 0x87, 0xe6
> +};
> +
> void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
> {
> 	static const unsigned char pad[64] = { 0x80 };
> @@ -248,4 +258,8 @@ void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
> 	/* Output hash */
> 	for (i = 0; i < 5; i++)
> 		put_be32(hashout + i * 4, ctx->H[i]);
> +
> +	/* pretend "foo" and "bar" collide */
> +	if (!memcmp(hashout, bar_sha1, 20))
> +		memcpy(hashout, foo_sha1, 20);
> }

That's a good idea! I wonder if it would make sense to setup an 
additional job in TravisCI that patches every Git version with some hash 
collisions and then runs special tests. This way we could ensure Git 
behaves reasonable in case of a collision. E.g. by printing errors and 
not crashing or corrupting the repo. Do you think that would be worth 
the effort?

- Lars