Re: [PATCH v4 3/8] finalize_object_file(): implement collision check

Taylor Blau <me@xxxxxxxxxxxx> · Tue, 24 Sep 2024 18:02:30 -0400

On Tue, Sep 24, 2024 at 02:32:21PM -0700, Junio C Hamano wrote:
> >   - We already use the path of the loose object as its hash value /
> >     object name, so checking for collisions at the content level doesn't
> >     add anything.
>
> "We already ... doesn't add anything" -> "The point of collision
> check is to find an attempt to store different contents that hashes
> down to the same object name, and we continue to use sha1dc to hash
> the content name so such a collision would have already been
> detected".

Exactly.

> >     This is why we do not bother to check the inflated object contents
> >     for collisions either, since either (a) the object contents have the
> >     fingerprint of a SHA-1 collision, in which case the collision
> >     detecting SHA-1 implementation used to hash the contents to give us
> >     a path would have already rejected it, or (b) the contents are part
> >     of a colliding pair which does not bear the same fingerprints of
> >     known collision attacks, in which case we would not have caught it
> >     anyway.
>
> I do not understand (b).  If you compare inflated contents, you
> would find such a pair that sha1dc missed but the "implement
> collision check" patch would have caught as a pair of byte sequences
> that hash to the same value.  Isn't it an opportunity to make it safer
> to implement such a loose object collision check?

All I was saying with (b) was that if you had some new collision attack
that didn't trigger the existing detection mechanisms in the existing
collision-detecting SHA-1 implementation, then you wouldn't notice the
collision anyway.

You could inflate the contents and compare them, but I don't think it
would much matter at that point since you've already let one of them
enter the repository, and it's not clear which is the "correct" one to
keep since they both hash to the same value.

Thanks,
Taylor