Re: [PATCH v4 3/8] finalize_object_file(): implement collision check

Jeff King <peff@xxxxxxxx> · Tue, 24 Sep 2024 18:20:39 -0400

On Tue, Sep 24, 2024 at 05:59:10PM -0400, Taylor Blau wrote:

> > So the argument I'd make here is more like: this is the wrong place to
> > do it.
> 
> I think that is reasonable, and I agree with your reasoning here. I'm
> happy to reword the commit message if you think doing so would be
> useful, or drop the paragraph entirely if you think it's just confusing.
> 
> Let me know what you think, I'm happy to do whatever here, reroll or not
> :-).

I'm content to let this live in the list archive, but it sounds like
Junio had the same reaction, so it may be worth trying to rework the
commit message a bit.

> >     In general, the use of unpack-objects versus index-pack is up to the
> >     attacker (based on pack size). So I think it would be nice if
> >     unpack-objects did the collision check. Even if the attacker beats
> >     you to writing the object, it would be nice to see "holy crap, there
> >     was a collision" instead of just silently throwing your pushed
> >     object away.
> 
> Right, I agree that unpack-objects definitely should do the collision
> check here. And indeed, that is the case, since that code (which all is
> directly implemented in the builtin) uses the regular
> collision-detecting SHA-1 implementation.

It uses sha1dc, but what I mean by "collision check" is an additional
belt-and-suspenders check. That even once we see an object which
hashes to a particular sha1 which we already have, we'll do the
byte-for-byte comparison. See index-pack's "check_collison".

And that's what I think should be added to unpack-objects (but again,
this is all orthogonal to your series).

> I thought about this when writing it, and came to the conclusion that
> the checks we have for "are we in something that looks like a loose
> object shard?" are sane. That's only because we won't bother reading a
> pack in $GIT_DIR/objects/pack/xx/yy.pack, since we do not read packs
> recursively from the pack sub-directory.
> 
> So I think that the diff you posted below isn't hurting anything, and
> the implementation looks correct to me, but I also don't think it's
> helping anything either as a consequence of the above.

Right, it's purely about future-proofing against a hypothetical change
where we'd be more inclusive (both in what we read, but also in what
we'd ourselves write!).

-Peff