Re: [PATCH v4 3/8] finalize_object_file(): implement collision check

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 24, 2024 at 04:37:18PM -0400, Jeff King wrote:
> But I think we can expand the argument a bit. I don't think this is a
> very good place for such a collision check, because race conditions
> aside, we'll already have bailed long before! When we do a write via
> write_object_file(), for example, we'll hit the freshen paths that
> assume the contents are identical, and skip the write (and thus this
> finalize) entirely.
>
> So if you want a collision check, we have to do it at a much higher
> level. And indeed we do: index-pack already implements such a check
> (through the confusingly named "check_collison" function). I don't think
> unpack-objects does so (it looks like it just feeds the result to
> write_object_file(), which does the freshening thing).
>
> So the argument I'd make here is more like: this is the wrong place to
> do it.

I think that is reasonable, and I agree with your reasoning here. I'm
happy to reword the commit message if you think doing so would be
useful, or drop the paragraph entirely if you think it's just confusing.

Let me know what you think, I'm happy to do whatever here, reroll or not
:-).

>   Side thoughts on collision checks:
>
>     I think index-pack is safe in the sense that it will always prefer
>     on-disk copies and will complain if it sees a collision.
>     unpack-objects is not, nor are regular in-repo writes (which
>     normally cannot be triggered by remote, but on forges that do
>     merges, etc, that's not always true). Both of the latter are also
>     subject to races, where a simultaneous collision might let the
>     attacker win.

Yup.

>     That race is kind of moot in a world where anybody can push to a
>     fork of a repo that ends up in the same shared location (so they can
>     actually win and become the "on disk" copy), and we're relying on
>     sha1dc for protection there. That's specific to certain forges, but
>     they do represent a lot of Git use.
>
>     In general, the use of unpack-objects versus index-pack is up to the
>     attacker (based on pack size). So I think it would be nice if
>     unpack-objects did the collision check. Even if the attacker beats
>     you to writing the object, it would be nice to see "holy crap, there
>     was a collision" instead of just silently throwing your pushed
>     object away.

Right, I agree that unpack-objects definitely should do the collision
check here. And indeed, that is the case, since that code (which all is
directly implemented in the builtin) uses the regular
collision-detecting SHA-1 implementation.

>     I know that GitHub only ever runs index-pack, which may give some
>     amount of protection here. In general, I think we should consider
>     deprecating unpack-objects in favor of teaching index-pack to
>     unpack. It has many enhancements (this one, but also threading) that
>     unpack-objects does not. I have an old patch series for this (sorry,
>     only from 2017, I'm slipping). IIRC the sticking point was that
>     unpack-objects is better about memory use in some cases (streaming
>     large blobs?) and I hadn't figured out a way around that.

Only seven years old? Sheesh, you really are slipping ;-).

> Phew. Sorry, that was a long digression for "I think you're right, but I
> might have argued it a little differently". I think the direction of the
> patch (skipping checks entirely for loose objects) is the right thing.
>
> > As a small note related to the latter bullet point above, we must teach
> > the tmp-objdir routines to similarly skip the content-level collision
> > checks when calling migrate_one() on a loose object file, which we do by
> > setting the FOF_SKIP_COLLISION_CHECK bit when we are inside of a loose
> > object shard.
>
> OK, this is the part I was wondering how you were going to deal with. :)
> Let's look at the implementation...
>
> > @@ -239,11 +247,15 @@ static int migrate_paths(struct strbuf *src, struct strbuf *dst)
> >
> >  	for (i = 0; i < paths.nr; i++) {
> >  		const char *name = paths.items[i].string;
> > +		enum finalize_object_file_flags flags_copy = flags;
> >
> >  		strbuf_addf(src, "/%s", name);
> >  		strbuf_addf(dst, "/%s", name);
> >
> > -		ret |= migrate_one(src, dst);
> > +		if (is_loose_object_shard(name))
> > +			flags_copy |= FOF_SKIP_COLLISION_CHECK;
> > +
> > +		ret |= migrate_one(src, dst, flags_copy);
> >
> >  		strbuf_setlen(src, src_len);
> >  		strbuf_setlen(dst, dst_len);
>
> This looks pretty reasonable overall, though I'd note that
> migrate_paths() is called recursively. So if I had:
>
>   tmp-objdir-XXXXXX/pack/ab/foo.pack
>
> we'd skip the collision check. I'm not sure how much we want to worry
> about that. I don't think we'd ever create such a path, and the general
> form of the paths is all under local control, so an attacker can't
> convince us to do so. And I find it pretty unlikely that we'd change the
> on-disk layout to accidentally trigger this.

I thought about this when writing it, and came to the conclusion that
the checks we have for "are we in something that looks like a loose
object shard?" are sane. That's only because we won't bother reading a
pack in $GIT_DIR/objects/pack/xx/yy.pack, since we do not read packs
recursively from the pack sub-directory.

So I think that the diff you posted below isn't hurting anything, and
the implementation looks correct to me, but I also don't think it's
helping anything either as a consequence of the above.

Thanks,
Taylor




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux