Re: [PATCH v3 08/16] midx: allow marking a pack as preferred

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 29, 2021 at 08:00:59AM -0400, Jeff King wrote:
> I think in the long run we may want to add a midx chunk that gives the
> order of the packs (and likewise allow the caller of "midx write" to
> specify the exact order), since that may allow correlating locality
> between history and object order within the .rev/.bitmap files.
>
> But I think this is a nice stopping point for this series, since we're
> not having to introduce any new on-disk formats to do it, and it seems
> to give pretty good results in practice. I guess we'll have to support
> --preferred-pack forever, but that's OK. Even if we do eventually
> support arbitrary orderings, it's just a simple subset of that
> functionality.

To add a little bit of extra detail, I think what you're getting at here
is that it would be nice to let the order of the packs be dictated by
mtime, not the order they appear in the MIDX (which is lexicographic by
their hash, and thus effectively random).

The reason there being the same as you pointed out in

    https://lore.kernel.org/git/YDRdmh8oS5%2Fxq4rB@xxxxxxxxxxxxxxxxxxxxxxx/

which is that it effectively would lay objects out from newest to
oldest.

But, there's a problem, which is that the MIDX doesn't store the packs'
mtimes. That's fine for writing, since we can just look that information
up ourselves. But the reading side can get broken. That's because the
reader also has to know the pack order to go from MIDX- to bit-position.

So if a third party goes and touches some of the packs after the .rev
file was written, then the reader is going to think the packs ought to
appear in a different order than they actually do. So relying on having
to look up the mtimes again later on isn't good enough.

There are two solutions to the problem:

  - You could write the mtimes in the MIDX itself. This would give you a
    single point of reference, and resolve the TOCTOU race I just
    described.

  - Or, you could forget about mtimes entirely and let the MIDX dictate
    the pack ordering itself. That resolves the race in a
    similar-but-different way.

Of the two, I prefer the latter, but I think it introduces functionality
that we don't necessarily need yet. That's because the objects within
the packs are still ordered as such, and so the compression we get in
the packs is just as good as it is for single-pack bitmaps. It's only at
the objects between pack boundaries that any runs of 1s or 0s might be
interrupted, but there are far fewer pack boundaries than objects, so it
doesn't seem to matter in practice.

Anyway, I think that you know all of that already (mostly because we
thought aloud together when I originally brought this up), but I figure
that this detail may be interesting for other readers, too.

> > @@ -74,7 +85,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
> >  		usage_with_options(builtin_multi_pack_index_write_usage,
> >  				   options);
> >
> > -	return write_midx_file(opts.object_dir, opts.flags);
> > +	return write_midx_file(opts.object_dir, opts.preferred_pack,
> > +			       opts.flags);
> >  }
>
> This has the same leak of "options" that I mentioned in the earlier
> patch.

Yup, thanks for pointing it out.

> > diff --git a/midx.c b/midx.c
> > index 971faa8cfc..46f55ff6cf 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -431,6 +431,24 @@ static int pack_info_compare(const void *_a, const void *_b)
> >  	return strcmp(a->pack_name, b->pack_name);
> >  }
> >
> > +static int lookup_idx_or_pack_name(struct pack_info *info,
> > +				   uint32_t nr,
> > +				   const char *pack_name)
> > +{
> > +	uint32_t lo = 0, hi = nr;
> > +	while (lo < hi) {
> > +		uint32_t mi = lo + (hi - lo) / 2;
> > +		int cmp = cmp_idx_or_pack_name(pack_name, info[mi].pack_name);
> > +		if (cmp < 0)
> > +			hi = mi;
> > +		else if (cmp > 0)
> > +			lo = mi + 1;
> > +		else
> > +			return mi;
> > +	}
> > +	return -1;
> > +}
>
> Could this just be replaced with bsearch() in the caller?

Great suggestion. Yes, it can be. FWIW, I think that I may have
originally thought that it couldn't be since we were comparing a fixed
string to an array of structs (each having a field which holds the value
we actually want to compare). But bsearch() always passes the key as the
first argument to the comparator, so this is possible to do.

> > +		git multi-pack-index --object-dir=objects \
> > +			write --preferred-pack=test-BC-$bc.idx 2>err &&
> > +		test_must_be_empty err &&
> > +
> > +		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
> > +			cut -d" " -f1) &&
> > +		midx_expect_object_offset $b $ofs objects
> > +	)
>
> ...what we really care about is that the object came from BC. And we are
> just using the offset as a proxy for that. But doesn't "test-tool
> read-midx" give us the actual pack name? We could just be checking that.
>
> I also wondered if we should confirm that without the --preferred-pack
> option, we choose the other pack. I think it will always be true because
> the default order is to sort them lexically. A comment to that effect
> might be worth it (near the "set up two packs" comment).

Both great points, thanks.

> -Peff

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux