On Mon, Mar 29, 2021 at 08:00:59AM -0400, Jeff King wrote: > I think in the long run we may want to add a midx chunk that gives the > order of the packs (and likewise allow the caller of "midx write" to > specify the exact order), since that may allow correlating locality > between history and object order within the .rev/.bitmap files. > > But I think this is a nice stopping point for this series, since we're > not having to introduce any new on-disk formats to do it, and it seems > to give pretty good results in practice. I guess we'll have to support > --preferred-pack forever, but that's OK. Even if we do eventually > support arbitrary orderings, it's just a simple subset of that > functionality. To add a little bit of extra detail, I think what you're getting at here is that it would be nice to let the order of the packs be dictated by mtime, not the order they appear in the MIDX (which is lexicographic by their hash, and thus effectively random). The reason there being the same as you pointed out in https://lore.kernel.org/git/YDRdmh8oS5%2Fxq4rB@xxxxxxxxxxxxxxxxxxxxxxx/ which is that it effectively would lay objects out from newest to oldest. But, there's a problem, which is that the MIDX doesn't store the packs' mtimes. That's fine for writing, since we can just look that information up ourselves. But the reading side can get broken. That's because the reader also has to know the pack order to go from MIDX- to bit-position. So if a third party goes and touches some of the packs after the .rev file was written, then the reader is going to think the packs ought to appear in a different order than they actually do. So relying on having to look up the mtimes again later on isn't good enough. There are two solutions to the problem: - You could write the mtimes in the MIDX itself. This would give you a single point of reference, and resolve the TOCTOU race I just described. - Or, you could forget about mtimes entirely and let the MIDX dictate the pack ordering itself. That resolves the race in a similar-but-different way. Of the two, I prefer the latter, but I think it introduces functionality that we don't necessarily need yet. That's because the objects within the packs are still ordered as such, and so the compression we get in the packs is just as good as it is for single-pack bitmaps. It's only at the objects between pack boundaries that any runs of 1s or 0s might be interrupted, but there are far fewer pack boundaries than objects, so it doesn't seem to matter in practice. Anyway, I think that you know all of that already (mostly because we thought aloud together when I originally brought this up), but I figure that this detail may be interesting for other readers, too. > > @@ -74,7 +85,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv) > > usage_with_options(builtin_multi_pack_index_write_usage, > > options); > > > > - return write_midx_file(opts.object_dir, opts.flags); > > + return write_midx_file(opts.object_dir, opts.preferred_pack, > > + opts.flags); > > } > > This has the same leak of "options" that I mentioned in the earlier > patch. Yup, thanks for pointing it out. > > diff --git a/midx.c b/midx.c > > index 971faa8cfc..46f55ff6cf 100644 > > --- a/midx.c > > +++ b/midx.c > > @@ -431,6 +431,24 @@ static int pack_info_compare(const void *_a, const void *_b) > > return strcmp(a->pack_name, b->pack_name); > > } > > > > +static int lookup_idx_or_pack_name(struct pack_info *info, > > + uint32_t nr, > > + const char *pack_name) > > +{ > > + uint32_t lo = 0, hi = nr; > > + while (lo < hi) { > > + uint32_t mi = lo + (hi - lo) / 2; > > + int cmp = cmp_idx_or_pack_name(pack_name, info[mi].pack_name); > > + if (cmp < 0) > > + hi = mi; > > + else if (cmp > 0) > > + lo = mi + 1; > > + else > > + return mi; > > + } > > + return -1; > > +} > > Could this just be replaced with bsearch() in the caller? Great suggestion. Yes, it can be. FWIW, I think that I may have originally thought that it couldn't be since we were comparing a fixed string to an array of structs (each having a field which holds the value we actually want to compare). But bsearch() always passes the key as the first argument to the comparator, so this is possible to do. > > + git multi-pack-index --object-dir=objects \ > > + write --preferred-pack=test-BC-$bc.idx 2>err && > > + test_must_be_empty err && > > + > > + ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b | > > + cut -d" " -f1) && > > + midx_expect_object_offset $b $ofs objects > > + ) > > ...what we really care about is that the object came from BC. And we are > just using the offset as a proxy for that. But doesn't "test-tool > read-midx" give us the actual pack name? We could just be checking that. > > I also wondered if we should confirm that without the --preferred-pack > option, we choose the other pack. I think it will always be true because > the default order is to sort them lexically. A comment to that effect > might be worth it (near the "set up two packs" comment). Both great points, thanks. > -Peff Thanks, Taylor