On Thu, Aug 29, 2024 at 03:27:13PM -0400, Jeff King wrote: > On Thu, Aug 29, 2024 at 02:57:08PM -0400, Taylor Blau wrote: > > > On Wed, Aug 28, 2024 at 02:33:56PM -0400, Jeff King wrote: > > > Is that right, though? It looks like the caller might pass in a > > > tempfile name like .../pack/multi-pack-index.d/tmp_midx_XXXXXX, > > > if we're in incremental mode. But we'll write directly to > > > "multi-pack-index-$hash.bitmap" in the same directory. I'm not sure to > > > what degree it matters, since that's the name we want in the long run. > > > But would we possibly overwrite an active-in-use file rather than doing > > > the atomic rename-into-place if we happened to generate the same midx? > > > > > > It feels like we should still respect the name the caller is using for > > > tempfiles, and then rename it into the correct spot at the end. > > > > In either case, we're going to write to a temporary file initialized by > > the pack-bitmap machinery and then rename() it into place at the end of > > bitmap_writer_finish(). > > OK, that addresses my worry, if we're always writing to a tempfile (and > I verified with some recent stracing that this is the case). So renaming > that into tmp_midx_XXXXXX.bitmap would just be a pointless extra layer > of renames. Yeah, I think we are OK here. > I do wonder if it's possible for us to generate a new different revindex > and bitmap pair for the same midx hash, and for a reader to see a > mismatched set for a moment. But that's an atomicity problem, and an > extra layer of renames is not going to solve that. What you're describing is basically the bug that we fixed in 95e8383bac (midx.c: make changing the preferred pack safe, 2022-01-25). That commit sought to ensure that there was no way to have a different reverse index (IOW, pseudo-pack order) for a given MIDX hash. It's possible that there is some case that we're not yet covering, but I can't think of it. The things that we care about are (a) the set of objects, (b) the set of packs those objects came from, and (c) the preferred pack. We don't store (c) directly, but we can infer it from the reverse index, which we do write within the RIDX chunk as you note below. > > On the caller side, in the non-incremental mode, we'll pass > > $GIT_DIR/objects/pack/multi-pack-index-$hash.bitmap as the location, > > write its contents into a temporary file, and then rename() it there. > > > > But in the incremental mode this series introduces, I think it would be > > a bug to pass a tmp_midx_XXXXXX file path there, since nobody would move > > it from tmp_midx_XXXXX-$HASH.bitmap into its final location. > > > > So I think what's written here with the fixup! patch is right (and > > should be squashed into 13/13 in the next round), but let me know if I'm > > missing something. > > What confused me is that write_midx_reverse_index() _does_ still take > midx_name, and respects it. But I think that is a bug! > > We do not usually even call that function, since modern midx's have a > RIDX chunk inside them instead of a separate file. But if you do this: > > # generate an extra pack > git commit --allow-empty -m foo > git repack -d > > # make an incremental midx with a .rev file; usually this ends up > # as a RIDX chunk, so we have to force it. > GIT_TEST_MIDX_WRITE_REV=1 git multi-pack-index write --incremental --bitmap > > then you'll end up with a tmp_midx_XXXXXX-*.rev file leftover in > multi-pack-index.d (since, as you note, nobody is moving those into > place). > > So probably write_midx_reverse_index() needs the same treatment to > derive its own filenames for the incremental case, and to drop the > midx_name parameter. > > Or I wonder if we could simply drop the code to write a separate .rev > file entirely? I don't think there's a reason anybody would want it. I would kind of like to get rid of it, but we use it in a couple of places in the test suite: $ git grep GIT_TEST_MIDX_WRITE_REV= t/t5327-multi-pack-bitmaps-rev.sh:GIT_TEST_MIDX_WRITE_REV=1 t/t5334-incremental-multi-pack-index.sh: GIT_TEST_MIDX_WRITE_REV=1 git multi-pack-index write --bitmap && And both of those tests are testing the old behavior, which we need an out-of-MIDX .rev file in order to do. Alternatively, we could store a test fixture in the repository that contains these files so we don't have to build them from scratch. But after the xz incident earlier this year, I am *very* wary of adding binary test fixtures into the tree, since they seem like an easy vector for attack. So I'm content to fix the bug here and keep the old code around for a while longer. The fix is indeed as simple as you described, which is nice ;-). Thanks, Taylor