On Wed, Jun 22, 2022 at 10:53:24AM -0700, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes: > > > On Wed, Jun 22 2022, haoyurenzhuxia@xxxxxxxxx wrote: > > > >> From: Xia XiaoWen <haoyurenzhuxia@xxxxxxxxx> > >> > >> The command: `git multi-pack-index write --bitmap` will create 3 > >> files in `objects/pack/`: > >> * multi-pack-index > >> * multi-pack-index-*.bitmap > >> * multi-pack-index-*.rev > >> > >> But if the command is terminated by the user (such as Ctl-C) or > >> the system, the midx reverse index file (`multi-pack-index-*.rev`) > >> is not removed and still exists in `objects/pack/`: > >> > >> $ GIT_TEST_MIDX_WRITE_REV=1 git multi-pack-index write --bitmap > >> Selecting bitmap commits: 133020, done. > >> Building bitmaps: 0% (3/331) > >> ^C^C > >> > >> $ tree objects/pack/ > >> objects/pack/ > >> ├── multi-pack-index-3b048d1b965842cd866e10b6ec1a3035dbede0a5.rev > >> ├── pack-b7d425f1b01727d5f364f5d9fbab2d1900fcd5c0.idx > >> └── pack-b7d425f1b01727d5f364f5d9fbab2d1900fcd5c0.pack > >> ... > > Also, the commit message doesn't really say *why*, i.e. in cmd_repack() > > we've suffered from this already, but don't we have "git gc" cleaning > > these up? Maybe not (I didn't check), but maybe that was the previous > > assumption... > > Exactly my thought. Well said. `repack` relies on `git multi-pack-index write --bitmap` to do the actual work here. A few things worth noting: - the MIDX file itself is written using a lock_file, so it is atomically moved into place, and the temporary file is either removed, or cleaned up automatically with a sigchain handler on process death - the bitmap (written in bitmap_writer_finish(), which is the path for both single- and multi-pack bitmaps) is written to a temporary file and moved into place after the bitmaps are written. ...but this temporary file isn't automatically cleaned up, so it could stick around after process death. Luckily the race window here is pretty small, since all of the bitmaps have been computed already and are held in memory. This is probably worth a cleanup on its own, too. - unless GIT_TEST_MIDX_WRITE_REV=1 is in your environment, we won't *write* a .rev file, hence this is pretty rare to deal with in practice. So I think there are two things worth doing here: - make sure that the temporary file used to stage the .bitmap is a lock_file - use a temporary file to stage the .rev file (when forced to write one), and ensure that that too is a lock_file This is mostly the same as Ævar's original suggestion, just with a larger scope. But I agree that it's the right direction nonetheless. > The _only_ case that might matter is if the next "write --bitmap" gets > confused ir there is a leftover file(s) from a previous run, but then > such a bug should be fixed on the reading side, I would think. It shouldn't in practice. We'll recognize that the .rev file is garbage if it's checksum doesn't match the MIDX's, and we squashed the bug where changing the object *order* (but not the objects themselves) doesn't change the MIDX checksum (it does now). We won't write over an existing .rev file with the right name, but we'll always prefer to read the .rev data from the MIDX itself, if it exists. Thanks, Taylor