Re: [RFC PATCH] midx.c: clean up .rev file

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 22, 2022 at 10:53:24AM -0700, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>
> > On Wed, Jun 22 2022, haoyurenzhuxia@xxxxxxxxx wrote:
> >
> >> From: Xia XiaoWen <haoyurenzhuxia@xxxxxxxxx>
> >>
> >> The command: `git multi-pack-index write --bitmap` will create 3
> >> files in `objects/pack/`:
> >>     * multi-pack-index
> >>     * multi-pack-index-*.bitmap
> >>     * multi-pack-index-*.rev
> >>
> >> But if the command is terminated by the user (such as Ctl-C) or
> >> the system, the midx reverse index file (`multi-pack-index-*.rev`)
> >> is not removed and still exists in `objects/pack/`:
> >>
> >>     $ GIT_TEST_MIDX_WRITE_REV=1 git multi-pack-index write --bitmap
> >>     Selecting bitmap commits: 133020, done.
> >>     Building bitmaps:   0% (3/331)
> >>     ^C^C
> >>
> >>     $ tree objects/pack/
> >>     objects/pack/
> >>     ├── multi-pack-index-3b048d1b965842cd866e10b6ec1a3035dbede0a5.rev
> >>     ├── pack-b7d425f1b01727d5f364f5d9fbab2d1900fcd5c0.idx
> >>     └── pack-b7d425f1b01727d5f364f5d9fbab2d1900fcd5c0.pack
> >> ...
> > Also, the commit message doesn't really say *why*, i.e. in cmd_repack()
> > we've suffered from this already, but don't we have "git gc" cleaning
> > these up? Maybe not (I didn't check), but maybe that was the previous
> > assumption...
>
> Exactly my thought.  Well said.

`repack` relies on `git multi-pack-index write --bitmap` to do the
actual work here. A few things worth noting:

  - the MIDX file itself is written using a lock_file, so it is
    atomically moved into place, and the temporary file is either
    removed, or cleaned up automatically with a sigchain handler on
    process death

  - the bitmap (written in bitmap_writer_finish(), which is the path for
    both single- and multi-pack bitmaps) is written to a temporary file
    and moved into place after the bitmaps are written.

    ...but this temporary file isn't automatically cleaned up, so it
    could stick around after process death. Luckily the race window here
    is pretty small, since all of the bitmaps have been computed already
    and are held in memory.

    This is probably worth a cleanup on its own, too.

  - unless GIT_TEST_MIDX_WRITE_REV=1 is in your environment, we won't
    *write* a .rev file, hence this is pretty rare to deal with in
    practice.

So I think there are two things worth doing here:

  - make sure that the temporary file used to stage the .bitmap is a
    lock_file
  - use a temporary file to stage the .rev file (when forced to write
    one), and ensure that that too is a lock_file

This is mostly the same as Ævar's original suggestion, just with a
larger scope. But I agree that it's the right direction nonetheless.

> The _only_ case that might matter is if the next "write --bitmap" gets
> confused ir there is a leftover file(s) from a previous run, but then
> such a bug should be fixed on the reading side, I would think.

It shouldn't in practice. We'll recognize that the .rev file is garbage
if it's checksum doesn't match the MIDX's, and we squashed the bug where
changing the object *order* (but not the objects themselves) doesn't
change the MIDX checksum (it does now).

We won't write over an existing .rev file with the right name, but we'll
always prefer to read the .rev data from the MIDX itself, if it exists.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux