Re: git gc does not clean tmp_pack* files

For me, a second "git gc" on the same repo fails to run:
fatal: gc is already running on machine 'WIN-blah' pid 40304 (use
--force if not)

If you're already colliding on this, then I don't see why you can't
use a fixed, normal-looking name such as "tmp_garbagecollecting"
instead of one with a randomized string, so that each execution would
at least overwrite the same file rather than leave a new one behind.
In this case --force could probably append _1.
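
To make the idea concrete, here is a rough Python sketch of what I mean.
It is not how Git works today; the function name write_via_fixed_temp
and the file names are made up for illustration. The point is only that
a fixed temp name gets truncated and reused on the next run, so a crashed
run leaves at most one stale file behind.

```python
import os


def write_via_fixed_temp(pack_dir, final_name, data,
                         temp_name="tmp_garbagecollecting"):
    """Write data under a fixed temp name, then rename it into place.

    Hypothetical helper: the names are illustrative, not Git's.
    """
    temp_path = os.path.join(pack_dir, temp_name)
    final_path = os.path.join(pack_dir, final_name)

    # Mode "wb" truncates, so a leftover file from a crashed run is
    # overwritten instead of accumulating next to new temp files.
    with open(temp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # Move the finished file to its final name, replacing any existing one.
    os.replace(temp_path, final_path)
    return final_path
```

This is only safe if a single writer is guaranteed, which is what the
"gc is already running" check above would have to provide.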

-Vitaly


On Fri, Dec 20, 2024 at 1:05 AM Jeff King <peff@xxxxxxxx> wrote:
>
> On Thu, Dec 19, 2024 at 03:17:01AM -0800, Junio C Hamano wrote:
>
> > Boomman <boomman37@xxxxxxxxx> writes:
> >
> > > Yes, if the behavior in case of running out of disk space is to just
> > > leave the malformed file there, it stands to reason that cleaning up
> > > those malformed files should be the first operation to do for gc.
> >
> > It is misleading to call them malformed, isn't it?  When a Git
> > process creates a packfile (or loose object file for that matter),
> > they are written under these tmp_* names.  When the processes die
> > without finalizing these (either removing or renaming into their
> > final names), they are left behind, and it would be better if we can
> > remove it _before_ another process wants to consume more disk space.
>
> We usually automatically clean up our tempfiles if we encounter an
> error, but don't do so for partially written packs. I think this is
> mostly historical, though occasionally it can be useful for debugging
> (e.g., indexing a pack coming over the network).
>
> It might make sense to register them as tempfiles in the usual way,
> possibly with an environment variable option to ask for them to be kept
> (for debugging).
>
> That's not foolproof, since a process can die without cleaning up after
> itself (e.g., on system crash). But it would mean that a repeatedly
> failing "git repack -ad" does not fill up the disk. And the decision of
> when to clean up tempfiles in git-gc is less important.
>
> > But the issue is how you tell which one of these "malformed" files
> > are still being written and will be finalized, and which ones are
> > leftover ones.  You want to remove the latter without molesting the
> > former.  And you want to do so in a portable way, possibly even
> > across the network file systems.
>
> Yeah, I think there are two issues being discussed in this thread:
>
>   - when to clean up leftover tempfiles
>
>   - how to decide which tempfiles are leftover
>
> The second one is what the OP mentioned for locking. But not only does
> that have portability questions, I'm not sure it is sufficient. Would we
> ever write tmp_pack_*, complete our process, and then expect our caller
> to do something with it (meaning there's a race where no process is
> holding the lock)?
>
> I'm not sure. We definitely write "tmp" packfiles via pack-objects and
> expect git-repack to move them to their final names. I think we use a
> slightly different name ("tmp-<pid>-pack-*"), but arguably we should
> consider cleaning up stale versions of those, too.
>
> -Peff
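
On the second question in the quoted message (how to decide which
tempfiles are leftover), one simple heuristic is file age. The sketch
below is an assumption for illustration, not Git's implementation: the
function name find_stale_tmp_packs and the grace period are hypothetical,
and it only reports candidates instead of deleting them. It also does not
close the race described above, where a finished tmp file is waiting for
a caller to rename it; a long enough grace period only makes that
unlikely.

```python
import os
import time


def find_stale_tmp_packs(pack_dir, grace_seconds, now=None):
    """Return tmp_pack* files not modified within the grace period.

    Hypothetical helper: age is only a heuristic for "leftover".
    """
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(pack_dir):
        # Only consider regular files that follow the tmp_pack naming.
        if not entry.is_file() or not entry.name.startswith("tmp_pack"):
            continue
        # A file still being written has a recent mtime and is skipped.
        if now - entry.stat().st_mtime > grace_seconds:
            stale.append(entry.path)
    return sorted(stale)
```

A caller could run this on the objects/pack directory before repacking
and remove what it returns, so that old leftovers are gone before new
disk space is needed.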




