For me, two "git gc" on a same repo fail to run: fatal: gc is already running on machine 'WIN-blah' pid 40304 (use --force if not) If you're already colliding on this, then I don't see why you can't use a normal looking name without a randomized string like "tmp_garbagecollecting", so that each execution would at least overwrite the same location. In this case --force could append _1 probably. -Vitaly On Fri, Dec 20, 2024 at 1:05 AM Jeff King <peff@xxxxxxxx> wrote: > > On Thu, Dec 19, 2024 at 03:17:01AM -0800, Junio C Hamano wrote: > > > Boomman <boomman37@xxxxxxxxx> writes: > > > > > Yes, if the behavior in case of running out of disk space is to just > > > leave the malformed file there, it stands to reason that cleaning up > > > those malformed files should be the first operation to do for gc. > > > > It is misleading to call them malformed, isn't it? When a Git > > process creates a packfile (or loose object file for that matter), > > they are written under these tmp_* names. When the processes die > > without finalizing these (either removing or renaming into their > > final names), they are left behind, and it would be better if we can > > remove it _before_ another process wants to consume more disk space. > > We usually automatically clean up our tempfiles if we encounter an > error, but don't do so for partially written packs. I think this is > mostly historical, though occasionally it can be useful for debugging > (e.g., indexing a pack coming over the network). > > It might make sense to register them as tempfiles in the usual way, > possibly with an environment variable option to ask for them to be kept > (for debugging). > > That's not foolproof, since a process can die without cleaning up after > itself (e.g., on system crash). But it would mean that a repeatedly > failing "git repack -ad" does not fill up the disk. And the decision of > when to clean up tempfiles in git-gc is less important. > > > But the issue is how you tell which one of these "malformed" files > > are still being written and will be finalized, and which ones are > > leftover ones. You want to remove the latter without molesting the > > former. And you want to do so in a portable way, possibly even > > across the network file systems. > > Yeah, I think there are two issues being discussed in this thread: > > - when to clean up leftover tempfiles > > - how to decide which tempfiles are leftover > > The second one is what the OP mentioned for locking. But not only does > that have portability questions, I'm not sure it is sufficient. Would we > ever write tmp_pack_*, complete our process, and then expect our caller > to do something with it (meaning there's a race where no process is > holding the lock)? > > I'm not sure. We definitely write "tmp" packfiles via pack-objects and > expect git-repack to move them to their final names. I think we use a > slightly different name ("tmp-<pid>-pack-*"), but arguably we should > consider cleaning up stale versions of those, too. > > -Peff