On Tue, Mar 16, 2021 at 02:53:36AM +0000, SURA via GitGitGadget wrote: > In Gitee.com, I often use scripts to start a time-limited Not related to your patch, but I think this name falls afoul of Git's trademark policy. See: https://git-scm.com/trademark There's also some discussion in this thread: https://lore.kernel.org/git/20170202022655.2jwvudhvo4hmueaw@xxxxxxxxxxxxxxxxxxxxx/ > "git fetch". When the execution time of'git fetch' is too > long, send a signal (such as SIGINT) to the'git fetch' process > > But I found that after the process exits, there are some > files in the format of 'objects/pack/tmp_pack_XXXXXX' on the disk. > They are usually very large (some of them exceed 5G), and'git gc' > has no effect on them. This isn't quite true. "git gc" will clean up the temporary files, but only if the mtime is sufficiently old. The purpose here is to give a grace period to avoid deleting a file that is actively being written to. However, we use the same grace period that we use for deleting unreachable objects, which is absurdly long for this purpose: 2 weeks. Probably something like an hour would be more appropriate (since the mtime is updated on each write, this would imply a process not making forward progress). That said... > Obviously this is only a temporary solution, I think it should be fixed from git > > I found fetch will start a 'index-pack' sub-process, this sub-process > create the tmp_pack > 1. make 'index-pack' unlink tmp_pack when get signal > 2. make 'fetch' send signal to 'index-pack' when get signal I do think this is worth doing. One of the reasons we haven't traditionally cleaned up temporary packfiles is that the failed state is sometimes useful for analyzing what went wrong, or even recovering partial data. But that probably should not be the default mode of operation (i.e., a config option or environment variable should probably be able to turn it on for debugging). Looking at the implementation itself... > diff --git a/builtin/fetch.c b/builtin/fetch.c > index 0b90de87c7a2..a87efa23ceb5 100644 > --- a/builtin/fetch.c > +++ b/builtin/fetch.c > @@ -224,8 +224,18 @@ static void unlock_pack(void) > transport_unlock_pack(gsecondary); > } > > +static void send_signo_to_index_pack(int signo) > +{ > + if (gtransport && gtransport->index_pack_pid > 0) > + kill(gtransport->index_pack_pid, signo); > + > + if (gsecondary && gsecondary->index_pack_pid > 0) > + kill(gsecondary->index_pack_pid, signo); > +} > + > static void unlock_pack_on_signal(int signo) > { > + send_signo_to_index_pack(signo); > unlock_pack(); > sigchain_pop(signo); > raise(signo); We'd probably want to also trigger this if we just hit die(), I'd think. We have a system for killing process children when we exit unexpectedly. I think just setting the "clean_on_exit" flag on the index-pack child_process struct would turn this into a one-liner. Likewise, we'd probably want to do this from receive-pack, too (so that pushes don't accumulate temporary packfiles on the receiving side). > diff --git a/builtin/index-pack.c b/builtin/index-pack.c > index bad57488079c..27d1ebba746e 100644 > --- a/builtin/index-pack.c > +++ b/builtin/index-pack.c > [...] > +static void remove_tmp_pack(void) > +{ > + if (tmp_pack_name) > + unlink_or_warn(tmp_pack_name); > +} > + > +static void remove_tmp_pack_on_signal(int signo) > +{ > + remove_tmp_pack(); > + sigchain_pop(signo); > + raise(signo); > +} Likewise, we have a tempfile cleanup system already. I think this hunk: > @@ -336,6 +339,7 @@ static const char *open_pack_file(const char *pack_name) > output_fd = odb_mkstemp(&tmp_file, > "pack/tmp_pack_XXXXXX"); > pack_name = strbuf_detach(&tmp_file, NULL); > + tmp_pack_name = pack_name; ...can just call register_tempfile(). It should also record the result so that we don't try to unlink() it after we've already moved it away from its temporary name (though it's fairly unlikely for somebody else to have used the name in the interim). I think you'd want to do the same for the tmp_idx_* files, too. Likewise for ".rev" files we create starting in v2.31. I think it would also make sense in create_tmp_packfile(), which is used during repacking (a different problem space, but really the same thing: if repacking fails for some reason, we probably shouldn't leave a useless gigantic half-finished packfile on disk). We should possibly also do so for tmp_obj_* files. Those can be written for a fetch or push via unpack-objects (as well as normal local commands). They're not usually as big as a pack, obviously, but I think the same principle applies. > [...] It would be nice to see some tests covering this functionality, too. Reproducing it with signals is likely to be racy and not worth it. But I think that right now index-pack reading a bogus pack (say, one that fails fsck checks) will leave the tmp_pack_* on disk. And it would not if we cleanup tempfiles (again, this would be on any exit, not just signal death, but I think that is what we'd want, and also what register_tempfile() will do). -Peff