Re: [PATCH] tmp-objdir: do not opendir() when handling a signal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 26, 2022 at 11:53:03PM +0000, John Cai via GitGitGadget wrote:

> One place we call tmp_objdir_create() is in git-receive-pack, where
> we create a temporary quarantine directory "incoming". Incoming
> objects will be written to this directory before they get moved to
> the object directory.
> 
> We have observed this code leading to a deadlock:
> 
> 	Thread 1 (Thread 0x7f621ba0b200 (LWP 326305)):
> 	#0  __lll_lock_wait_private (futex=futex@entry=0x7f621bbf8b80
> 		<main_arena>) at ./lowlevellock.c:35
> 	#1  0x00007f621baa635b in __GI___libc_malloc
> 		(bytes=bytes@entry=32816) at malloc.c:3064
> 	#2  0x00007f621bae9f49 in __alloc_dir (statp=0x7fff2ea7ed60,
> 		flags=0, close_fd=true, fd=5)
> 		at ../sysdeps/posix/opendir.c:118
> 	#3  opendir_tail (fd=5) at ../sysdeps/posix/opendir.c:69
> 	#4  __opendir (name=<optimized out>)
> 		at ../sysdeps/posix/opendir.c:92
> 	#5  0x0000557c19c77de1 in remove_dir_recurse ()
> 	#6  0x0000557c19d81a4f in remove_tmp_objdir_on_signal ()
> 	#7  <signal handler called>

Yuck. Your analysis looks right, and certainly opendir() can't really
work without allocating memory for the pointer-to-DIR.

> To prevent this, add a flag REMOVE_DIR_SIGNAL that allows
> remove_dir_recurse() to know that a signal is being handled and avoid
> calling opendir(3). One implication of this change is that when
> signal handling, the temporary directory may not get cleaned up
> properly.

It's not really "may not" here, is it? It will never get cleaned up on a
signal now. I don't think remove_dir_recursively() will try to rmdir()
in this case. But even if it did, we'll always have a "pack"
subdirectory (minus the small race before we've created it).

That's unfortunate, but I don't think we really have a portable
alternative. We can't keep an exact list of files to be deleted, because
some of them will be created by sub-processes. We could perhaps exec a
helper that does the deletion, but that seems like a race and
portability nightmare. On Linux, we could probably use open() and
getdents64() to traverse, but obviously that won't work everywhere. It
_might_ be worth having some kind of compat/ wrapper here, to let
supported systems do as good a job as they can. But it's not like there
aren't already cases where we might leave the tmp-objdir directory
around (say, SIGKILL), so this is really just extending the existing
problem to more signals.

I was going to suggest we should do a better job of cleaning up these
directories via git-gc. But it looks like b3cecf49ea (tmp-objdir: new
API for creating temporary writable databases, 2021-12-06) changed the
default name such that a regular git-gc should do so. So I think we're
covered there.

The main case we care about is normal exit when index-pack or a hook
sees an error, in which case we should still be cleaning up via the
atexit() handler.

So I think your patch is going in the right direction, but...

>  static int remove_dir_recurse(struct strbuf *path, int flag, int *kept_up)
>  {
> -	DIR *dir;
> +	DIR *dir = NULL;
>  	struct dirent *e;
>  	int ret = 0, original_len = path->len, len, kept_down = 0;
>  	int only_empty = (flag & REMOVE_DIR_EMPTY_ONLY);
> @@ -3261,7 +3261,10 @@ static int remove_dir_recurse(struct strbuf *path, int flag, int *kept_up)
>  	}
>  
>  	flag &= ~REMOVE_DIR_KEEP_TOPLEVEL;
> -	dir = opendir(path->buf);
> +
> +	if ((flag & REMOVE_DIR_SIGNAL) == 0)
> +		dir = opendir(path->buf);
> +
>  	if (!dir) {
>  		if (errno == ENOENT)
>  			return keep_toplevel ? -1 : 0;

We skip calling opendir() entirely, so "dir" will still be NULL. But we
immediately start looking at errno, which will have some undefined value
(based on some previous syscall).

If we set "errno" to "EACCES" in this case, then we'd at least hit the
rmdir() below:

         if (!dir) {
                  if (errno == ENOENT)
                          return keep_toplevel ? -1 : 0;
                  else if (errno == EACCES && !keep_toplevel)
                          /*
                           * An empty dir could be removable even if it
                           * is unreadable:
                           */
                          return rmdir(path->buf);
                  else
                          return -1;
          }

but we know it won't really do anything for our proposed caller, since
it will have files inside the directory that need to be removed before
rmdir() can work.

Moreover, if you were to combine REMOVE_DIR_SIGNAL with
REMOVE_DIR_KEEP_NESTED_GIT, I suspect that the call to
resolve_gitlink_ref() would end up with similar deadlocks. Obviously
nobody is proposing to do that, but it is a pitfall in the API.

So all of that makes me think we should not add a new flag here, but
instead just avoid calling the function entirely from
tmp_objdir_destroy_1().

But then we can observe that tmp_objdir_destroy_1() is basically doing
nothing if on_signal is set. So there is really no point in setting up
the signal handler at all. We should just set up the atexit() handler.
I.e., something like:

diff --git a/tmp-objdir.c b/tmp-objdir.c
index a8be92bca1..10549e95db 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -169,7 +169,6 @@ struct tmp_objdir *tmp_objdir_create(const char *prefix)
 	the_tmp_objdir = t;
 	if (!installed_handlers) {
 		atexit(remove_tmp_objdir);
-		sigchain_push_common(remove_tmp_objdir_on_signal);
 		installed_handlers++;
 	}
 

with the commit message explaining that we can't do the cleanup in a
portable and signal-safe way, so we just punt on the whole concept.

There's also some minor cleanup we could do elsewhere to drop the
"on_signal" argument (which can come as part of the same patch, or on
top).

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux