Re: [PATCH] tmp-objdir: do not opendir() when handling a signal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hey Peff,

On 27 Sep 2022, at 7:44, Jeff King wrote:

> On Mon, Sep 26, 2022 at 11:53:03PM +0000, John Cai via GitGitGadget wrote:
>
>> One place we call tmp_objdir_create() is in git-receive-pack, where
>> we create a temporary quarantine directory "incoming". Incoming
>> objects will be written to this directory before they get moved to
>> the object directory.
>>
>> We have observed this code leading to a deadlock:
>>
>> 	Thread 1 (Thread 0x7f621ba0b200 (LWP 326305)):
>> 	#0  __lll_lock_wait_private (futex=futex@entry=0x7f621bbf8b80
>> 		<main_arena>) at ./lowlevellock.c:35
>> 	#1  0x00007f621baa635b in __GI___libc_malloc
>> 		(bytes=bytes@entry=32816) at malloc.c:3064
>> 	#2  0x00007f621bae9f49 in __alloc_dir (statp=0x7fff2ea7ed60,
>> 		flags=0, close_fd=true, fd=5)
>> 		at ../sysdeps/posix/opendir.c:118
>> 	#3  opendir_tail (fd=5) at ../sysdeps/posix/opendir.c:69
>> 	#4  __opendir (name=<optimized out>)
>> 		at ../sysdeps/posix/opendir.c:92
>> 	#5  0x0000557c19c77de1 in remove_dir_recurse ()
>> 	#6  0x0000557c19d81a4f in remove_tmp_objdir_on_signal ()
>> 	#7  <signal handler called>
>
> Yuck. Your analysis looks right, and certainly opendir() can't really
> work without allocating memory for the pointer-to-DIR.
>
>> To prevent this, add a flag REMOVE_DIR_SIGNAL that allows
>> remove_dir_recurse() to know that a signal is being handled and avoid
>> calling opendir(3). One implication of this change is that when
>> signal handling, the temporary directory may not get cleaned up
>> properly.
>
> It's not really "may not" here, is it? It will never get cleaned up on a
> signal now. I don't think remove_dir_recursively() will try to rmdir()
> in this case. But even if it did, we'll always have a "pack"
> subdirectory (minus the small race before we've created it).
>
> That's unfortunate, but I don't think we really have a portable
> alternative. We can't keep an exact list of files to be deleted, because
> some of them will be created by sub-processes. We could perhaps exec a
> helper that does the deletion, but that seems like a race and
> portability nightmare. On Linux, we could probably use open() and
> getdents64() to traverse, but obviously that won't work everywhere. It
> _might_ be worth having some kind of compat/ wrapper here, to let
> supported systems do as good a job as they can. But it's not like there
> aren't already cases where we might leave the tmp-objdir directory
> around (say, SIGKILL), so this is really just extending the existing
> problem to more signals.
>
> I was going to suggest we should do a better job of cleaning up these
> directories via git-gc. But it looks like b3cecf49ea (tmp-objdir: new
> API for creating temporary writable databases, 2021-12-06) changed the
> default name such that a regular git-gc should do so. So I think we're
> covered there.

I was wondering about this as well. Thanks for checking on this--that's
reassuring.
>
> The main case we care about is normal exit when index-pack or a hook
> sees an error, in which case we should still be cleaning up via the
> atexit() handler.
>
> So I think your patch is going in the right direction, but...
>
>>  static int remove_dir_recurse(struct strbuf *path, int flag, int *kept_up)
>>  {
>> -	DIR *dir;
>> +	DIR *dir = NULL;
>>  	struct dirent *e;
>>  	int ret = 0, original_len = path->len, len, kept_down = 0;
>>  	int only_empty = (flag & REMOVE_DIR_EMPTY_ONLY);
>> @@ -3261,7 +3261,10 @@ static int remove_dir_recurse(struct strbuf *path, int flag, int *kept_up)
>>  	}
>>
>>  	flag &= ~REMOVE_DIR_KEEP_TOPLEVEL;
>> -	dir = opendir(path->buf);
>> +
>> +	if ((flag & REMOVE_DIR_SIGNAL) == 0)
>> +		dir = opendir(path->buf);
>> +
>>  	if (!dir) {
>>  		if (errno == ENOENT)
>>  			return keep_toplevel ? -1 : 0;
>
> We skip calling opendir() entirely, so "dir" will still be NULL. But we
> immediately start looking at errno, which will have some undefined value
> (based on some previous syscall).
>
> If we set "errno" to "EACCES" in this case, then we'd at least hit the
> rmdir() below:
>
>          if (!dir) {
>                   if (errno == ENOENT)
>                           return keep_toplevel ? -1 : 0;
>                   else if (errno == EACCES && !keep_toplevel)
>                           /*
>                            * An empty dir could be removable even if it
>                            * is unreadable:
>                            */
>                           return rmdir(path->buf);
>                   else
>                           return -1;
>           }
>
> but we know it won't really do anything for our proposed caller, since
> it will have files inside the directory that need to be removed before
> rmdir() can work.

yeah, I suppose the only case it would help is if the directory is already
empty.

>
> Moreover, if you were to combine REMOVE_DIR_SIGNAL with
> REMOVE_DIR_KEEP_NESTED_GIT, I suspect that the call to
> resolve_gitlink_ref() would end up with similar deadlocks. Obviously
> nobody is proposing to do that, but it is a pitfall in the API.
>
> So all of that makes me think we should not add a new flag here, but
> instead just avoid calling the function entirely from
> tmp_objdir_destroy_1().
>
> But then we can observe that tmp_objdir_destroy_1() is basically doing
> nothing if on_signal is set. So there is really no point in setting up
> the signal handler at all. We should just set up the atexit() handler.
> I.e., something like:
>
> diff --git a/tmp-objdir.c b/tmp-objdir.c
> index a8be92bca1..10549e95db 100644
> --- a/tmp-objdir.c
> +++ b/tmp-objdir.c
> @@ -169,7 +169,6 @@ struct tmp_objdir *tmp_objdir_create(const char *prefix)
>  	the_tmp_objdir = t;
>  	if (!installed_handlers) {
>  		atexit(remove_tmp_objdir);
> -		sigchain_push_common(remove_tmp_objdir_on_signal);
>  		installed_handlers++;
>  	}

This makes sense and is a clean solution. I guess the only case where we would
benefit in calling into remove_tmp_objdir_on_signal() is if the temp dir exists
but is empty. I'm not sure how often this would happen to make it worth it?
Probably not...

>
>
> with the commit message explaining that we can't do the cleanup in a
> portable and signal-safe way, so we just punt on the whole concept.
>
> There's also some minor cleanup we could do elsewhere to drop the
> "on_signal" argument (which can come as part of the same patch, or on
> top).
>
> -Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux