Re: signals not reliably interrupting io_uring_enter anymore

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/3/20 7:13 PM, Andres Freund wrote:
> Hi, 
> 
> On July 3, 2020 5:48:21 PM PDT, Jens Axboe <axboe@xxxxxxxxx> wrote:
>> On 7/3/20 6:15 PM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2020-07-03 17:00:49 -0700, Andres Freund wrote:
>>>> I haven't yet fully analyzed the problem, but after updating to
>>>> cdd3bb54332f82295ed90cd0c09c78cd0c0ee822 io_uring using postgres
>> does
>>>> not work reliably anymore.
>>>>
>>>> The symptom is that io_uring_enter(IORING_ENTER_GETEVENTS) isn't
>>>> interrupted by signals anymore. The signal handler executes, but
>>>> afterwards the syscall is restarted. Previously io_uring_enter
>> reliably
>>>> returned EINTR in that case.
>>>>
>>>> Currently postgres relies on signals interrupting io_uring_enter().
>> We
>>>> probably can find a way to not do so, but it'd not be entirely
>> trivial.
>>>>
>>>> I suspect the issue is
>>>>
>>>> commit ce593a6c480a22acba08795be313c0c6d49dd35d (tag:
>> io_uring-5.8-2020-07-01, linux-block/io_uring-5.8)
>>>> Author: Jens Axboe <axboe@xxxxxxxxx>
>>>> Date:   2020-06-30 12:39:05 -0600
>>>>
>>>>     io_uring: use signal based task_work running
>>>>
>>>> as that appears to have changed the error returned by
>>>> io_uring_enter(GETEVENTS) after having been interrupted by a signal
>> from
>>>> EINTR to ERESTARTSYS.
>>>>
>>>>
>>>> I'll check to make sure that the issue doesn't exist before the
>> above
>>>> commit.
>>>
>>> Indeed, on cd77006e01b3198c75fb7819b3d0ff89709539bb the PG issue
>> doesn't
>>> exist, which pretty much confirms that the above commit is the issue.
>>>
>>> What was the reason for changing EINTR to ERESTARTSYS in the above
>>> commit? I assume trying to avoid returning spurious EINTRs to
>> userland?
>>
>> Yeah, for when it's running task_work. I wonder if something like the
>> below will do the trick?
>>
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 700644a016a7..0efa73d78451 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -6197,11 +6197,11 @@ static int io_cqring_wait(struct io_ring_ctx
>> *ctx, int min_events,
>> 	do {
>> 		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
>> 						TASK_INTERRUPTIBLE);
>> -		/* make sure we run task_work before checking for signals */
>> -		if (current->task_works)
>> -			task_work_run();
>> 		if (signal_pending(current)) {
>> -			ret = -ERESTARTSYS;
>> +			if (current->task_works)
>> +				ret = -ERESTARTSYS;
>> +			else
>> +				ret = -EINTR;
>> 			break;
>> 		}
>> 		if (io_should_wake(&iowq, false))
>> @@ -6210,7 +6210,7 @@ static int io_cqring_wait(struct io_ring_ctx
>> *ctx, int min_events,
>> 	} while (1);
>> 	finish_wait(&ctx->wait, &iowq.wq);
>>
>> -	restore_saved_sigmask_unless(ret == -ERESTARTSYS);
>> +	restore_saved_sigmask_unless(ret == -EINTR);
>>
>> 	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret :
>> 0;
>> }
> 
> I'll try in a bit. Suspect however that there'd be trouble if there
> were both an actual signal and task work pending?

Yes, I have that worry too. We'd really need to check if we have an
actual signal pending - if we do, we still do -EINTR. If not, then we
just do -ERESTARTSYS and restart the system call after task_work has
been completed. Half-assed approach below, I suspect this won't _really_
work without appropriate locking. Which would be unfortunate.

Either that, or we'd need to know if an actual signal was delivered when
we get re-entered due to returning -ERESTARTSYS. If it was just
task_work being run, then we're fine. But if an actual signal was
pending, then we'd need to return -EINTR.

CC'ing Oleg to see if he has any good ideas here.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 700644a016a7..715d56144f15 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6197,11 +6197,11 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	do {
 		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
 						TASK_INTERRUPTIBLE);
-		/* make sure we run task_work before checking for signals */
-		if (current->task_works)
-			task_work_run();
 		if (signal_pending(current)) {
-			ret = -ERESTARTSYS;
+			if (has_pending_signal(current))
+				ret = -EINTR;
+			else
+				ret = -ERESTARTSYS;
 			break;
 		}
 		if (io_should_wake(&iowq, false))
@@ -6210,7 +6210,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	} while (1);
 	finish_wait(&ctx->wait, &iowq.wq);
 
-	restore_saved_sigmask_unless(ret == -ERESTARTSYS);
+	restore_saved_sigmask_unless(ret == -EINTR);
 
 	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
 }
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 6bb1a3f0258c..8ef23b0bb406 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -321,6 +321,7 @@ static inline void disallow_signal(int sig)
 extern struct kmem_cache *sighand_cachep;
 
 extern bool unhandled_signal(struct task_struct *tsk, int sig);
+extern bool has_pending_signal(struct task_struct *tsk);
 
 /*
  * In POSIX a signal is sent either to a specific thread (Linux task)
diff --git a/kernel/signal.c b/kernel/signal.c
index ee22ec78fd6d..8923872e5228 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -151,12 +151,16 @@ static inline bool has_pending_signals(sigset_t *signal, sigset_t *blocked)
 
 #define PENDING(p,b) has_pending_signals(&(p)->signal, (b))
 
+inline bool has_pending_signal(struct task_struct *t)
+{
+	return PENDING(&t->pending, &t->blocked) ||
+		PENDING(&t->signal->shared_pending, &t->blocked);
+}
+
 static bool recalc_sigpending_tsk(struct task_struct *t)
 {
 	if ((t->jobctl & (JOBCTL_PENDING_MASK | JOBCTL_TRAP_FREEZE)) ||
-	    PENDING(&t->pending, &t->blocked) ||
-	    PENDING(&t->signal->shared_pending, &t->blocked) ||
-	    cgroup_task_frozen(t)) {
+	    has_pending_signal(t) || cgroup_task_frozen(t)) {
 		set_tsk_thread_flag(t, TIF_SIGPENDING);
 		return true;
 	}

-- 
Jens Axboe




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux