Re: Problematic interaction of io_uring and CIFS

On 04.10.22 at 16:02, Jens Axboe wrote:
> On 10/4/22 2:59 AM, Fiona Ebner wrote:
>> On 26.08.22 at 10:21, Fiona Ebner wrote:
>>> On 11.07.22 at 15:40, Fabian Ebner wrote:
>>>> On 09.07.22 at 05:39, Shyam Prasad N wrote:
>>>>> On Sat, Jul 9, 2022 at 9:00 AM Shyam Prasad N <nspmangalore@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On Fri, Jul 8, 2022 at 11:22 PM Enzo Matsumiya <ematsumiya@xxxxxxx> wrote:
>>>>>>>
>>>>>>> On 07/08, Fabian Ebner wrote:
>>>>>>>> (Re-sending without the log from the older kernel, because the mail hit
>>>>>>>> the 100000 char limit with that)
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> it seems that in kernels >= 5.15, io_uring and CIFS don't interact
>>>>>>>> nicely sometimes, leading to IO errors. Unfortunately, my reproducer is
>>>>>>>> a QEMU VM with a disk on CIFS (original report by one of our users [0]),
>>>>>>>> but I can try to cook up something simpler if you want.
>>>>>>>>
>>>>>>>> Bisecting got me to 8ef12efe26c8 ("io_uring: run regular file
>>>>>>>> completions from task_work") being the first bad commit.
>>>>>>>>
>>>
>>> I finally got around to taking another look at this issue (still present
>>> in 5.19.3) and I think I've finally figured out the root cause:
>>>
>>> After commit 8ef12efe26c8, for my reproducer, the write completion is
>>> added to task_work with notify_method being TWA_SIGNAL and thus
>>> TIF_NOTIFY_SIGNAL is set for the task.
>>>
>>> After that, if we end up in sk_stream_wait_memory() via sock_sendmsg(),
>>> signal_pending(current) will evaluate to true and thus -EINTR is
>>> returned all the way up to sock_sendmsg() in smb_send_kvec().
>>>
>>> Related: in __smb_send_rqst() there is also a signal_pending(current)
>>> check, which leads to the -ERESTARTSYS return value.
>>>
>>> To verify that this is the cause, I applied the following hack (i.e.
>>> excluding the TIF_NOTIFY_SIGNAL check) and was no longer able to
>>> trigger the issue:
>>>
>>>> diff --git a/net/core/stream.c b/net/core/stream.c
>>>> index 06b36c730ce8..58e3825930bb 100644
>>>> --- a/net/core/stream.c
>>>> +++ b/net/core/stream.c
>>>> @@ -134,7 +134,7 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
>>>>                         goto do_error;
>>>>                 if (!*timeo_p)
>>>>                         goto do_eagain;
>>>> -               if (signal_pending(current))
>>>> +               if (task_sigpending(current))
>>>>                         goto do_interrupted;
>>>>                 sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
>>>>                 if (sk_stream_memory_free(sk) && !vm_wait)
>>>
>>>
>>> In __cifs_writev() we have
>>>
>>>>     /*
>>>>      * If at least one write was successfully sent, then discard any rc
>>>>      * value from the later writes. If the other write succeeds, then
>>>>      * we'll end up returning whatever was written. If it fails, then
>>>>      * we'll get a new rc value from that.
>>>>      */
>>>
>>> so it can happen that collect_uncached_write_data() will (correctly)
>>> report a short write when calling ctx->iocb->ki_complete().
>>>
>>> But QEMU's io_uring backend treats a short write as an -ENOSPC error.
>>> Is that also a bug? Or does the kernel give any guarantees in that
>>> direction?
>>>
>>> Still, it doesn't seem ideal that the "interrupt" happens and in fact
>>> __smb_send_rqst() tries to avoid it, but fails to do so, because of the
>>> unexpected TIF_NOTIFY_SIGNAL:
>>>>     /*
>>>>      * We should not allow signals to interrupt the network send because
>>>>      * any partial send will cause session reconnects thus increasing
>>>>      * latency of system calls and overload a server with unnecessary
>>>>      * requests.
>>>>      */
>>>>
>>>>     sigfillset(&mask);
>>>>     sigprocmask(SIG_BLOCK, &mask, &oldmask);
>>>
>>> Do you have any suggestions for how to proceed?
>>>
>>
>> Ping. The issue is still present in Linux 6.0. Does it make sense to
>> also temporarily unset the task's TIF_NOTIFY_SIGNAL here or is that a
>> bad idea?
> 
> You could try setting up the ring with IORING_SETUP_COOP_TASKRUN,
> that'll avoid the TIF_NOTIFY_SIGNAL bits.
> 

Thank you for the suggestion. I tried this, but had no luck. AFAICT,
with IORING_SETUP_COOP_TASKRUN, the notify_method will be
TWA_SIGNAL_NO_IPI and when adding the task work, __set_notify_signal()
is called, which still sets the TIF_NOTIFY_SIGNAL for the task?
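
To make my reading concrete, here is a simplified user-space model of the
two notification modes. This is not kernel code: the flag values, struct
task, the ipi_sent field and task_work_notify() are made up for
illustration, and the behaviour reflects only my understanding of
task_work_add() and signal_pending().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up bit values; only the names mirror the kernel's thread flags. */
enum { TIF_SIGPENDING = 1u << 0, TIF_NOTIFY_SIGNAL = 1u << 1 };

enum notify_mode { TWA_NONE, TWA_RESUME, TWA_SIGNAL, TWA_SIGNAL_NO_IPI };

/* Hypothetical stand-in for task_struct. */
struct task {
	unsigned int flags;
	bool ipi_sent;	/* models "the task was kicked with an IPI" */
};

/* A real signal is pending. */
static bool task_sigpending(const struct task *t)
{
	return (t->flags & TIF_SIGPENDING) != 0;
}

/* A real signal is pending OR task_work asked for a notification. */
static bool signal_pending(const struct task *t)
{
	return (t->flags & TIF_NOTIFY_SIGNAL) != 0 || task_sigpending(t);
}

/* Models only the notification part of adding task_work. */
static void task_work_notify(struct task *t, enum notify_mode mode)
{
	assert(t != NULL);
	switch (mode) {
	case TWA_SIGNAL:
		t->flags |= TIF_NOTIFY_SIGNAL;
		t->ipi_sent = true;	/* assumption: may kick the task */
		break;
	case TWA_SIGNAL_NO_IPI:
		t->flags |= TIF_NOTIFY_SIGNAL;	/* flag is set, no IPI */
		break;
	default:
		break;
	}
}
```

In this model, both TWA_SIGNAL and TWA_SIGNAL_NO_IPI make
signal_pending() true while task_sigpending() stays false, which would
explain why the task_sigpending() hack helps and COOP_TASKRUN does not.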

Even if it worked, I feel like making the "We should not allow signals
to interrupt the network send"-comment valid again would be nicer.
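
Independently of the kernel side, regarding the short writes mentioned
earlier in the thread: a caller can resubmit the remainder instead of
treating a short write as an error. A minimal sketch with plain pwrite(),
assuming a blocking file descriptor. This is not QEMU's actual code, and
write_fully() is a hypothetical helper name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all len bytes starting at offset off,
 * resubmitting the remainder after each short write.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_fully(int fd, const void *buf, size_t len, off_t off)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);

	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, off);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data: retry */
			return -errno;
		}
		if (n == 0)
			return -ENOSPC;	/* no progress at all: give up */

		/* Short or full write: advance past the bytes that made it. */
		p += n;
		len -= (size_t)n;
		off += n;
	}
	return 0;
}
```

Only a write that makes no progress at all is reported as -ENOSPC here;
a short write just advances the buffer and offset and tries again.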

Best Regards,
Fiona



