On 26.08.22 at 10:21, Fiona Ebner wrote:
> On 11.07.22 at 15:40, Fabian Ebner wrote:
>> On 09.07.22 at 05:39, Shyam Prasad N wrote:
>>> On Sat, Jul 9, 2022 at 9:00 AM Shyam Prasad N <nspmangalore@xxxxxxxxx> wrote:
>>>>
>>>> On Fri, Jul 8, 2022 at 11:22 PM Enzo Matsumiya <ematsumiya@xxxxxxx> wrote:
>>>>>
>>>>> On 07/08, Fabian Ebner wrote:
>>>>>> (Re-sending without the log from the older kernel, because the mail hit
>>>>>> the 100000 char limit with that)
>>>>>>
>>>>>> Hi,
>>>>>> it seems that in kernels >= 5.15, io_uring and CIFS don't interact
>>>>>> nicely sometimes, leading to IO errors. Unfortunately, my reproducer is
>>>>>> a QEMU VM with a disk on CIFS (original report by one of our users [0]),
>>>>>> but I can try to cook up something simpler if you want.
>>>>>>
>>>>>> Bisecting got me to 8ef12efe26c8 ("io_uring: run regular file
>>>>>> completions from task_work") being the first bad commit.
>>>>>>
>
> I finally got around to taking another look at this issue (still present
> in 5.19.3) and I think I've finally figured out the root cause:
>
> After commit 8ef12efe26c8, for my reproducer, the write completion is
> added to task_work with notify_method being TWA_SIGNAL and thus
> TIF_NOTIFY_SIGNAL is set for the task.
>
> After that, if we end up in sk_stream_wait_memory() via sock_sendmsg(),
> signal_pending(current) will evaluate to true and thus -EINTR is
> returned all the way up to sock_sendmsg() in smb_send_kvec().
>
> Related: in __smb_send_rqst() there too is a signal_pending(current)
> check leading to the -ERESTARTSYS return value.
>
> To verify that this is the cause, I wasn't able to trigger the issue
> anymore with this hack applied (i.e. excluding the TIF_NOTIFY_SIGNAL check):
>
>> diff --git a/net/core/stream.c b/net/core/stream.c
>> index 06b36c730ce8..58e3825930bb 100644
>> --- a/net/core/stream.c
>> +++ b/net/core/stream.c
>> @@ -134,7 +134,7 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
>>  			goto do_error;
>>  		if (!*timeo_p)
>>  			goto do_eagain;
>> -		if (signal_pending(current))
>> +		if (task_sigpending(current))
>>  			goto do_interrupted;
>>  		sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
>>  		if (sk_stream_memory_free(sk) && !vm_wait)
>
>
> In __cifs_writev() we have
>
>> 	/*
>> 	 * If at least one write was successfully sent, then discard any rc
>> 	 * value from the later writes. If the other write succeeds, then
>> 	 * we'll end up returning whatever was written. If it fails, then
>> 	 * we'll get a new rc value from that.
>> 	 */
>
> so it can happen that collect_uncached_write_data() will (correctly)
> report a short write when calling ctx->iocb->ki_complete().
>
> But QEMU's io_uring backend treats a short write as an -ENOSPC error,
> which also is a bug? Or does the kernel give any guarantees in that
> direction?
>
> Still, it doesn't seem ideal that the "interrupt" happens and in fact
> __smb_send_rqst() tries to avoid it, but fails to do so, because of the
> unexpected TIF_NOTIFY_SIGNAL:
>> 	/*
>> 	 * We should not allow signals to interrupt the network send because
>> 	 * any partial send will cause session reconnects thus increasing
>> 	 * latency of system calls and overload a server with unnecessary
>> 	 * requests.
>> 	 */
>>
>> 	sigfillset(&mask);
>> 	sigprocmask(SIG_BLOCK, &mask, &oldmask);
>
> Do you have any suggestions for how to proceed?
>

Ping. The issue is still present in Linux 6.0.

Does it make sense to also temporarily unset the task's
TIF_NOTIFY_SIGNAL here, or is that a bad idea?

Best Regards,
Fiona