Re: [PATCH] cifs: do not fail __smb_send_rqst if non-fatal signals are pending

Paulo Alcantara <pc@xxxxxx> · Thu, 21 Jan 2021 09:16:46 -0300

Aurélien Aptel <aaptel@xxxxxxxx> writes:

> Pavel Shilovsky <piastryyy@xxxxxxxxx> writes:
>
>> вт, 19 янв. 2021 г. в 22:38, Steve French <smfrench@xxxxxxxxx>:
>>>
>>> The patch won't merge (also has some text corruptions in it).  This
>>> line of code is different due to commit 6988a619f5b79
>>>
>>> 6988a619f5b79 (Paulo Alcantara 2020-11-28 15:57:06 -0300 342)
>>>  cifs_dbg(FYI, "signal pending before send request\n");
>>> 6988a619f5b79 (Paulo Alcantara 2020-11-28 15:57:06 -0300 343)
>>>  return -ERESTARTSYS;
>>>
>>>         if (signal_pending(current)) {
>>>                 cifs_dbg(FYI, "signal pending before send request\n");
>>>                 return -ERESTARTSYS;
>>>         }
>>>
>>> See:
>>>
>>> Author: Paulo Alcantara <pc@xxxxxx>
>>> Date:   Sat Nov 28 15:57:06 2020 -0300
>>>
>>>     cifs: allow syscalls to be restarted in __smb_send_rqst()
>>>
>>>     A customer has reported that several files in their multi-threaded app
>>>     were left with size of 0 because most of the read(2) calls returned
>>>     -EINTR and they assumed no bytes were read.  Obviously, they could
>>>     have fixed it by simply retrying on -EINTR.
>>>
>>>     We noticed that most of the -EINTR on read(2) were due to real-time
>>>     signals sent by glibc to process wide credential changes (SIGRT_1),
>>>     and its signal handler had been established with SA_RESTART, in which
>>>     case those calls could have been automatically restarted by the
>>>     kernel.
>>>
>>>     Let the kernel decide to whether or not restart the syscalls when
>>>     there is a signal pending in __smb_send_rqst() by returning
>>>     -ERESTARTSYS.  If it can't, it will return -EINTR anyway.
>>>
>>>     Signed-off-by: Paulo Alcantara (SUSE) <pc@xxxxxx>
>>>     CC: Stable <stable@xxxxxxxxxxxxxxx>
>>>     Reviewed-by: Ronnie Sahlberg <lsahlber@xxxxxxxxxx>
>>>     Reviewed-by: Pavel Shilovsky <pshilov@xxxxxxxxxxxxx>
>>>
>>> On Tue, Jan 19, 2021 at 10:32 PM Ronnie Sahlberg <lsahlber@xxxxxxxxxx> wrote:
>>> >
>>> > RHBZ 1848178
>>> >
>>> > There is no need to fail this function if non-fatal signals are
>>> > pending when we enter it.
>>> >
>>> > Signed-off-by: Ronnie Sahlberg <lsahlber@xxxxxxxxxx>
>>> > ---
>>> >  fs/cifs/transport.c | 2 +-
>>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>>> >
>>> > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
>>> > index c42bda5a5008..98752f7d2cd2 100644
>>> > --- a/fs/cifs/transport.c
>>> > +++ b/fs/cifs/transport.c
>>> > @@ -339,7 +339,7 @@ __smb_send_rqst(struct TCP_Server_Info *server, int num_rqst,
>>> >         if (ssocket == NULL)
>>> >                 return -EAGAIN;
>>> >
>>> > -       if (signal_pending(current)) {
>>> > +       if (fatal_signal_pending(current)) {
>>> >                 cifs_dbg(FYI, "signal is pending before sending any data\n");
>>> >                 return -EINTR;
>>> >         }
>
> I've looked up the difference
>
> static inline int __fatal_signal_pending(struct task_struct *p)
> {
> 	return unlikely(sigismember(&p->pending.signal, SIGKILL));
> }
>
>
>> I have been thinking around the same lines. The original intent of
>> failing the function here was to avoid interrupting packet send in the
>> middle of the packet and not breaking an SMB connection.
>> That's also why signals are blocked around smb_send_kvec() calls. I
>> guess most of the time a socket buffer is not full, so, those
>> functions immediately return success without waiting internally and
>> checking for pending signals. With this change the code may break SMB
>
> Ah, interesting.
>
> I looked up the difference between fatal/non-fatal and it seems
> fatal_signal_pending() really only checks for SIGKILL, but I would
> expect ^C (SIGINT) to return quickly as well.
>
> I thought the point of checking for pending signal early was to return
> quickly to userspace and not be stuck in some unkillable state.
>
> After reading your explanation, you're saying the kernel funcs to send
> on socket will check for any signal and err early in any case.
>
> some_syscall() {
>
>     if (pending_fatal_signal)  <===== if we ignore non-fatal here
>         fail_early();
>
>     block_signals();
>     r = kernel_socket_send {
>         if (pending_signal) <==== they will be caught here
>             return error;
>
>         ...
>     }
>     unblock_signals();
>     if (r)
>         fail();
>     ...
> }
>
> So this patch will (potentially) trigger more reconnect (because we
> actually send the packet as a vector in a loop) but I'm not sure I
> understand why it returns less errors to userspace?
>
> Also, shouldn't we move the pending_fatal_signal check *inside* the blocked
> signal section?
>
> In any case I think we should try to test some of those changes given
> how we have 3,4 patches trying to tweak it on top of each other.

I think it would make sense to have something like

diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index e9abb41aa89b..f7292c14863e 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -340,7 +340,7 @@ __smb_send_rqst(struct TCP_Server_Info *server, int num_rqst,
 
 	if (signal_pending(current)) {
 		cifs_dbg(FYI, "signal pending before send request\n");
-		return -ERESTARTSYS;
+		return __fatal_signal_pending(current) ? -EINTR : -ERESTARTSYS;
 	}
 
 	/* cork the socket */

so that we allow signal handlers to be executed before restarting
syscalls when receiving non-fatal signals, otherwise -EINTR.