Re: [PATCH] cifs: do not fail __smb_send_rqst if non-fatal signals are pending

Aurélien Aptel <aaptel@xxxxxxxx> · Thu, 21 Jan 2021 11:11:26 +0100

Pavel Shilovsky <piastryyy@xxxxxxxxx> writes:

> On Tue, 19 Jan 2021 at 22:38, Steve French <smfrench@xxxxxxxxx> wrote:
>>
>> The patch won't merge (also has some text corruptions in it).  This
>> line of code is different due to commit 6988a619f5b79
>>
>> 6988a619f5b79 (Paulo Alcantara 2020-11-28 15:57:06 -0300 342)
>>  cifs_dbg(FYI, "signal pending before send request\n");
>> 6988a619f5b79 (Paulo Alcantara 2020-11-28 15:57:06 -0300 343)
>>  return -ERESTARTSYS;
>>
>>         if (signal_pending(current)) {
>>                 cifs_dbg(FYI, "signal pending before send request\n");
>>                 return -ERESTARTSYS;
>>         }
>>
>> See:
>>
>> Author: Paulo Alcantara <pc@xxxxxx>
>> Date:   Sat Nov 28 15:57:06 2020 -0300
>>
>>     cifs: allow syscalls to be restarted in __smb_send_rqst()
>>
>>     A customer has reported that several files in their multi-threaded app
>>     were left with size of 0 because most of the read(2) calls returned
>>     -EINTR and they assumed no bytes were read.  Obviously, they could
>>     have fixed it by simply retrying on -EINTR.
>>
>>     We noticed that most of the -EINTR on read(2) were due to real-time
>>     signals sent by glibc to process wide credential changes (SIGRT_1),
>>     and its signal handler had been established with SA_RESTART, in which
>>     case those calls could have been automatically restarted by the
>>     kernel.
>>
>>     Let the kernel decide to whether or not restart the syscalls when
>>     there is a signal pending in __smb_send_rqst() by returning
>>     -ERESTARTSYS.  If it can't, it will return -EINTR anyway.
>>
>>     Signed-off-by: Paulo Alcantara (SUSE) <pc@xxxxxx>
>>     CC: Stable <stable@xxxxxxxxxxxxxxx>
>>     Reviewed-by: Ronnie Sahlberg <lsahlber@xxxxxxxxxx>
>>     Reviewed-by: Pavel Shilovsky <pshilov@xxxxxxxxxxxxx>
>>
>> On Tue, Jan 19, 2021 at 10:32 PM Ronnie Sahlberg <lsahlber@xxxxxxxxxx> wrote:
>> >
>> > RHBZ 1848178
>> >
>> > There is no need to fail this function if non-fatal signals are
>> > pending when we enter it.
>> >
>> > Signed-off-by: Ronnie Sahlberg <lsahlber@xxxxxxxxxx>
>> > ---
>> >  fs/cifs/transport.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
>> > index c42bda5a5008..98752f7d2cd2 100644
>> > --- a/fs/cifs/transport.c
>> > +++ b/fs/cifs/transport.c
>> > @@ -339,7 +339,7 @@ __smb_send_rqst(struct TCP_Server_Info *server, int num_rqst,
>> >         if (ssocket == NULL)
>> >                 return -EAGAIN;
>> >
>> > -       if (signal_pending(current)) {
>> > +       if (fatal_signal_pending(current)) {
>> >                 cifs_dbg(FYI, "signal is pending before sending any data\n");
>> >                 return -EINTR;
>> >         }

For reference, this is the helper behind fatal_signal_pending():

static inline int __fatal_signal_pending(struct task_struct *p)
{
	return unlikely(sigismember(&p->pending.signal, SIGKILL));
}

> I have been thinking around the same lines. The original intent of
> failing the function here was to avoid interrupting packet send in the
> middle of the packet and not breaking an SMB connection.
> That's also why signals are blocked around smb_send_kvec() calls. I
> guess most of the time a socket buffer is not full, so, those
> functions immediately return success without waiting internally and
> checking for pending signals. With this change the code may break SMB

Ah, interesting.

I looked up the difference between fatal and non-fatal signals, and it
seems fatal_signal_pending() really only checks for SIGKILL, but I would
expect ^C (SIGINT) to make the call return quickly as well.

I thought the point of checking for pending signals early was to return
quickly to userspace and not get stuck in some unkillable state.

If I read your explanation correctly, you're saying the kernel functions
that send on the socket will check for any pending signal and return an
error early in any case.

some_syscall() {

    if (pending_fatal_signal)  <===== if we ignore non-fatal here
        fail_early();

    block_signals();
    r = kernel_socket_send {
        if (pending_signal) <==== they will be caught here
            return error;

        ...
    }
    unblock_signals();
    if (r)
        fail();
    ...
}

So this patch will (potentially) trigger more reconnects (because we
actually send the packet as a vector in a loop), but I'm not sure I
understand why it returns fewer errors to userspace?

Also, shouldn't we move the pending_fatal_signal check *inside* the
signal-blocked section?
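
To make the "blocked section" concrete, here is a small userspace
analogy (not kernel code): block all signals, let one arrive, and see
when it is actually delivered. The names deferred_result and
deferred_signal_demo() are made up for this sketch, and it says nothing
about what the in-kernel signal_pending() reports while signals are
blocked. It assumes a single-threaded process in which SIGUSR1 is not
already blocked.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to touch from a handler. */
static volatile sig_atomic_t handled;

static void on_usr1(int sig)
{
	(void)sig;
	handled = 1;
}

/* Hypothetical result record for this sketch. */
struct deferred_result {
	int pending_while_blocked;  /* SIGUSR1 queued but not delivered */
	int handled_while_blocked;  /* handler ran inside blocked section */
	int handled_after_unblock;  /* handler ran once mask was restored */
};

/* Returns 0 on success, -1 if signal setup failed. */
int deferred_signal_demo(struct deferred_result *res)
{
	struct sigaction sa, old_sa;
	sigset_t all, old_mask, pending;

	handled = 0;
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_usr1;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, &old_sa) < 0)
		return -1;

	/* Enter the "do not interrupt me" section. */
	sigfillset(&all);
	if (sigprocmask(SIG_BLOCK, &all, &old_mask) < 0) {
		sigaction(SIGUSR1, &old_sa, NULL);
		return -1;
	}

	/* A signal arrives while we are in the middle of the "send". */
	raise(SIGUSR1);
	sigemptyset(&pending);
	sigpending(&pending);
	res->pending_while_blocked = sigismember(&pending, SIGUSR1);
	res->handled_while_blocked = handled;

	/* Leave the section; the queued signal is delivered now. */
	sigprocmask(SIG_SETMASK, &old_mask, NULL);
	res->handled_after_unblock = handled;

	sigaction(SIGUSR1, &old_sa, NULL);
	return 0;
}
```

In this userspace model the signal is deferred rather than lost: it
stays pending inside the blocked section and the handler only runs once
the old mask is restored.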

In any case I think we should try to test some of those changes, given
that we have three or four patches tweaking this on top of each other.
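
As a possible starting point for such a test, here is a minimal
userspace sketch of the -EINTR vs. restart behaviour described in the
quoted commit message. It uses a pipe instead of a CIFS mount and
SIGALRM instead of glibc's real-time signal, so it only shows the
generic SA_RESTART semantics, not the cifs code path. The function name
read_was_interrupted() and the 50 ms delay are assumptions made for
this example.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Write end of the pipe, used by the signal handler. */
static int wake_fd = -1;

/* On SIGALRM, make one byte available so a restarted read() can finish. */
static void on_alarm(int sig)
{
	(void)sig;
	if (write(wake_fd, "x", 1) < 0) {
		/* nothing safe to do from inside a signal handler */
	}
}

/*
 * Block in read() on an empty pipe and let SIGALRM interrupt it.
 * sa_flags is either 0 or SA_RESTART.
 * Returns 1 if read() failed with EINTR, 0 if it returned data,
 * and -1 on any setup error or unexpected result.
 */
int read_was_interrupted(int sa_flags)
{
	struct sigaction sa, old_sa;
	struct itimerval tv;
	int fds[2], saved_errno, ret = -1;
	ssize_t n;
	char c;

	if (pipe(fds) < 0)
		return -1;
	wake_fd = fds[1];

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sa.sa_flags = sa_flags;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, &old_sa) < 0)
		goto out_close;

	/* One-shot timer: fire once, 50 ms from now. */
	memset(&tv, 0, sizeof(tv));
	tv.it_value.tv_usec = 50000;
	if (setitimer(ITIMER_REAL, &tv, NULL) < 0)
		goto out_restore;

	/* The pipe is empty, so this blocks until the signal arrives. */
	n = read(fds[0], &c, 1);
	saved_errno = errno;

	if (n == 1)
		ret = 0;        /* syscall was restarted and completed */
	else if (n < 0 && saved_errno == EINTR)
		ret = 1;        /* syscall was interrupted */

out_restore:
	sigaction(SIGALRM, &old_sa, NULL);
out_close:
	close(fds[0]);
	close(fds[1]);
	return ret;
}
```

On Linux I would expect read_was_interrupted(0) to report an
interrupted read and read_was_interrupted(SA_RESTART) to report a
completed one, assuming the process actually reaches read() before the
timer fires.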

Cheers,
-- 
Aurélien Aptel / SUSE Labs Samba Team
GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)