Re: [PATCH] cifs: detect dead connections only when echoes are enabled.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Aurelien,

I did some reading in this code path, and it seems like echo_interval
was changed to 0 for NeedNegotiate cases to handle cases of too much
delay between echoes causing the server to be declared as
unresponsive.
This was back when we used 2*echo_interval as the timeout to declare
the server as unresponsive. Later, Ronnie changed this to
3*echo_interval.
So we should not need to change echo_interval to 0 anymore.

Attached patch has the fix.
Please let me know what you think about this fix.

Regards,
Shyam

On Fri, Apr 30, 2021 at 12:23 AM Steve French <smfrench@xxxxxxxxx> wrote:
>
> FYI kicked off buildbot with your patch (and Rohiths)
>
> http://smb3-test-rhel-75.southcentralus.cloudapp.azure.com/#/builders/11/builds/41
>
> On Thu, Apr 29, 2021 at 10:48 AM Shyam Prasad N <nspmangalore@xxxxxxxxx> wrote:
> >
> > Attaching updated patch with can_echo() instead of referencing echoes directly.
> > I'll submit another patch for the other changes.
> >
> > Regards,
> > Shyam
> >
> > On Thu, Apr 29, 2021 at 5:59 PM Shyam Prasad N <nspmangalore@xxxxxxxxx> wrote:
> > >
> > > Hi Aurélien,
> > >
> > > Replies below...
> > >
> > > On Thu, Apr 29, 2021 at 5:16 PM Aurélien Aptel <aaptel@xxxxxxxx> wrote:
> > > >
> > > >
> > > > Hi Shyam,
> > > >
> > > > Ok so I ended up looking at a lot of code to check this... And I'm still
> > > > unsure.
> > > >
> > > > First, it looks like server->echoes is only used for smb2 and there is
> > > > a generic server->ops->can_echo() you can use that just returns
> > > > server->echoes it for smb2.
> > > Agree. I can use can_echo() instead.
> > >
> > > >
> > > > Shyam Prasad N <nspmangalore@xxxxxxxxx> writes:
> > > > > Hi,
> > > > >
> > > > > Recently, some xfstests and later some manual testing on multi-channel
> > > > > revealed that we detect unresponsive servers on some of the channels
> > > > > to the same server.
> > > > >
> > > > > The issue is seen when a channel is setup and sits idle without any
> > > > > traffic. Generally, we enable echoes and oplocks on a channel during
> > > > > the first request, based on the number of credits available on the
> > > > > channel. So on idle channels, we trip in our logic to check server
> > > > > unresponsiveness.
> > > >
> > > > That makes sense but while looking at the code I see we always queue
> > > > echo request in cifs_get_tcp_session(), which is called when adding a
> > > > channel.
> > > >
> > > > cifs_ses_add_channel() {
> > > >         ctx.echo_interval = ses->server->echo_interval / HZ;
> > > >
> > > >         chan->server = cifs_get_tcp_session(&ctx);
> > > >
> > > >         rc = cifs_negotiate_protocol(xid, ses) {
> > > >                 server->tcpStatus = CifsGood;
> > > >         }
> > > >
> > > >         rc = cifs_setup_session(xid, ses, cifs_sb->local_nls);
> > > > }
> > > >
> > > > cifs_get_tcp_session() {
> > > >
> > > >         INIT_DELAYED_WORK(&tcp_ses->echo, cifs_echo_request);
> > > >
> > > >         tcp_ses->tcpStatus = CifsNeedNegotiate;
> > > >
> > > >         tcp_ses->lstrp = jiffies;
> > > >
> > > >         if (ctx->echo_interval >= SMB_ECHO_INTERVAL_MIN &&
> > > >                 ctx->echo_interval <= SMB_ECHO_INTERVAL_MAX)
> > > >                 tcp_ses->echo_interval = ctx->echo_interval * HZ;
> > > >         else
> > > >                 tcp_ses->echo_interval = SMB_ECHO_INTERVAL_DEFAULT * HZ;
> > > >
> > > >         queue_delayed_work(cifsiod_wq, &tcp_ses->echo, tcp_ses->echo_interval);
> > > > }
> > > >
> > > > cifs_echo_request() {
> > > >
> > > >         if (server->tcpStatus == CifsNeedNegotiate)
> > > >                 echo_interval = 0; <=== branch taken
> > > >         else
> > > >                 echo_interval = server->echo_interval;
> > > >
> > > >         if (server->tcpStatus not in {NeedReconnect, Exiting, New}
> > > >            && server->can_echo()
> > > >            && jiffies > server->lstrp + echo_interval - HZ)
> > > >         {
> > > >                 server->echo();
> > > >         }
> > > >
> > > >         queue_delayed_work(cifsiod_wq, &server->echo, server->echo_interval);
> > > >         ===> echo_interval = 0 so calls itself immediatly
> > > > }
> > > >
> > > > SMB2_echo() {
> > > >         if (server->tcpStatus == CifsNeedNegotiate) {
> > > >                 /* No need to send echo on newly established connections */
> > > >                 mod_delayed_work(cifsiod_wq, &server->reconnect, 0);
> > > >                 ====> calls smb2_reconnect_server() immediatly if NeedNego
> > > >                 return rc;
> > > >         }
> > > >
> > > > }
> > > >
> > > > smb2_reconnect_server() {
> > > >         // channel has no TCONs so it does essentially nothing
> > > > }
> > > >
> > > > server_unresponsive() {
> > > >         if (server->tcpStatus in {Good, NeedNegotiate}
> > > >            && jiffies > server->lstrp + 3 * server->echo_interval)
> > > >         {
> > > >                 cifs_server_dbg(VFS, "has not responded in %lu seconds. Reconnecting...\n",
> > > >                          (3 * server->echo_interval) / HZ);
> > > >                 cifs_reconnect(server);
> > > >                 return true;
> > > >         }
> > > >         return false;
> > > > }
> > > >
> > > > So it looks to me that cifs_get_tcp_session() starts the
> > > > cifs_echo_request() work, which calls itself with no delay in an
> > > > infinite loop doing nothing (that's probably bad...) until session_setup
> > > > succeeds, after which the delay between the self-call is set.
> > >
> > > Perhaps we think that 1 sec is too much for us to complete negotiate?
> > > But echo_interval of 0 looks bad.
> > >
> > > Ideally, we should queue echo work only when session setup is complete.
> > > What do you think?
> > >
> > > >
> > > > During session_setup():
> > > >
> > > > * last response time (lstrp) gets updated
> > > >
> > > > * sending/recv requests should interact with the server
> > > >   credits and call change_conf(), which should enable server->echoes
> > > >   ==> is that part not working?
> > > It's working. But since these channels are idle, change_conf is not
> > > called for these channels.
> > > So echoes=false till there's some activity on the channel.
> > >
> > > >
> > > > Once enabled, the echo_request workq will finally send the echo on the
> > > > wire, which should -/+ 1 credit and update lstrp.
> > > >
> > > > > Attached a one-line fix for this. Have tested it in my environment.
> > > > > Another approach to fix this could be to enable echoes during
> > > > > initialization of a server struct. Or soon after the session setup.
> > > > > But I felt that this approach is better. Let me know if you feel
> > > > > otherwise.
> > > >
> > > > I think the idea of your change is ok but there's probably also an issue
> > > > in crediting in session_setup()/change_conf() if echoes is not enabled
> > > > at this point no?
> > > >
> > > > Cheers,
> > > > --
> > > > Aurélien Aptel / SUSE Labs Samba Team
> > > > GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
> > > > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)
> > > >
> > >
> > >
> > > --
> > > Regards,
> > > Shyam
> >
> >
> >
> > --
> > Regards,
> > Shyam
>
>
>
> --
> Thanks,
>
> Steve



-- 
Regards,
Shyam

Attachment: 0001-cifs-use-echo_interval-even-when-connection-not-read.patch
Description: Binary data


[Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux