Hi Shyam, Ok so I ended up looking at a lot of code to check this... And I'm still unsure. First, it looks like server->echoes is only used for smb2 and there is a generic server->ops->can_echo() you can use that just returns server->echoes it for smb2. Shyam Prasad N <nspmangalore@xxxxxxxxx> writes: > Hi, > > Recently, some xfstests and later some manual testing on multi-channel > revealed that we detect unresponsive servers on some of the channels > to the same server. > > The issue is seen when a channel is setup and sits idle without any > traffic. Generally, we enable echoes and oplocks on a channel during > the first request, based on the number of credits available on the > channel. So on idle channels, we trip in our logic to check server > unresponsiveness. That makes sense but while looking at the code I see we always queue echo request in cifs_get_tcp_session(), which is called when adding a channel. cifs_ses_add_channel() { ctx.echo_interval = ses->server->echo_interval / HZ; chan->server = cifs_get_tcp_session(&ctx); rc = cifs_negotiate_protocol(xid, ses) { server->tcpStatus = CifsGood; } rc = cifs_setup_session(xid, ses, cifs_sb->local_nls); } cifs_get_tcp_session() { INIT_DELAYED_WORK(&tcp_ses->echo, cifs_echo_request); tcp_ses->tcpStatus = CifsNeedNegotiate; tcp_ses->lstrp = jiffies; if (ctx->echo_interval >= SMB_ECHO_INTERVAL_MIN && ctx->echo_interval <= SMB_ECHO_INTERVAL_MAX) tcp_ses->echo_interval = ctx->echo_interval * HZ; else tcp_ses->echo_interval = SMB_ECHO_INTERVAL_DEFAULT * HZ; queue_delayed_work(cifsiod_wq, &tcp_ses->echo, tcp_ses->echo_interval); } cifs_echo_request() { if (server->tcpStatus == CifsNeedNegotiate) echo_interval = 0; <=== branch taken else echo_interval = server->echo_interval; if (server->tcpStatus not in {NeedReconnect, Exiting, New} && server->can_echo() && jiffies > server->lstrp + echo_interval - HZ) { server->echo(); } queue_delayed_work(cifsiod_wq, &server->echo, server->echo_interval); ===> echo_interval = 0 so calls itself immediatly } SMB2_echo() { if (server->tcpStatus == CifsNeedNegotiate) { /* No need to send echo on newly established connections */ mod_delayed_work(cifsiod_wq, &server->reconnect, 0); ====> calls smb2_reconnect_server() immediatly if NeedNego return rc; } } smb2_reconnect_server() { // channel has no TCONs so it does essentially nothing } server_unresponsive() { if (server->tcpStatus in {Good, NeedNegotiate} && jiffies > server->lstrp + 3 * server->echo_interval) { cifs_server_dbg(VFS, "has not responded in %lu seconds. Reconnecting...\n", (3 * server->echo_interval) / HZ); cifs_reconnect(server); return true; } return false; } So it looks to me that cifs_get_tcp_session() starts the cifs_echo_request() work, which calls itself with no delay in an infinite loop doing nothing (that's probably bad...) until session_setup succeeds, after which the delay between the self-call is set. During session_setup(): * last response time (lstrp) gets updated * sending/recv requests should interact with the server credits and call change_conf(), which should enable server->echoes ==> is that part not working? Once enabled, the echo_request workq will finally send the echo on the wire, which should -/+ 1 credit and update lstrp. > Attached a one-line fix for this. Have tested it in my environment. > Another approach to fix this could be to enable echoes during > initialization of a server struct. Or soon after the session setup. > But I felt that this approach is better. Let me know if you feel > otherwise. I think the idea of your change is ok but there's probably also an issue in crediting in session_setup()/change_conf() if echoes is not enabled at this point no? Cheers, -- Aurélien Aptel / SUSE Labs Samba Team GPG: 1839 CB5F 9F5B FB9B AA97 8C99 03C8 A49B 521B D5D3 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)