Shyam,

Could the two test failures in
http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/3/builds/386
be related to this patch, i.e. tests generic/133 and generic/471 failing
with hundreds of "CIFS: Status code returned 0xc000009a
STATUS_INSUFFICIENT_RESOURCES" errors? The only failing tests were
multichannel related.

This test run was with these six patches:

600ed21fe802 (HEAD -> for-next, origin/for-next, origin/HEAD) netfs: Fix a number of read-retry hangs
9f75ff5536b1 smb: client, common: Avoid multiple -Wflex-array-member-not-at-end warnings
fab0eddb9fe7 cifs: Treat unhandled directory name surrogate reparse points as mount directory nodes
69476da76b9c cifs: Throw -EOPNOTSUPP error on unsupported reparse point type from parse_reparse_point()
ef590eae88cf cifs: deal with the channel loading lag while picking channels
f1bf10d7e909 cifs: pick channels for individual subrequests

Anyone else seeing the same errors with multichannel on these tests?

On Wed, Feb 12, 2025 at 2:35 PM Steve French <smfrench@xxxxxxxxx> wrote:
>
> tentatively merged into cifs-2.6.git for-next pending more reviews and testing
>
> On Wed, Feb 12, 2025 at 1:35 AM <nspmangalore@xxxxxxxxx> wrote:
> >
> > From: Shyam Prasad N <sprasad@xxxxxxxxxxxxx>
> >
> > Our current approach to selecting a channel for sending requests is this:
> > 1. iterate all channels to find the min and max queue depth
> > 2. if min and max are not the same, pick the channel with min depth
> > 3. if min and max are the same, round robin, as all channels are equally loaded
> >
> > The problem with this approach is that there's a lag between selecting
> > a channel and sending the request (which increases the queue depth on the channel).
> > While these numbers will eventually catch up, there can be a skew in
> > channel usage, depending on the application's I/O parallelism and the
> > server's speed of handling requests.
> >
> > With sufficient parallelism, this lag can artificially increase the queue
> > depth, thereby impacting performance negatively.
> >
> > This change modifies step 1 above to start the iteration from the last
> > selected channel, so as to reduce the skew in channel usage even in the
> > presence of this lag.
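Roughly, the logic described above amounts to the user-space sketch below
(illustrative only, not the kernel code: struct chan, chan_seq and
pick_channel are made-up stand-ins for the real cifs structures, and the
locking/atomics are omitted):

#include <limits.h>
#include <stdio.h>

struct chan {
        unsigned int in_flight;         /* requests queued on this channel */
};

static unsigned int chan_seq;           /* atomic_t in the real code */

static unsigned int pick_channel(const struct chan *chans, unsigned int nchans)
{
        unsigned int min = UINT_MAX, max = 0, index = 0;
        unsigned int start = ++chan_seq;        /* rotate the scan start point */
        unsigned int i, cur;

        for (i = 0; i < nchans; i++) {
                cur = (start + i) % nchans;
                if (chans[cur].in_flight < min) {
                        min = chans[cur].in_flight;
                        index = cur;
                }
                if (chans[cur].in_flight > max)
                        max = chans[cur].in_flight;
        }

        /* all channels equally loaded: fall back to plain round-robin */
        if (min == max)
                index = start % nchans;

        return index;
}

int main(void)
{
        struct chan chans[4] = { {0}, {0}, {0}, {0} };

        /*
         * Two back-to-back callers before any in_flight count has been
         * bumped: the rotating start keeps them off the same channel.
         */
        printf("first pick:  %u\n", pick_channel(chans, 4));
        printf("second pick: %u\n", pick_channel(chans, 4));
        return 0;
}

With all channels equally loaded, back-to-back callers land on different
channels even before the in_flight counts catch up, which is exactly the
skew the change is trying to avoid.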
> >
> > Fixes: ea90708d3cf3 ("cifs: use the least loaded channel for sending requests")
> > Cc: <stable@xxxxxxxxxxxxxxx>
> > Signed-off-by: Shyam Prasad N <sprasad@xxxxxxxxxxxxx>
> > ---
> >  fs/smb/client/transport.c | 14 +++++++-------
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c
> > index 0dc80959ce48..e2fbf8b18eb2 100644
> > --- a/fs/smb/client/transport.c
> > +++ b/fs/smb/client/transport.c
> > @@ -1015,14 +1015,16 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses)
> >         uint index = 0;
> >         unsigned int min_in_flight = UINT_MAX, max_in_flight = 0;
> >         struct TCP_Server_Info *server = NULL;
> > -       int i;
> > +       int i, start, cur;
> >
> >         if (!ses)
> >                 return NULL;
> >
> >         spin_lock(&ses->chan_lock);
> > +       start = atomic_inc_return(&ses->chan_seq);
> >         for (i = 0; i < ses->chan_count; i++) {
> > -               server = ses->chans[i].server;
> > +               cur = (start + i) % ses->chan_count;
> > +               server = ses->chans[cur].server;
> >                 if (!server || server->terminate)
> >                         continue;
> >
> > @@ -1039,17 +1041,15 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses)
> >                  */
> >                 if (server->in_flight < min_in_flight) {
> >                         min_in_flight = server->in_flight;
> > -                       index = i;
> > +                       index = cur;
> >                 }
> >                 if (server->in_flight > max_in_flight)
> >                         max_in_flight = server->in_flight;
> >         }
> >
> >         /* if all channels are equally loaded, fall back to round-robin */
> > -       if (min_in_flight == max_in_flight) {
> > -               index = (uint)atomic_inc_return(&ses->chan_seq);
> > -               index %= ses->chan_count;
> > -       }
> > +       if (min_in_flight == max_in_flight)
> > +               index = (uint)start % ses->chan_count;
> >
> >         server = ses->chans[index].server;
> >         spin_unlock(&ses->chan_lock);
> > --
> > 2.43.0
> >
> --
> Thanks,
>
> Steve

--
Thanks,

Steve