Shyam,

Could the two test failures in
http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/3/builds/386
be related to this patch, i.e. tests generic/133 and generic/471 failing
with hundreds of "CIFS: Status code returned 0xc000009a
STATUS_INSUFFICIENT_RESOURCES" errors? The only failing tests were
multichannel related.

This test run was with these six patches:

600ed21fe802 (HEAD -> for-next, origin/for-next, origin/HEAD) netfs: Fix a number of read-retry hangs
9f75ff5536b1 smb: client, common: Avoid multiple -Wflex-array-member-not-at-end warnings
fab0eddb9fe7 cifs: Treat unhandled directory name surrogate reparse points as mount directory nodes
69476da76b9c cifs: Throw -EOPNOTSUPP error on unsupported reparse point type from parse_reparse_point()
ef590eae88cf cifs: deal with the channel loading lag while picking channels
f1bf10d7e909 cifs: pick channels for individual subrequests

Anyone else seeing the same errors with multichannel on these tests?

On Wed, Feb 12, 2025 at 2:35 PM Steve French <smfrench@xxxxxxxxx> wrote:
>
> tentatively merged into cifs-2.6.git for-next pending more reviews and testing
>
> On Wed, Feb 12, 2025 at 1:35 AM <nspmangalore@xxxxxxxxx> wrote:
> >
> > From: Shyam Prasad N <sprasad@xxxxxxxxxxxxx>
> >
> > Our current approach to selecting a channel for sending requests is this:
> > 1. iterate all channels to find the min and max queue depth
> > 2. if min and max are not the same, pick the channel with min depth
> > 3. if min and max are the same, round robin, as all channels are equally loaded
> >
> > The problem with this approach is that there's a lag between selecting
> > a channel and sending the request (which increases the queue depth on the channel).
> > While these numbers will eventually catch up, there can be a skew in
> > channel usage, depending on the application's I/O parallelism and the
> > server's speed of handling requests.
> >
> > With sufficient parallelism, this lag can artificially increase the queue
> > depth, thereby impacting performance negatively.
> >
> > This change modifies step 1 above to start the iteration from the last
> > selected channel, so as to reduce the skew in channel usage even in the
> > presence of this lag.
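Roughly, the logic described above amounts to the user-space sketch below
(illustrative only, not the kernel code: struct chan, chan_seq and
pick_channel are made-up stand-ins for the real cifs structures, and the
locking/atomics are omitted):

#include <limits.h>
#include <stdio.h>

struct chan {
        unsigned int in_flight;         /* requests queued on this channel */
};

static unsigned int chan_seq;           /* atomic_t in the real code */

static unsigned int pick_channel(const struct chan *chans, unsigned int nchans)
{
        unsigned int min = UINT_MAX, max = 0, index = 0;
        unsigned int start = ++chan_seq;        /* rotate the scan start point */
        unsigned int i, cur;

        for (i = 0; i < nchans; i++) {
                cur = (start + i) % nchans;
                if (chans[cur].in_flight < min) {
                        min = chans[cur].in_flight;
                        index = cur;
                }
                if (chans[cur].in_flight > max)
                        max = chans[cur].in_flight;
        }

        /* all channels equally loaded: fall back to plain round-robin */
        if (min == max)
                index = start % nchans;

        return index;
}

int main(void)
{
        struct chan chans[4] = { {0}, {0}, {0}, {0} };

        /*
         * Two back-to-back callers before any in_flight count has been
         * bumped: the rotating start keeps them off the same channel.
         */
        printf("first pick:  %u\n", pick_channel(chans, 4));
        printf("second pick: %u\n", pick_channel(chans, 4));
        return 0;
}

With all channels equally loaded, back-to-back callers land on different
channels even before the in_flight counts catch up, which is exactly the
skew the change is trying to avoid.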
> >
> > Fixes: ea90708d3cf3 ("cifs: use the least loaded channel for sending requests")
> > Cc: <stable@xxxxxxxxxxxxxxx>
> > Signed-off-by: Shyam Prasad N <sprasad@xxxxxxxxxxxxx>
> > ---
> >  fs/smb/client/transport.c | 14 +++++++-------
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c
> > index 0dc80959ce48..e2fbf8b18eb2 100644
> > --- a/fs/smb/client/transport.c
> > +++ b/fs/smb/client/transport.c
> > @@ -1015,14 +1015,16 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses)
> >         uint index = 0;
> >         unsigned int min_in_flight = UINT_MAX, max_in_flight = 0;
> >         struct TCP_Server_Info *server = NULL;
> > -       int i;
> > +       int i, start, cur;
> >
> >         if (!ses)
> >                 return NULL;
> >
> >         spin_lock(&ses->chan_lock);
> > +       start = atomic_inc_return(&ses->chan_seq);
> >         for (i = 0; i < ses->chan_count; i++) {
> > -               server = ses->chans[i].server;
> > +               cur = (start + i) % ses->chan_count;
> > +               server = ses->chans[cur].server;
> >                 if (!server || server->terminate)
> >                         continue;
> >
> > @@ -1039,17 +1041,15 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses)
> >                  */
> >                 if (server->in_flight < min_in_flight) {
> >                         min_in_flight = server->in_flight;
> > -                       index = i;
> > +                       index = cur;
> >                 }
> >                 if (server->in_flight > max_in_flight)
> >                         max_in_flight = server->in_flight;
> >         }
> >
> >         /* if all channels are equally loaded, fall back to round-robin */
> > -       if (min_in_flight == max_in_flight) {
> > -               index = (uint)atomic_inc_return(&ses->chan_seq);
> > -               index %= ses->chan_count;
> > -       }
> > +       if (min_in_flight == max_in_flight)
> > +               index = (uint)start % ses->chan_count;
> >
> >         server = ses->chans[index].server;
> >         spin_unlock(&ses->chan_lock);
> > --
> > 2.43.0
> >
> --
> Thanks,
>
> Steve

--
Thanks,

Steve