Re: [PATCH 1/6] CIFS: Introduce credit-based flow control

Steve French <smfrench@xxxxxxxxx> · Sat, 18 Feb 2012 00:00:22 -0600

I am not convinced that we need to keep enough echo retries ...
probably ok to be able to send one echo if we have 49 other requests
pending, but don't think it makes sense to reserve more - if the first
echo times out, we can probably simply exceed negotiated mpxcount -
the session would be going down anyway.  Why would we have to save
more than one for oplock - if we receive multiple breaks, we simply
process them serially?

On Fri, Feb 17, 2012 at 11:55 PM, Pavel Shilovsky <piastry@xxxxxxxxxxx> wrote:
>
> 2012/2/15 Pavel Shilovsky <piastry@xxxxxxxxxxx>:
> > 2012/2/15 Jeff Layton <jlayton@xxxxxxxxx>:
> >> On Wed, 15 Feb 2012 18:18:11 +0400
> >> Pavel Shilovsky <piastry@xxxxxxxxxxx> wrote:
> >>
> >>> 2012/2/12 Jeff Layton <jlayton@xxxxxxxxx>:
> >>> > -----BEGIN PGP SIGNED MESSAGE-----
> >>> > Hash: SHA1
> >>> >
> >>> > On Thu,  9 Feb 2012 11:41:24 +0300
> >>> > Pavel Shilovsky <piastry@xxxxxxxxxxx> wrote:
> >>> >
> >>> >> and send no more than credits value requests at once. For SMB/CIFS
> >>> >> it's trivial: increment this value by receiving any message and
> >>> >> decrement by sending one.
> >>> >>
> >>> >> Signed-off-by: Pavel Shilovsky <piastry@xxxxxxxxxxx>
> >>> >
> >>> > Yes! Nice work. This is clearly the right approach for merging SMB2
> >>> > support into the kernel. Extend the CIFS code to do what we need and
> >>> > share as much code as possible between the SMB1 and SMB2 support. By
> >>> > doing this piecemeal too we can avoid a wholesale "code dump" into the
> >>> > kernel.
> >>> >
> >>> > Some comments below.
> >>> >
> >>> >> ---
> >>> >>  fs/cifs/cifsglob.h  |    1 +
> >>> >>  fs/cifs/cifssmb.c   |    3 +++
> >>> >>  fs/cifs/connect.c   |   10 ++++------
> >>> >>  fs/cifs/transport.c |   20 ++++++++++++++++----
> >>> >>  4 files changed, 24 insertions(+), 10 deletions(-)
> >>> >>
> >>> >> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> >>> >> index 76e7d8b..f96752a 100644
> >>> >> --- a/fs/cifs/cifsglob.h
> >>> >> +++ b/fs/cifs/cifsglob.h
> >>> >> @@ -256,6 +256,7 @@ struct TCP_Server_Info {
> >>> >>       bool noautotune;                /* do not autotune send buf sizes */
> >>> >>       bool tcp_nodelay;
> >>> >>       atomic_t inFlight;  /* number of requests on the wire to server */
> >>> >> +     atomic_t credits;  /* send no more requests at once */
> >>> >
> >>> > Currently, it's not clear whether the inFlight counter is protected by
> >>> > the GlobalMid_lock or not. Ditto for this new counter. If it is
> >>> > protected by it, then there's no need for it to be an atomic value. The
> >>> > locking around this needs to be explicitly clear to avoid later bugs
> >>> > when we tinker with this code.
> >>>
> >>> We decrement inFlight value without holding GlobalMid_lock - so it
> >>> should be atomic.
> >>>
> >>
> >> But why is it this way? If it's atomic why bother with the locking at
> >> all? If you really need the lock then why bother making it atomic?
> >>
> >> Any time I see inconsistent locking like this, it's usually an indicator
> >> that something is wrong. If it's not wrong then it needs to be
> >> documented at the very least.
> >>
> >> My suspicion is that the existing code was slapped together without
> >> much thought as to what locks were necessary. Since you're in here now
> >> though, this ought to be corrected before you move forward with adding
> >> new code.
> >
> > Ok, it seems like a bug. Locking it with GlobalMid_lock looks better
> > to me (the explanation is below).
> >
> >>
> >>> >
> >>> > Also, should we just replace the inFlight counter with the credits
> >>> > counter? Is there some way to calculate the # of requests in flight given
> >>> > the value of "credits" in the SMB2 case?
> >>>
> >>> I don't think so.
> >>>
> >>> >
> >>> >>       struct mutex srv_mutex;
> >>> >>       struct task_struct *tsk;
> >>> >>       char server_GUID[16];
> >>> >> diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> >>> >> index 8b7794c..fbc4437 100644
> >>> >> --- a/fs/cifs/cifssmb.c
> >>> >> +++ b/fs/cifs/cifssmb.c
> >>> >> @@ -717,6 +717,7 @@ cifs_echo_callback(struct mid_q_entry *mid)
> >>> >>
> >>> >>       DeleteMidQEntry(mid);
> >>> >>       atomic_dec(&server->inFlight);
> >>> >> +     atomic_inc(&server->credits);
> >>> >>       wake_up(&server->request_q);
> >>> >>  }
> >>> >>
> >>> >> @@ -1670,6 +1671,7 @@ cifs_readv_callback(struct mid_q_entry *mid)
> >>> >>       queue_work(system_nrt_wq, &rdata->work);
> >>> >>       DeleteMidQEntry(mid);
> >>> >>       atomic_dec(&server->inFlight);
> >>> >> +     atomic_inc(&server->credits);
> >>> >>       wake_up(&server->request_q);
> >>> >>  }
> >>> >>
> >>> >> @@ -2111,6 +2113,7 @@ cifs_writev_callback(struct mid_q_entry *mid)
> >>> >>       queue_work(system_nrt_wq, &wdata->work);
> >>> >>       DeleteMidQEntry(mid);
> >>> >>       atomic_dec(&tcon->ses->server->inFlight);
> >>> >> +     atomic_inc(&tcon->ses->server->credits);
> >>> >>       wake_up(&tcon->ses->server->request_q);
> >>> >>  }
> >>> >>
> >>> >> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> >>> >> index 602f77c..6b95304 100644
> >>> >> --- a/fs/cifs/connect.c
> >>> >> +++ b/fs/cifs/connect.c
> >>> >> @@ -648,12 +648,8 @@ static void clean_demultiplex_info(struct TCP_Server_Info *server)
> >>> >>        * time to as little as two.
> >>> >>        */
> >>> >>       spin_lock(&GlobalMid_Lock);
> >>> >> -     if (atomic_read(&server->inFlight) >= cifs_max_pending)
> >>> >> -             atomic_set(&server->inFlight, cifs_max_pending - 1);
> >>> >> -     /*
> >>> >> -      * We do not want to set the max_pending too low or we could end up
> >>> >> -      * with the counter going negative.
> >>> >> -      */
> >>> >> +     if (atomic_read(&server->credits) == 0)
> >>> >> +             atomic_set(&server->credits, 1);
> >>> >>       spin_unlock(&GlobalMid_Lock);
> >>> >>       /*
> >>> >>        * Although there should not be any requests blocked on this queue it
> >>> >> @@ -3759,9 +3755,11 @@ int cifs_negotiate_protocol(unsigned int xid, struct cifs_ses *ses)
> >>> >>       if (server->maxBuf != 0)
> >>> >>               return 0;
> >>> >>
> >>> >> +     atomic_set(&ses->server->credits, cifs_max_pending);
> >>> >>       rc = CIFSSMBNegotiate(xid, ses);
> >>> >>       if (rc == -EAGAIN) {
> >>> >>               /* retry only once on 1st time connection */
> >>> >> +             atomic_set(&ses->server->credits, cifs_max_pending);
> >>> >>               rc = CIFSSMBNegotiate(xid, ses);
> >>> >>               if (rc == -EAGAIN)
> >>> >>                       rc = -EHOSTDOWN;
> >>> >> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> >>> >> index 0cc9584..f1f927f 100644
> >>> >> --- a/fs/cifs/transport.c
> >>> >> +++ b/fs/cifs/transport.c
> >>> >> @@ -260,17 +260,17 @@ static int wait_for_free_request(struct TCP_Server_Info *server,
> >>> >>       if (long_op == CIFS_ASYNC_OP) {
> >>> >>               /* oplock breaks must not be held up */
> >>> >>               atomic_inc(&server->inFlight);
> >>> >> +             atomic_dec(&server->credits);
> >>> >>               return 0;
> >>> >>       }
> >>> >>
> >>> >
> >>> > I haven't gotten to your later patches so I may be reviewing
> >>> > prematurely, but I think we need to rethink the code above. Allowing
> >>> > the client to exceed the maxmpx is clearly problematic. Rather than
> >>> > doing that we ought to attempt to keep 2 credits in reserve at any
> >>> > given time -- one for oplock breaks and one for echo requests.
> >>> >
> >>> > If MaxMpx 2, then we should disable oplocks. If it's 1, then we
> >>> > should disable echoes and oplocks. The latter case may take some overhaul
> >>> > of how the echo timeout handling works.
> >>>
> >>> I agree that allowing to exceed the maxmps is wrong, but I think we
> >>> can leave it as it is for this patch.
> >>>
> >>
> >> I suppose, but I worry that adding the new complexity of credits
> >> without fixing that bug is going to make this really difficult to
> >> debug...
> >>
> >>> >
> >>> >>       spin_lock(&GlobalMid_Lock);
> >>> >>       while (1) {
> >>> >> -             if (atomic_read(&server->inFlight) >= cifs_max_pending) {
> >>> >> +             if (atomic_read(&server->credits) == 0) {
> >>>
> >>> Now I think that (according to echo and oplock problem) we need this to be:
> >>>
> >>> if (atomic_read(&server->credits) <= 0) {
> >>>
> >>> Right?
> >>>
> >>
> >> Ugh, I guess. Is this fixed later in this patchset? If not, then I
> >> suggest fixing that problem first before you introduce the new
> >> credit-based algorithm. Fixing this problem is really my immediate goal
> >> with this patchset. If we can do it in a way that also paves the way
> >> for the for SMB2 code then that's ideal.
> >>
> >
> > I understand. In SMB2 we have the dynamic MaxMpx value - so, we should
> > disable/enable echo/oplock on the fly.
> >
> > I suggest to do this for SMB2 on every arrival request:
> >
> > 1) lock GlobalMid_lock
> > 2) update credits and reqOnFlight values
> > 3) check if outstanding requests exist - goto 5 (can't make a desicion
> > about server configuration).
> > 4) change server configuration:
> >  4.1) if credits >= 3: enable oplocks & echos
> >  4.2) if credits == 2: disable oplocks & enable echos
> >  4.3) if credits == 1: disable oplocks & disable echos
> > (credits should not be 0 here; if so - reconnect)
> > 5) unlock GlobalMid_lock
> >
> > But for CIFS we should do something like step 4 only once after
> > getting MaxMpx value.
> >
> > Thoughts?
> >
> > --
> > Best regards,
> > Pavel Shilovsky.
>
> I started to think that this approach doesn't work. We should have at
> least echo_retries number of slots for current echo configuration,
> because we make a decision to reconnect or not according to
> echo_retries number of echo requests without response.
>
> As for oplocks: the client can receive more than one oplock break at
> the time - so, need more than one slot for it. But in this case we
> can't figure out what number actually use.
>
> For the first implementation, I suggest to keep only echo_retries
> reserved slots for echos. Now, I am not sure if we should keep any
> number of credits for oplock breaks.
>
> Thoughts?
>
> --
> Best regards,
> Pavel Shilovsky.

--
Thanks,

Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html