Re: [PATCH 1/6] CIFS: Introduce credit-based flow control

Pavel Shilovsky <piastry@xxxxxxxxxxx> · Sat, 18 Feb 2012 09:55:43 +0400

2012/2/15 Pavel Shilovsky <piastry@xxxxxxxxxxx>:
> 2012/2/15 Jeff Layton <jlayton@xxxxxxxxx>:
>> On Wed, 15 Feb 2012 18:18:11 +0400
>> Pavel Shilovsky <piastry@xxxxxxxxxxx> wrote:
>>
>>> 2012/2/12 Jeff Layton <jlayton@xxxxxxxxx>:
>>> > -----BEGIN PGP SIGNED MESSAGE-----
>>> > Hash: SHA1
>>> >
>>> > On Thu,  9 Feb 2012 11:41:24 +0300
>>> > Pavel Shilovsky <piastry@xxxxxxxxxxx> wrote:
>>> >
>>> >> and send no more than credits value requests at once. For SMB/CIFS
>>> >> it's trivial: increment this value by receiving any message and
>>> >> decrement by sending one.
>>> >>
>>> >> Signed-off-by: Pavel Shilovsky <piastry@xxxxxxxxxxx>
>>> >
>>> > Yes! Nice work. This is clearly the right approach for merging SMB2
>>> > support into the kernel. Extend the CIFS code to do what we need and
>>> > share as much code as possible between the SMB1 and SMB2 support. By
>>> > doing this piecemeal too we can avoid a wholesale "code dump" into the
>>> > kernel.
>>> >
>>> > Some comments below.
>>> >
>>> >> ---
>>> >>  fs/cifs/cifsglob.h  |    1 +
>>> >>  fs/cifs/cifssmb.c   |    3 +++
>>> >>  fs/cifs/connect.c   |   10 ++++------
>>> >>  fs/cifs/transport.c |   20 ++++++++++++++++----
>>> >>  4 files changed, 24 insertions(+), 10 deletions(-)
>>> >>
>>> >> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
>>> >> index 76e7d8b..f96752a 100644
>>> >> --- a/fs/cifs/cifsglob.h
>>> >> +++ b/fs/cifs/cifsglob.h
>>> >> @@ -256,6 +256,7 @@ struct TCP_Server_Info {
>>> >>       bool noautotune;                /* do not autotune send buf sizes */
>>> >>       bool tcp_nodelay;
>>> >>       atomic_t inFlight;  /* number of requests on the wire to server */
>>> >> +     atomic_t credits;  /* send no more requests at once */
>>> >
>>> > Currently, it's not clear whether the inFlight counter is protected by
>>> > the GlobalMid_lock or not. Ditto for this new counter. If it is
>>> > protected by it, then there's no need for it to be an atomic value. The
>>> > locking around this needs to be explicitly clear to avoid later bugs
>>> > when we tinker with this code.
>>>
>>> We decrement inFlight value without holding GlobalMid_lock - so it
>>> should be atomic.
>>>
>>
>> But why is it this way? If it's atomic why bother with the locking at
>> all? If you really need the lock then why bother making it atomic?
>>
>> Any time I see inconsistent locking like this, it's usually an indicator
>> that something is wrong. If it's not wrong then it needs to be
>> documented at the very least.
>>
>> My suspicion is that the existing code was slapped together without
>> much thought as to what locks were necessary. Since you're in here now
>> though, this ought to be corrected before you move forward with adding
>> new code.
>
> Ok, it seems like a bug. Locking it with GlobalMid_lock looks better
> to me (the explanation is below).
>
>>
>>> >
>>> > Also, should we just replace the inFlight counter with the credits
>>> > counter? Is there some way to calculate the # of requests in flight given
>>> > the value of "credits" in the SMB2 case?
>>>
>>> I don't think so.
>>>
>>> >
>>> >>       struct mutex srv_mutex;
>>> >>       struct task_struct *tsk;
>>> >>       char server_GUID[16];
>>> >> diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
>>> >> index 8b7794c..fbc4437 100644
>>> >> --- a/fs/cifs/cifssmb.c
>>> >> +++ b/fs/cifs/cifssmb.c
>>> >> @@ -717,6 +717,7 @@ cifs_echo_callback(struct mid_q_entry *mid)
>>> >>
>>> >>       DeleteMidQEntry(mid);
>>> >>       atomic_dec(&server->inFlight);
>>> >> +     atomic_inc(&server->credits);
>>> >>       wake_up(&server->request_q);
>>> >>  }
>>> >>
>>> >> @@ -1670,6 +1671,7 @@ cifs_readv_callback(struct mid_q_entry *mid)
>>> >>       queue_work(system_nrt_wq, &rdata->work);
>>> >>       DeleteMidQEntry(mid);
>>> >>       atomic_dec(&server->inFlight);
>>> >> +     atomic_inc(&server->credits);
>>> >>       wake_up(&server->request_q);
>>> >>  }
>>> >>
>>> >> @@ -2111,6 +2113,7 @@ cifs_writev_callback(struct mid_q_entry *mid)
>>> >>       queue_work(system_nrt_wq, &wdata->work);
>>> >>       DeleteMidQEntry(mid);
>>> >>       atomic_dec(&tcon->ses->server->inFlight);
>>> >> +     atomic_inc(&tcon->ses->server->credits);
>>> >>       wake_up(&tcon->ses->server->request_q);
>>> >>  }
>>> >>
>>> >> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
>>> >> index 602f77c..6b95304 100644
>>> >> --- a/fs/cifs/connect.c
>>> >> +++ b/fs/cifs/connect.c
>>> >> @@ -648,12 +648,8 @@ static void clean_demultiplex_info(struct TCP_Server_Info *server)
>>> >>        * time to as little as two.
>>> >>        */
>>> >>       spin_lock(&GlobalMid_Lock);
>>> >> -     if (atomic_read(&server->inFlight) >= cifs_max_pending)
>>> >> -             atomic_set(&server->inFlight, cifs_max_pending - 1);
>>> >> -     /*
>>> >> -      * We do not want to set the max_pending too low or we could end up
>>> >> -      * with the counter going negative.
>>> >> -      */
>>> >> +     if (atomic_read(&server->credits) == 0)
>>> >> +             atomic_set(&server->credits, 1);
>>> >>       spin_unlock(&GlobalMid_Lock);
>>> >>       /*
>>> >>        * Although there should not be any requests blocked on this queue it
>>> >> @@ -3759,9 +3755,11 @@ int cifs_negotiate_protocol(unsigned int xid, struct cifs_ses *ses)
>>> >>       if (server->maxBuf != 0)
>>> >>               return 0;
>>> >>
>>> >> +     atomic_set(&ses->server->credits, cifs_max_pending);
>>> >>       rc = CIFSSMBNegotiate(xid, ses);
>>> >>       if (rc == -EAGAIN) {
>>> >>               /* retry only once on 1st time connection */
>>> >> +             atomic_set(&ses->server->credits, cifs_max_pending);
>>> >>               rc = CIFSSMBNegotiate(xid, ses);
>>> >>               if (rc == -EAGAIN)
>>> >>                       rc = -EHOSTDOWN;
>>> >> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
>>> >> index 0cc9584..f1f927f 100644
>>> >> --- a/fs/cifs/transport.c
>>> >> +++ b/fs/cifs/transport.c
>>> >> @@ -260,17 +260,17 @@ static int wait_for_free_request(struct TCP_Server_Info *server,
>>> >>       if (long_op == CIFS_ASYNC_OP) {
>>> >>               /* oplock breaks must not be held up */
>>> >>               atomic_inc(&server->inFlight);
>>> >> +             atomic_dec(&server->credits);
>>> >>               return 0;
>>> >>       }
>>> >>
>>> >
>>> > I haven't gotten to your later patches so I may be reviewing
>>> > prematurely, but I think we need to rethink the code above. Allowing
>>> > the client to exceed the maxmpx is clearly problematic. Rather than
>>> > doing that we ought to attempt to keep 2 credits in reserve at any
>>> > given time -- one for oplock breaks and one for echo requests.
>>> >
>>> > If MaxMpx 2, then we should disable oplocks. If it's 1, then we
>>> > should disable echoes and oplocks. The latter case may take some overhaul
>>> > of how the echo timeout handling works.
>>>
>>> I agree that allowing to exceed the maxmps is wrong, but I think we
>>> can leave it as it is for this patch.
>>>
>>
>> I suppose, but I worry that adding the new complexity of credits
>> without fixing that bug is going to make this really difficult to
>> debug...
>>
>>> >
>>> >>       spin_lock(&GlobalMid_Lock);
>>> >>       while (1) {
>>> >> -             if (atomic_read(&server->inFlight) >= cifs_max_pending) {
>>> >> +             if (atomic_read(&server->credits) == 0) {
>>>
>>> Now I think that (according to echo and oplock problem) we need this to be:
>>>
>>> if (atomic_read(&server->credits) <= 0) {
>>>
>>> Right?
>>>
>>
>> Ugh, I guess. Is this fixed later in this patchset? If not, then I
>> suggest fixing that problem first before you introduce the new
>> credit-based algorithm. Fixing this problem is really my immediate goal
>> with this patchset. If we can do it in a way that also paves the way
>> for the for SMB2 code then that's ideal.
>>
>
> I understand. In SMB2 we have the dynamic MaxMpx value - so, we should
> disable/enable echo/oplock on the fly.
>
> I suggest to do this for SMB2 on every arrival request:
>
> 1) lock GlobalMid_lock
> 2) update credits and reqOnFlight values
> 3) check if outstanding requests exist - goto 5 (can't make a desicion
> about server configuration).
> 4) change server configuration:
>  4.1) if credits >= 3: enable oplocks & echos
>  4.2) if credits == 2: disable oplocks & enable echos
>  4.3) if credits == 1: disable oplocks & disable echos
> (credits should not be 0 here; if so - reconnect)
> 5) unlock GlobalMid_lock
>
> But for CIFS we should do something like step 4 only once after
> getting MaxMpx value.
>
> Thoughts?
>
> --
> Best regards,
> Pavel Shilovsky.

I started to think that this approach doesn't work. We should have at
least echo_retries number of slots for current echo configuration,
because we make a decision to reconnect or not according to
echo_retries number of echo requests without response.

As for oplocks: the client can receive more than one oplock break at
the time - so, need more than one slot for it. But in this case we
can't figure out what number actually use.

For the first implementation, I suggest to keep only echo_retries
reserved slots for echos. Now, I am not sure if we should keep any
number of credits for oplock breaks.

Thoughts?

-- 
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html