On Wed, Mar 6, 2019 at 12:15 PM Pavel Shilovsky <piastryyy@xxxxxxxxx> wrote:
>
> On Wed, Feb 27, 2019 at 09:12 PM, ronnie sahlberg <ronniesahlberg@xxxxxxxxx> wrote:
> >
> > On Thu, Feb 21, 2019 at 11:39 AM Pavel Shilovsky <piastryyy@xxxxxxxxx> wrote:
> >
> > ...
> > >
> > > Waiting for more than one credit without a timeout is a possible way to get stuck. We already have a 'timeout' argument which actually carries some of the flags passed to the transport layer from the PDU. We might consider renaming the 'optype' argument to 'flags' and using 'timeout' with its true meaning - the timeout of waiting for credits.
> > >
> > > Once implemented, we can define some default period of time which we can *afford* to wait for credits, and then fall back to sequential code if we time out.
> > >
> > > I also suggest adding another function, wait_for_compound_credits, and making wait_for_free_request call it - that way we don't need changes in all the places where the latter is used.
> >
> > Thanks, I am reworking it and will resend shortly. Passing flags down into these functions instead of timeout/optype makes sense and I will do that.
> >
> > I am not sure we need to add a timeout to represent a deadline after which we return -EAGAIN or similar. But for sure we need to prevent waiting atomically for >1 credits from blocking forever.
> > (adding Jeremy to comment and cc'ing samba-technical)
>
> Thanks for taking a look at this.
>
> If we don't set a timeout on waiting for several credits at once, we may end up waiting forever. It is only a question of how likely we are to get stuck. We can try to guess whether the server is going to grant us more credits in the future by analyzing the number of requests in flight or other parameters, but we can't say for sure because the spec doesn't say anything about a minimal number of credits. This means that under certain conditions the server may end up giving us very few credits - too few to do compounding.
>
> Those conditions might vary, but off the top of my head I can give the example of a highly loaded file server with thousands of active connections that experiences CPU and/or memory pressure and decides to limit the number of credits per connection. That's why I do think that the ability to set a timeout on waiting is essential. We can start with something between 100ms and 1s (and even make it auto-tunable - say, a couple of round trips to the server) and then decide what is best (or even make it configurable).
>
> > This could happen in two scenarios. The first would be if the number of free credits is persistently pegged at <=1 credit due to a persistent and massive number of threads causing single-credit SMB2 I/O. I.e. we are constantly starved for multi-credit requests.
> >
> > The second would be if we get stuck at <=1 available credits even when the session is idle. This would be a bug, and the best thing is probably to let the reconnect eventually happen and thus force the credits to be in sync again between server and client. So I don't think we need to do anything explicit for that case here.
> >
> > The first case is a genuine case, but I think we can handle that in a different way instead of switching back to a "grab one credit at a time" scheme like we have now.
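(Side note for the samba-technical readers on cc: the atomic, timed wait for several credits that we keep referring to above is, in spirit, something like the userspace model below. This is only a sketch of the idea - the names, types and signature are made up for illustration and are not the actual fs/cifs code, which has to use the kernel's own locking and wait primitives rather than pthreads.

/* Illustrative userspace model only - not the fs/cifs implementation. */
#include <errno.h>
#include <pthread.h>
#include <time.h>

struct credit_pool {
        pthread_mutex_t lock;
        pthread_cond_t  avail;      /* signalled when the server grants credits */
        int             credits;    /* credits currently available */
};

/*
 * Atomically take @num credits, waiting at most @timeout_ms for them to
 * become available.  Returns 0 on success or -EAGAIN on timeout, in which
 * case no credits have been consumed.
 */
static int wait_for_compound_credits(struct credit_pool *pool, int num,
                                     int timeout_ms)
{
        struct timespec deadline;
        int rc = 0;

        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec  += timeout_ms / 1000;
        deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
                deadline.tv_sec += 1;
                deadline.tv_nsec -= 1000000000L;
        }

        pthread_mutex_lock(&pool->lock);
        while (pool->credits < num) {
                /* Sleep without holding any partial allocation. */
                if (pthread_cond_timedwait(&pool->avail, &pool->lock,
                                           &deadline) == ETIMEDOUT &&
                    pool->credits < num) {
                        rc = -EAGAIN;
                        goto out;
                }
        }
        pool->credits -= num;       /* all-or-nothing */
out:
        pthread_mutex_unlock(&pool->lock);
        return rc;
}

The two properties that matter are that the allocation is all-or-nothing - a thread never sleeps while holding a partial allocation - and that the caller gets a distinct return code on timeout, so it can decide whether to keep waiting, fall back, or just log the event.)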
> > In that scenario we would have a different problem: there could be so many concurrent calls for compounded operations requiring >=3 credits each that we eventually consume MAX_CREDITS number of credits across these threads. Each thread would have already allocated one or more credits, but fewer than it needs for the full compound, and the threads would then all be deadlocked, each waiting for one or more credits that will never become available. Making the allocation of credits for a compound atomic - all of them or none - solves this type of deadlock, but is prone to the credit starvation issue.
>
> The approach of grabbing one credit at a time was a temporary workaround that didn't require a lot of code changes and was easier to push to stable kernels. There is no doubt that it left a possibility of getting stuck, and we need a proper solution for handling compound requests.
>
> > The way I propose to solve the credit starvation problem for compounds is something like this (I haven't coded it yet so I don't know exactly what it will look like):
> >
> > * After the session has been fully established, we should be able to assume we WILL have a whole bunch of credits available. Modulo a hypothetical situation where the server will just never grant us more than 1 or 2 credits in total. I don't think this is a reasonable situation we need to worry too much about, and maybe the right thing to do here is just to block all compound operations until we get a session reconnect, where hopefully the server will be more reasonable.
> >
> > * We can define a new MAX_CREDITS_FOR_COMPOUND which would be the largest number of credits that we will ever create a single compound for. I think that would be 4 at this stage.
>
> Note that we reserve 1 credit for the echo request and 1 credit for the oplock/lease break ack, so those operations won't be stuck waiting for credits. If the maximum number of requests in a compounded chain is 4, then 5 credits granted by the server will make our client useless for certain operations.
>
> > * In the loop in wait_for_free_credits(), if this is a request for a single credit AND it is a normal operation, i.e. &CIFS_OP_MASK == 0, then we will NOT grant the single credit but continue looping.
> >
> > I.e. we basically reserve the last MAX_CREDITS_FOR_COMPOUND credits before we completely run out, so that they are only available for compounds. That way we do guarantee that compound requests will keep making progress and will not be completely starved by non-compound requests.
>
> Trying to reserve MAX_CREDITS_FOR_COMPOUNDING increases the latency of single-credit requests. In some workloads compounding won't even be used (say, a pure I/O workload), so instead of using all the bandwidth available to the client (and the credit bandwidth might be pretty narrow for highly loaded servers) we will end up using only a part of it.
>
> But it is not only about the latency of a single-credit request, it is also about the latency of the compounded request itself. Instead of waiting for all the credits for a long period of time, we can start making progress by doing sequential operations. If the round trip time is small, the penalty of waiting indefinitely instead of going sequential will be rather substantial.
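(Another side note, to make the reservation proposed above concrete: in the same userspace model as the earlier note, it amounts to one extra condition applied to ordinary single-credit requests in the wait loop. MAX_CREDITS_FOR_COMPOUND and the "privileged" flag are again placeholders, not the real fs/cifs identifiers.

/* Illustrative model only - reuses struct credit_pool from the earlier sketch. */
#define MAX_CREDITS_FOR_COMPOUND 4      /* largest compound we would ever build */

/*
 * An ordinary single-credit request may only take a credit while more than
 * MAX_CREDITS_FOR_COMPOUND credits remain in the pool, so the last few
 * credits stay reserved for compound (and otherwise privileged) requests.
 */
static int can_take_single_credit(const struct credit_pool *pool,
                                  int is_privileged_op)
{
        if (is_privileged_op)
                return pool->credits >= 1;
        return pool->credits > MAX_CREDITS_FOR_COMPOUND;
}

The check only makes any difference once the pool has dropped to the last few credits; above that threshold single-credit requests are granted exactly as before.)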
> That's why waiting only for some reasonable period of time seems like a good option to try: if we have already waited for a reasonable time, let's stop waiting, return a special error code and let the upper layer process the request sequentially.
>
> So, I think that compounding is a great optimization and it is important to try using it as much as possible across the client code. At the same time, the SMB3 client should be capable of handling any workload or scenario that doesn't contradict the spec.
>
> Thoughts?

Thanks. I have added a, for now, fixed timeout with a rather large value, and we log a message when we give up waiting for a multi-credit request. These messages should be good for identifying if/when this happens.

As for switching from multi-credit compound sequences to non-compounded serialized code, I don't think we should do that, because we would have to duplicate all these codepaths - and only to solve a pathological case where performance would already be so poor that the share is basically inoperable, IMHO.

This would only be an issue IFF we have a server that for very long periods of time pegs us at less than a handful of credits, even on an idle session. I think it is fair to wait very long, even indefinitely, in this case. What we might do is eventually perform a full reconnect and hope for a miracle, or log something to point out that we need to re-balance our servers.

Again, I am not sure we should switch to single-threaded, un-compounded mode in that situation. I would consider it a pathological case where performance and behavior have degraded so much that the SMB share is almost inoperable.

I would consider this similar to the behavior where, due to packet loss, TCP pegs us permanently at a congestion window of 3 kB. The solution there is not better packet-loss recovery in TCP or tuning of retransmission timeouts, but to fix the underlying network issue and prevent this pathological case from happening in the first place.

Those are my thoughts. This would be a case where we can log what is wrong, but the fix should be to prevent the server from doing this in the first place.

regards
ronnie sahlberg

>
> --
> Best regards,
> Pavel Shilovsky
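P.S. In model form only - this is not the actual patch, just the shape of the behaviour described above, built on the timed-wait sketch from the first note in this mail. The timeout value is a placeholder ("rather large", exact number still to be decided), and the real code of course uses the kernel's logging rather than stderr.

#include <stdio.h>

/* Placeholder value only - "rather large", to be tuned. */
#define COMPOUND_CREDIT_TIMEOUT_MS (60 * 1000)

/* Reuses wait_for_compound_credits() and struct credit_pool from the
 * model in the first note. */
static int get_credits_for_compound(struct credit_pool *pool, int num)
{
        int rc;

        rc = wait_for_compound_credits(pool, num, COMPOUND_CREDIT_TIMEOUT_MS);
        if (rc == -EAGAIN)
                fprintf(stderr,
                        "gave up waiting for %d credits after %d ms\n",
                        num, COMPOUND_CREDIT_TIMEOUT_MS);
        return rc;
}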