On Thu, 2017-02-09 at 08:47 +0100, Mkrtchyan, Tigran wrote:
>
> ----- Original Message -----
> > From: "Trond Myklebust" <trondmy@xxxxxxxxxxxxxxx>
> > To: "tigran mkrtchyan" <tigran.mkrtchyan@xxxxxxx>
> > Cc: "Anna Schumaker" <Anna.Schumaker@xxxxxxxxxx>, linux-nfs@xxxxxxxrnel.org
> > Sent: Wednesday, February 8, 2017 10:57:44 PM
> > Subject: Re: [PATCH 0/4] Match TCP connection timeouts to the lease period
>
> > On Wed, 2017-02-08 at 22:18 +0100, Mkrtchyan, Tigran wrote:
> > >
> > >
> > > On Feb 8, 2017 17:48, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
> > > With the current default TCP connection timeout being set at around
> > > 3 minutes, and most server vendors setting the lease period at
> > > values significantly lower than that, we can end up losing the lease
> > > while waiting for the TCP layer to discover that we need to break the
> > > connection.
> > > This patch series sets up an interface to allow the NFSv4 client to
> > > adjust these timeout values down once it has obtained a value for
> > > the lease period from the server.
> > >
> > >
> > > Cool finally! I was waiting for that quite a bit.
> > >
> > > https://www.ietf.org/mail-archive/web/nfsv4/current/msg13758.html
> > >
> >
> > No. This mechanism does not excuse the server from having to process
> > requests, and I am certainly not signing us up to any "gentleman's
> > agreement".
> >
> > If the server has received an RPC call, then it MUST process it,
> > whether or not there is a TCP connection, otherwise the lease may still
> > be lost due to the server failing to live up to the requirements in
> > RFC5661 Section 8.3 (See https://tools.ietf.org/html/rfc5661#section-8.3).
> > Those requirements state that "If the client ID's lease has not
> > expired when the server receives a SEQUENCE operation, then the server
> > MUST renew the lease."
> >
> > The point of these patches is to ensure that we detect routing changes
> > and things like that in a timely fashion (again, as required by
> > RFC5661, Section 8.3) so that we can re-establish the connection and
> > renew the lease.
> >
>
> Sure, I fully agree with you. My point is that the client's TCP timeout
> is not a 'random number' any more and has a relation to the lease
> time.
> However, the same can be applied to RPC timeouts as well. Our server
> always sends ERR_DELAY if a request is not processed within 3sec to
> avoid retries. This '3sec' is a random number, as we have no idea when
> the client will lose its patience.
>

Fair enough. Note that the client is required to wait until the RPC call
is finished no matter what; it cannot reuse that slot until the server
is done.

As for the TCP connection timeout, it is still a little unclear to me
that a timeout of 1 lease period is optimal. You can still end up losing
the lease, since we don't start sending SEQUENCE operations until well
into the second half of the period following the last lease renewal. My
hope is that the KEEPALIVE will save us there, since it monitors the
entire lease period.

That said, I'm open to arguments that we might want to reduce the
timeout further to, say, 3/4 lease period in order to allow for some
time to re-establish the TCP connection. The reason why I have not done
this is that it would mean breaking the connection after 30s against
some servers; I'm not sure that even data centre networks will guarantee
stability of service at that level.

Thoughts and comments?

> Tigran.
>
> > > Tigran.
> > >
> > > Trond Myklebust (4):
> > >   SUNRPC: Remove unused function rpc_get_timeout()
> > >   SUNRPC: Refactor TCP socket timeout code into a helper function
> > >   SUNRPC: Allow changing of the TCP timeout parameters on the fly
> > >   NFSv4: Set the connection timeout to match the lease period
> > >
> > >  fs/nfs/nfs4renewd.c             |  2 +-
> > >  include/linux/sunrpc/clnt.h     |  6 +--
> > >  include/linux/sunrpc/xprt.h     |  4 ++
> > >  include/linux/sunrpc/xprtsock.h |  3 ++
> > >  net/sunrpc/clnt.c               | 51 +++++++++++++-----------
> > >  net/sunrpc/xprtsock.c           | 88 ++++++++++++++++++++++++++++++++---------
> > >  6 files changed, 107 insertions(+), 47 deletions(-)
> > >
> >
> > --
> > Trond Myklebust
> > Principal System Architect
> > 4300 El Camino Real | Suite 100
> > Los Altos, CA 94022
> > W: 650-422-3800
> > C: 801-921-4583
> > www.primarydata.com
>

--
Trond Myklebust
Principal System Architect
4300 El Camino Real | Suite 100
Los Altos, CA 94022
W: 650-422-3800
C: 801-921-4583
www.primarydata.com
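
To make the timeout arithmetic in the reply above concrete, here is a
minimal sketch in Python. It is only an illustration and not code from
the patch series: the function names connection_timeout() and
reconnect_slack() are hypothetical, and the 40s lease period is an
assumed value, chosen because 3/4 of it gives the 30s figure mentioned
in the reply.

```python
from fractions import Fraction


def connection_timeout(lease_period_s, fraction=Fraction(1, 1)):
    """Return a TCP connection timeout (seconds) as a fraction of the lease.

    Hypothetical helper for illustration only; the patch series itself
    lives in the kernel's SUNRPC/NFSv4 code and is not reproduced here.
    """
    if lease_period_s <= 0:
        raise ValueError("lease period must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    return float(lease_period_s * fraction)


def reconnect_slack(lease_period_s, fraction=Fraction(1, 1)):
    """Return the seconds left in the lease after the timeout fires.

    This is the window available for re-establishing the connection
    and renewing the lease before it expires.
    """
    return float(lease_period_s) - connection_timeout(lease_period_s, fraction)


if __name__ == "__main__":
    lease = 40  # assumed lease period in seconds, for illustration
    for frac in (Fraction(1, 1), Fraction(3, 4)):
        print(
            f"fraction={frac}: timeout={connection_timeout(lease, frac)}s, "
            f"slack={reconnect_slack(lease, frac)}s"
        )
```

With a timeout equal to the full lease period, no slack is left for
reconnecting; at 3/4 of the assumed 40s lease, the connection would be
broken after 30s, leaving 10s to reconnect and renew.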