On Thu, 2019-12-12 at 13:13 -0500, Olga Kornievskaia wrote:
> On Thu, Dec 12, 2019 at 11:47 AM Trond Myklebust
> <trondmy@xxxxxxxxxxxxxxx> wrote:
> > Hi Olga,
> >
> > On Wed, 2019-12-11 at 15:36 -0500, Olga Kornievskaia wrote:
> > > Hi Trond,
> > >
> > > I'd like to raise this once again. Is it true that setting a
> > > timeout limit (TCP_USER_TIMEOUT) is not user configurable (I'm
> > > pretty sure it is not)? My question is why it shouldn't be tied
> > > to the "timeo" mount option. Right now, only the session/lease
> > > manager thread sets it, via rpc_set_connect_timeout(), to be
> > > lease period related.
> > >
> > > Is it that we don't want to allow users to control TCP settings
> > > via the mount options? Folks are nevertheless expecting to be
> > > able to set a low "timeo" value and have the (dead) connection
> > > be considered dead earlier than after the rather long timeout
> > > period that applies now.
> >
> > In my mind, the two are correlated, but are not equivalent.
> >
> > The 'timeo' value is basically a timeout for how long it takes for
> > the whole process of "send RPC call", "have it processed by the
> > server" and "receive reply".
> > IOW: 'timeo' is about how long it takes for an RPC call to execute
> > end-to-end.
>
> Ok, but what happens is that no actions (connection-wise) are taken
> when this timeout goes off, and that's a problem for detecting bad
> connections.

I'm not sure I understand what you mean. The point of TCP_USER_TIMEOUT
is that the TCP layer is told when to time out and break the
connection. Furthermore, the other side (i.e. the server) is told about
the existence of this timeout, and hence knows what to expect.

IOW: there are no actions at the RPC layer because this is a TCP layer
thing.
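
As a rough userspace illustration of the option under discussion, the
sketch below sets TCP_USER_TIMEOUT on a Linux TCP socket and reads it
back. The kernel RPC client sets this on its own socket internally, so
this is not how NFS does it; the helper name set_tcp_user_timeout and
the 180000 ms value are assumptions made up for the example.

```python
import socket


def set_tcp_user_timeout(sock, timeout_ms):
    """Set TCP_USER_TIMEOUT (milliseconds) and return the value read back.

    Hypothetical helper for illustration only. socket.TCP_USER_TIMEOUT is
    only defined on Linux builds of Python.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, timeout_ms)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT)


# The option can be set before the socket is connected.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # 180000 ms is an arbitrary example value, not a recommended setting.
    applied = set_tcp_user_timeout(sock, 180_000)
finally:
    sock.close()

print(applied)
```

The value is in milliseconds and, per 'man 7 tcp', bounds how long
transmitted data may remain unacknowledged before TCP forcibly closes
the connection.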
>
> > The TCP_USER_TIMEOUT is essentially a timeout for how long it takes
> > the server to ACK receipt of the RPC call once we've placed it in
> > the TCP socket.
> > IOW: it is a timeout for the networking part of an RPC call
> > transmission.
>
> But why is the TCP timeout (1) not user configurable and/or (2) not
> tied to "timeo"?
>
> > So, as I said, the two are correlated: if the server is down, then
> > your timeout is dominated by the fact that the network transmission
> > never completes. However, if the server is up and congested, then
> > the "processing by the server" is likely to dominate.
> >
> > The other thing to note is that if the TCP connection is
> > unresponsive, we may want to fail that much faster in order to give
> > ourselves a chance to close the connection, open a new one and
> > retransmit the requests from the old connection before the 'timeo'
> > is triggered (since in the case of a soft timeout, that could be a
> > fatal error).
>
> "we may want to fail" doesn't happen, and that's exactly what I would
> like to happen. Also, the TCP timeout is set to a lease time (let's
> take a Linux server, which sets a 90s timeout), and that's larger
> than the default "timeo", which is 60s. That goes against your
> intention to recover in time.
>
> > Does that make sense?
>
> It's the last case I'm interested in. The issue I'm having is that
> after a "timeout" (which should be a lease period), the client
> doesn't send a SYN trying to establish a new connection.

TCP_USER_TIMEOUT should not affect the handshake part of the TCP
connection (see 'man 7 tcp'). It can't solve a problem with the SYN
states.

> -
> Here's a current problem. In the cloud environment, a server node
> goes down. It's spun up again in a different VM (but with the same
> IP) and the server is ready to receive requests and continue with
> the IO. The problem is the client doesn't try to send a new SYN
> until the old connection times out.
> This timeout is 3mins for v3 and can't be shortened because
> TCP_USER_TIMEOUT isn't user configurable or tied into the timeo. But
> the user expects that connections time out after 60s (the default
> timeo), or whatever timeo value is specified during mount. The
> current Linux client doesn't do that.
>
> Even in v4, in my testing, the client doesn't send the new SYN after
> the lease period (but I believe that's a bug). The only time it does
> so is if I change rpc_set_connect_timeout() to something low so that
> default of 18000 is set.
>
> (1) I could be wrong, but I think there is a bug that doesn't
> re-establish the connection (unless some low value is set).
> (2) I think there should be an ability (at least for v3) to set the
> timeout lower than 3mins. Perhaps we can add a new mount option:
> either a totally separate TCP timeout value, or something like
> "sync_nfstcp_timeouts" that uses timeo to govern both the NFS and
> TCP timeouts.

This needs to be resolved using something different. I'm not sure what
to use for timing the handshake out more quickly.

>
> > > Thanks.
> > >
> > > On Wed, Oct 3, 2018 at 3:06 PM Olga Kornievskaia <aglo@xxxxxxxxx>
> > > wrote:
> > > > On Wed, Oct 3, 2018 at 2:45 PM Trond Myklebust <
> > > > trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > > On Wed, 2018-10-03 at 14:31 -0400, Olga Kornievskaia wrote:
> > > > > > Hi folks,
> > > > > >
> > > > > > Is it true that the NFS mount option "timeo" has nothing to
> > > > > > do with the socket's setting of the user-specified timeout,
> > > > > > TCP_USER_TIMEOUT? Instead, when creating a TCP socket, NFS
> > > > > > uses either a default/hard-coded value of 60s for v3, or
> > > > > > for v4.x it's lease based. Is there no value in having an
> > > > > > adjustable TCP timeout value?
> > > > > >
> > > > >
> > > > > It is adjusted. Please see the calculation in
> > > > > xs_tcp_set_socket_timeouts().
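
For readers following along, the arithmetic in that function can be
sketched roughly as below. This is an approximation based on a reading
of net/sunrpc/xprtsock.c around the time of this thread and should be
treated as an assumption, not a copy of the kernel code. The Python
function name and the example inputs (a 60 s initial timeout with 2
retries) are illustrative only.

```python
import math


def sketch_socket_timeouts(to_initval_ms, to_retries):
    """Approximate the socket settings derived from an RPC timeout.

    Assumed model (hypothetical names, not the kernel's API):
      keepalive idle/interval = initial timeout, rounded up to whole seconds
      keepalive probe count   = retries + 1
      TCP user timeout        = initial timeout * (retries + 1), in ms
    """
    attempts = to_retries + 1
    return {
        "keepidle_s": math.ceil(to_initval_ms / 1000),
        "keepcnt": attempts,
        "user_timeout_ms": to_initval_ms * attempts,
    }


# Example inputs assumed for illustration: 60 s initial timeout, 2 retries.
v3_like = sketch_socket_timeouts(to_initval_ms=60_000, to_retries=2)
print(v3_like)
# {'keepidle_s': 60, 'keepcnt': 3, 'user_timeout_ms': 180000}
```

With those inputs the user timeout comes out at 180000 ms, which is
consistent with the 3 minute figure quoted for v3 earlier in the
thread.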
> > > >
> > > > but it's not user configurable, is it? I don't see a way to
> > > > modify v3's default 60s TCP timeout. Also, in v4 the timeouts
> > > > are set from xs_tcp_set_connect_timeout() for the lease period,
> > > > but again they are not user configurable, as far as I can tell.
> > > >
> > > > > --
> > > > > Trond Myklebust
> > > > > Linux NFS client maintainer, Hammerspace
> > > > > trond.myklebust@xxxxxxxxxxxxxxx
> > > > >
> > >
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@xxxxxxxxxxxxxxx
> >
>
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx