Hi,

On Tue, Mar 14, 2017 at 11:15:25PM +0100, Sebastian Schmidt wrote:
> I was debugging some mysterious high CPU usage and tracked it down to
> monitoring daemon regularly calling stat*() on an NFS automount
> directory. The problem is triggered when mount.nfs passes mount() an
> addr= that points to an unreachable address (i.e. connecting fails
> immediately).

I looked further into the busy-reconnect issue and I want to share what
I believe happens.

My initial report called mount.nfs with "2001:4860:4860:0:0:0:0:8888:/"
which is, as Jeff pointed out, incorrect, but it caused mount(2) to be
called with addr=0.0.7.209. In reality, I'm losing my default route and
an actually valid addr= is getting passed to mount(), but both cases hit
the same code.

In xs_tcp_setup_socket(), xs_tcp_finish_connecting() returns an error.
For my made-up test case (0.0.7.209) it's EINVAL, in real life
ENETUNREACH. The third trigger is passing a valid IPv6 addr= and setting
net.ipv6.conf.all.disable_ipv6 to 1, thereby causing an EADDRNOTAVAIL.

Interestingly, the EADDRNOTAVAIL branch has this comment:

    /* We're probably in TIME_WAIT. Get rid of existing socket,
     * and retry */
    xs_tcp_force_close(xprt);
    break;

whereas the EINVAL and ENETUNREACH case carries this one:

    /* retry with existing socket, after a delay */
    xs_tcp_force_close(xprt);
    goto out;

So both calls to xs_tcp_force_close() claim to retry, but one reuses the
socket and the other doesn't? The only code skipped by the "goto out"
for the second case is "status = -EAGAIN", and this apparently does not
cause any delayed retries either.

That second case got changed in
4efdd92c921135175a85452cd41273d9e2788db3, where the call to
xs_tcp_force_close() was added initially. That call, however, causes an
autoclose call via xprt_force_disconnect(), eventually invalidating
transport->sock. That transport->sock, however, is being checked in
xs_connect() for !NULL and, in that case only, a delayed reconnect is
scheduled.

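The suspected interaction can be modelled with a minimal userspace
sketch. This is not the kernel code: every model_* name is hypothetical,
the struct keeps only a stand-in for transport->sock, and it assumes
that each connect attempt creates a socket and that the error path
resets it to NULL before the next attempt. It only illustrates why a
delay that depends on an existing socket may never be applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the transport; not the kernel's struct. */
struct model_xprt {
    void *sock;          /* models transport->sock */
    unsigned delayed;    /* reconnects scheduled with a delay */
    unsigned immediate;  /* reconnects scheduled without a delay */
};

static int model_dummy_socket;

/* Models the check described for xs_connect(): only an existing
 * socket leads to a delayed reconnect. */
static void model_connect(struct model_xprt *x)
{
    if (x->sock != NULL)
        x->delayed++;
    else
        x->immediate++;

    /* Assumption: the connect attempt sets up a socket. */
    x->sock = &model_dummy_socket;
}

/* Models the effect attributed to xs_tcp_force_close() plus autoclose:
 * the socket pointer is gone by the time of the next attempt. */
static void model_force_close(struct model_xprt *x)
{
    x->sock = NULL;
}

/* Runs a number of connection attempts that all fail. The flag selects
 * whether the error path drops the socket. */
static void model_run(struct model_xprt *x, int attempts,
                      bool force_close_on_error)
{
    assert(attempts >= 0);

    for (int i = 0; i < attempts; i++) {
        model_connect(x);
        if (force_close_on_error)
            model_force_close(x);
    }
}
```

In this model, calling model_run() with force_close_on_error set never
increments the delayed counter, which corresponds to the busy loop.
Without the forced close, every attempt after the first is delayed.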
If disable_ipv6=1 had already caused connect() to return EADDRNOTAVAIL,
rather than ENETUNREACH as with 3.19-ish, that same busy-reconnect loop
would have also been triggered in that case, even before 4efdd92c.

So apparently the (only?) code that's responsible for delaying a
reconnect is in xs_connect(), and because xs_tcp_force_close() is called
on every connection error, transport->sock gets NULLed by autoclose and
that delay code is never run.

Here I'm stuck at figuring out what the code is intended to do and would
appreciate any help.

Thanks,
Sebastian