On Wed, 24 Sep 2014 16:39:55 +0100 Benjamin ESTRABAUD <be@xxxxxxxxxx> wrote:

> Hi!
>
> I've got a scenario where I'm connected to a NFS share on a client, have
> a file descriptor open as read only (could also be write) on a file from
> that share, and I'm suddenly changing the IP address of that client.
>
> Obviously, the NFS share will hang, so if I now try to read the file
> descriptor I've got open (here in Python), the "read" call will also hang.
>
> However, the driver seems to attempt to do something (maybe
> save/determine whether the existing connection can be saved) and then,
> after about 20 minutes the driver transparently reconnects to the NFS
> share (which is what I wanted anyways) and the "read" call instantiated
> earlier simply finishes (I don't even have to re-open the file again or
> even call "read" again).
>
> The dmesg prints I get are as follow:
>
> [ 4424.500380] nfs: server 10.0.2.17 not responding, still trying <--
> changed IP address and started reading the file.
> [ 4451.560467] nfs: server 10.0.2.17 OK <--- The NFS share was
> reconnected, the "read" call completes successfully.

The difference between these timestamps is 27 seconds, which is a lot less
than the "20 minutes" that you quote. That seems odd.

If you adjust /proc/sys/net/ipv4/tcp_retries2 you can reduce the current
timeout. See Documentation/networking/ip-sysctl.txt for details on the
setting.

  https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

It claims the default gives an effective timeout of 924 seconds, or about
15 minutes.

I just tried and the timeout was 1047 seconds. This is probably the next
retry after 924 seconds.

If I reduce tcp_retries2 to '3' (well below the recommended minimum) I get a
timeout of 5 seconds. You can possibly find a suitable number that isn't
too small...

Alternatively you could use NFSv4. It will close the connection on a
timeout. In the default config I measure a 78 second timeout, which is
probably more acceptable.
This number would respond to the timeo mount option. If I set that to 100,
I get a 28 second timeout.

The same effect could be provided for NFSv3 by setting:

   __set_bit(NFS_CS_DISCRTRY, &clp->cl_flags);

somewhere appropriate. I wonder why that isn't being done for v3 already...
Probably some subtle protocol difference.

NeilBrown

> I would like to know if there was any way to tune this behaviour,
> telling the NFS driver to reconnect if a share is unavailable after say
> 10 seconds.
>
> I tried the following options without any success:
>
> retry=0; hard/soft; timeo=3; retrans=1; bg/fg
>
> I am running on a custom distro (homemade embedded distro, not based on
> anything in particular) running stock kernel 3.10.18 compiled for i686.
>
> Would anyone know what I could do to force NFS into reconnecting a
> seemingly "dead" session sooner?
>
> Thanks in advance for your help.
>
> Regards,
>
> Ben - MPSTOR.