NFS timeout w/ large writes

david at davidcoulson.net (David Coulson) · Thu, 17 Jan 2013 09:06:43 -0500

We have a Gluster 3.2.5 environment using NFS mounts, which in general
is stable. However, we've identified an issue where the NFS server stops
responding when we do a large (>200 MB) write to one of the mounts.
Unfortunately there is next to nothing in the nfs.log file, other than
a complaint that a brick didn't respond within the timeout interval. The
NFS server gets to the point where the only way to recover is to reboot
the box (our gluster nodes mount the volumes using gluster NFS over
loopback).
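
For anyone trying to reproduce this, here is a rough sketch of a timed
large write. It is not something from our environment: the function name
timed_write, the 1 MiB chunk size and the 5 second stall threshold are
all assumptions made for illustration, and the target path is whatever
file you choose on the NFS mount.

```python
import os
import time


def timed_write(path, total_bytes, chunk_bytes=1024 * 1024,
                stall_seconds=5.0, progress_every=16 * 1024 * 1024):
    """Write total_bytes of zeros to path in chunks and time each chunk.

    Returns a list of (offset, seconds) for every chunk that took longer
    than stall_seconds. All defaults are illustrative assumptions.
    """
    chunk = b"\0" * chunk_bytes
    slow = []
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            # Print progress first, so that a hard hang still leaves the
            # last offset reached on the terminal.
            if progress_every and written % progress_every == 0:
                print("offset %d bytes" % written, flush=True)
            n = min(chunk_bytes, total_bytes - written)
            start = time.monotonic()
            f.write(chunk[:n])
            f.flush()
            # fsync so the chunk is sent to the server, not left in the
            # local page cache.
            os.fsync(f.fileno())
            elapsed = time.monotonic() - start
            if elapsed > stall_seconds:
                slow.append((written, elapsed))
            written += n
    return slow
```

Calling timed_write("/mnt/svn/stall-test.bin", 256 * 1024 * 1024) would
write 256 MiB, where /mnt/svn is a hypothetical mount point. If the mount
hangs outright the call never returns, so the last printed offset is the
only indication of how far the write got.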

This is the config of the volume that failed this morning. I'm not sure
whether it is a tuning issue or a bug. If nothing else, is there a way
to get more detailed debugging output from the gluster NFS daemon?

[root@dresproddns02 glusterfs]# gluster volume info svn
Type: Replicate
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhesproddns01:/gluster/svn
Brick2: rhesproddns02:/gluster/svn
Brick3: dresproddns01:/gluster/svn
Brick4: dresproddns02:/gluster/svn
Options Reconfigured:
nfs.rpc-auth-allow: 127.0.0.1
performance.client-io-threads: 1
performance.flush-behind: on
network.ping-timeout: 5
performance.stat-prefetch: on
nfs.disable: off
nfs.register-with-portmap: on
auth.allow: 10.250.53.*,10.252.248.*,169.254.*,127.0.0.1
performance.cache-size: 256Mb
performance.write-behind-window-size: 128Mb
performance.io-cache: on
performance.io-thread-count: 64
performance.quick-read: on