Frequent "Stale NFS file handle" error

fcannini at gmail.com (Fabricio Cannini) · Tue, 11 Jan 2011 12:01:41 -0200

Hi all.

I've been getting this error very frequently, at least once a week.
Whenever it happens, restarting all the gluster daemons makes things work
again.

This is the hardware I'm using:

22 nodes
2x Intel Xeon 5420 2.5GHz, 16GB DDR2 ECC, 1 SATA2 HD of 750GB,
of which ~600GB is a partition ( /glstfs ) dedicated to gluster. Each node
has 1 Mellanox MT25204 [InfiniHost III Lx] InfiniBand DDR HCA used by
gluster through the 'verbs' interface. The switch is a Voltaire ISR 9024S/D.
Each node is also a client of the gluster volume, which is accessed through the
'/scratch' mount-point.
The machine itself is a scientific cluster, with all nodes and the head node
running Debian Squeeze amd64, with the stock gluster 3.0.5 packages.

These are the server and client configs:

Client config
http://pastebin.com/6d4BjQwd

Server config
http://pastebin.com/4ZmX9ir1

And here are some of the messages in the head node log:
http://pastebin.com/gkf3CmK9

If anybody can make sense of why this is happening, I'd be really
thankful.