Dear devs,

We have an NFS lockup issue. We run a Ganeti cluster consisting of 7 Debian Linux nodes and 1 FreeNAS box that hosts the VM images, which are exported via NFSv3. The problem is that, at random, one of our nodes ends up in a livelock.
That means the NFS share is still alive: we can list directories and files, and we can even read files (very slowly, see below). We can even write to files, but the file close operation never returns; it blocks.
The read is slow in the sense that, while copying a file from the share to /tmp, the data arrives at the node very fast, but it accumulates in /tmp only slowly.
I've also opened a Debian bug report on it, but I don't think the problem is Debian-specific (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801924).
The only way out is to reboot the machine, which interrupts all the VMs running on it.
I've captured every task's stack trace; hopefully it helps someone track down the issue.
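One possible way to collect the kernel stacks of just the blocked tasks next time is sketched below in Python. This is an illustration only, not necessarily how the attached traces were produced: it assumes root access and a kernel that exposes `/proc/<pid>/stack`, and the function names `blocked_tasks` and `kernel_stack` are hypothetical. It only scans top-level PIDs, not every thread under `/proc/<pid>/task`. Writing `w` to `/proc/sysrq-trigger` is an alternative that dumps blocked (D-state) tasks to the kernel log.

```python
import os
from typing import List, Tuple


def blocked_tasks(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Return sorted (pid, comm) pairs for tasks in uninterruptible sleep ('D')."""
    result = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                data = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # comm is wrapped in parentheses and may itself contain spaces or ')',
        # so split on the LAST ')' to locate the state field reliably.
        lparen = data.index("(")
        rparen = data.rindex(")")
        comm = data[lparen + 1:rparen]
        state = data[rparen + 2:rparen + 3]
        if state == "D":
            result.append((int(entry), comm))
    return sorted(result)


def kernel_stack(pid: int, proc_root: str = "/proc") -> str:
    """Read /proc/<pid>/stack (normally requires root); report failures inline."""
    try:
        with open(os.path.join(proc_root, str(pid), "stack")) as f:
            return f.read()
    except OSError as exc:
        return f"<unavailable: {exc}>"


if __name__ == "__main__":
    # Hypothetical usage: print the kernel stack of every blocked task.
    for pid, comm in blocked_tasks():
        print(f"=== {pid} ({comm}) ===")
        print(kernel_stack(pid))
```

Tasks stuck in NFS writes or close would be expected to show up in `D` state, so filtering on that keeps the dump much smaller than a full trace of every task.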
Meanwhile the other 6 nodes can access the NFS share without problems, so I don't think this is a networking or server issue. Restarting the NFS server on the server side has no effect either; the node does not recover. The NFS TCP connection gets established again and listing files works again, but writes still do not.
Some information about the nodes:

# uname -a
Linux host 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u4 (2015-09-19) x86_64 GNU/Linux

They have 1.5G of RAM allocated to dom0, which should be enough. I know this is not much information; please advise me what to look for next time. Unfortunately I don't know how to reproduce it.
Thanks in advance,
Kojedzinszky Richard
Euronet Magyarorszag Informatika Zrt.
Attachment: all.trace.txt.gz (application/gzip)