Re: nfs lockup

No, the lockup has nothing to do with DRBD. In the Ganeti cluster some VMs use DRBD-mirrored disks, but others use images on a shared folder on NFS, and it is the NFS share that sometimes locks up. The DRBD devices work well, and all other network connectivity works well.

Please advise me on what to check next time. Unfortunately, I cannot reproduce the problem.
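One thing worth capturing next time is which tasks are stuck in uninterruptible sleep (state `D`), which is usually where a `close()` blocked on NFS writeback shows up. Below is a minimal Python sketch, assuming a Linux `/proc` and root privileges: it reads the state field from `/proc/<pid>/stat` and dumps `/proc/<pid>/stack` for each blocked task. The function names are hypothetical; `/proc/<pid>/stack` is only present on kernels built with stack-trace support, and the sketch looks only at thread-group leaders, not at every thread under `/proc/<pid>/task`.

```python
import os


def task_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or ')', so split after the LAST ')' instead of on spaces.
    """
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def blocked_tasks(proc_root: str = "/proc"):
    """Yield (pid, comm, stack) for tasks in uninterruptible sleep ('D')."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat"), errors="replace") as f:
                line = f.read()
        except OSError:
            continue  # task exited while we were scanning
        if task_state(line) != "D":
            continue
        comm = line[line.index("(") + 1:line.rindex(")")]
        try:
            with open(os.path.join(base, "stack"), errors="replace") as f:
                stack = f.read()
        except OSError:
            # Assumption: missing file or EPERM means no root / no stack support.
            stack = "<stack unavailable: needs root and kernel stack-trace support>"
        yield int(entry), comm, stack


# Only scan when a Linux-style /proc is actually present.
if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, comm, stack in blocked_tasks():
        print(f"== {pid} ({comm})")
        print(stack)
```

Alternatively, `echo w > /proc/sysrq-trigger` asks the kernel to write the stacks of blocked tasks to the kernel log, which is handy when a shell is still responsive but the script cannot be copied onto the node.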

Could the 9000-byte MTU setting affect NFS somehow? Does it matter that we are using Xen, so a hypervisor is involved? (For DRBD it does.)
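On the MTU question, one quick check is whether 9000-byte frames actually cross every hop between a node and the FreeNAS box without fragmentation. The arithmetic for the largest ICMP echo payload that fits in one frame, assuming IPv4 without options, is sketched below; the helper name is hypothetical.

```python
# Largest ICMP echo payload that fits in one frame of a given MTU.
# Assumes IPv4 with no IP options (20-byte header) and an 8-byte ICMP header.
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def max_icmp_payload(mtu: int) -> int:
    """Return the ping payload size (-s) that exactly fills `mtu` bytes."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


for mtu in (1500, 9000):
    print(f"MTU {mtu}: ping payload {max_icmp_payload(mtu)} bytes")
```

With Linux iputils, `ping -M do -s 8972 <nfs-server>` sends that payload with fragmentation prohibited; if it fails while `-s 1472` succeeds, some hop in the path, possibly a dom0 bridge or virtual interface, is not carrying jumbo frames.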

Thanks,


Kojedzinszky Richard
Euronet Magyarorszag Informatika Zrt.

On Wed, 21 Oct 2015, Benjamin Coddington wrote:

Date: Wed, 21 Oct 2015 15:05:24 -0400 (EDT)
From: Benjamin Coddington <bcodding@xxxxxxxxxx>
To: krichy@xxxxxxxxxxxx
Cc: linux-nfs@xxxxxxxxxxxxxxx
Subject: Re: nfs lockup

On Wed, 21 Oct 2015, krichy@xxxxxxxxxxxx wrote:

Dear devs,

We have an NFS lockup issue. We run a Ganeti cluster consisting of 7 Debian
Linux nodes and 1 FreeNAS server hosting the VM images, which are exported
via NFSv3. The problem is that, at random, one of our nodes ends up in a
livelock.

That means the NFS share is alive: we can list directories and files, and can
even read files (very slowly, see below). We can even write to files, but the
close operation never returns; it blocks.

The reads are slow in the sense that, while copying a file from the share to
/tmp, the data arrives at the node very fast, but accumulates in /tmp slowly.

I've also opened a Debian bug report about it, though I don't think it is
Debian-specific (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801924).

The only remedy is to reboot the machine, which interrupts all the VMs running
on it.

I've captured the stack trace of every task; hopefully it helps someone track
down the issue.

Meanwhile the other 6 nodes can access the NFS share correctly, so I don't
think this is a networking or server issue. Restarting the NFS server on the
server side has no effect either; the node does not recover. The NFS TCP
connection is established and listing files works again, but writes still do
not.

Some information about the nodes:
# uname -a
Linux host 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u4 (2015-09-19) x86_64 GNU/Linux

Each node has 1.5 GB of RAM allocated to dom0, which should be enough.

I know this is not much information; please advise me on what to look for
next time. Unfortunately, I don't know how to reproduce it.

Thanks in advance,

Kojedzinszky Richard
Euronet Magyarorszag Informatika Zrt.

I took a look at your Debian bug report. What's up with those drbd procs?
Are you writing to drbd-backed devices, and have you made sure DRBD isn't
involved in any way?

Ben
