Re: GFS + DRBD Problems

On Mon, 3 Mar 2008, gordan@xxxxxxxxxx wrote:

I have a 2-node cluster with Open Shared Root on GFS on DRBD. A single node mounts GFS OK and works, but after a while seems to just block for disk.

[...]

This usually happens after a period of idleness. If the node is used, this doesn't seem to happen, but leaving it alone for half an hour causes it to block for disk I/O.

I've done a bit more digging, and the processes that hang seem to do so, as expected, in disk sleep state.

For example, when trying to log in, sshd hangs. Its status (from /proc) is:

Name:   sshd
State:  D (disk sleep)
SleepAVG:       97%
[...]
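
To check whether sshd is the only process stuck this way, the same State field can be read for every PID. The following is only a rough sketch, assuming a Linux /proc layout like the one above; the helper names (parse_status, is_disk_sleep, find_blocked) are made up for illustration and are not part of any cluster tool.

```python
import glob


def parse_status(text):
    """Turn the text of a /proc/<pid>/status file into a dict of fields."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_disk_sleep(fields):
    """True if the State field reports uninterruptible sleep, e.g. 'D (disk sleep)'."""
    return fields.get("State", "").startswith("D")


def find_blocked():
    """Return a sorted list of (pid, name) for processes currently in state D."""
    blocked = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path, errors="replace") as f:
                fields = parse_status(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if is_disk_sleep(fields):
            pid = int(path.split("/")[2])
            blocked.append((pid, fields.get("Name", "?")))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, name in find_blocked():
        print(pid, name)
```

If several unrelated processes show up in state D at the same moment, that would point at the filesystem or block layer rather than at sshd itself.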

The only open file handles it has are:
# ls -la /proc/9643/fd/
total 0
dr-x------ 2 root root  0 Mar  3 16:41 .
dr-xr-xr-x 5 root root  0 Mar  3 16:41 ..
lrwx------ 1 root root 64 Mar  3 16:42 0 -> /dev/null
lrwx------ 1 root root 64 Mar  3 16:42 1 -> /dev/null
lrwx------ 1 root root 64 Mar  3 16:42 2 -> /dev/null
lrwx------ 1 root root 64 Mar  3 16:42 3 -> socket:[118904]
lrwx------ 1 root root 64 Mar  3 16:42 4 -> /cdsl.local/var/run/utmp
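
The same listing can be collected from a script, together with /proc/<pid>/wchan, which (where the kernel provides it) names the kernel function the process is sleeping in. This is a hedged sketch rather than a diagnosis tool: describe_fds and read_wchan are hypothetical helper names, and the proc_root parameter exists only so the functions can be pointed at a different directory.

```python
import os


def describe_fds(pid, proc_root="/proc"):
    """Map each open file descriptor number of `pid` to its symlink target."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
    return targets


def read_wchan(pid, proc_root="/proc"):
    """Return the contents of /proc/<pid>/wchan, or None if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "wchan")) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    import sys

    pid = int(sys.argv[1])
    for fd, target in sorted(describe_fds(pid).items()):
        print(fd, "->", target)
    print("wchan:", read_wchan(pid))
```

Run as root with the PID of the hung process as the only argument. The wchan value may help tell whether the wait is inside the filesystem, the lock manager, or the block device.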

I am guessing that it's utmp that is blocking things, but I'm not sure. I can read and write the /var/run/utmp file just fine (/var/run is symlinked to /cdsl.local/var/run).

The socket is a TCP socket, so I cannot see that being a disk block issue.

As for /dev/null, I didn't think that could be flock-ed...

Looking at cman_tool status and /proc/drbd, both seem to be in order and report that everything is working.

Any ideas as to what could be causing these bogus disk-sleep lock-ups?

Gordan

