Further investigation suggests that locking may have something to do
with this.
On the system that currently runs the services, I find these lock files
in four
-rwx------ 1 root root 0 Oct 8 03:30 lock.0
-rwx------ 1 root root 0 Oct 8 03:30 lock.1
-rwx------ 1 root root 0 Oct 8 03:30 lock.116
-rwx------ 1 root root 0 Oct 8 03:30 lock.2
-rw-r--r-- 1 root root 0 Oct 8 03:31 service.0
-rw-r--r-- 1 root root 0 Oct 10 16:08 service.1
-rw-r--r-- 1 root root 0 Oct 8 03:30 service.2
On the now idle cluster member, I have these lock files:
-rwx------ 1 root root 0 Oct 8 03:30 lock.0
-rwx------ 1 root root 0 Oct 8 03:30 lock.1
-rwx------ 1 root root 0 Oct 8 03:30 lock.116
-rwx------ 1 root root 0 Oct 8 03:30 lock.2
The four lock.n files strike me as odd since I only have three services.
Also, should the lock files even be there on the idle cluster member?
Could anyone running a similar cluster please post the contents of
/var/lock/clumanager/ on the different members, along with the number
of services currently running on each member?
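To make such listings easy to compare between members, here is a minimal Python sketch that prints the directory contents in a uniform way and counts the entries by prefix. The function names `list_lock_dir` and `count_by_prefix` are made up for illustration, the output only loosely imitates `ls -l`, and the default path is simply the one named above.

```python
import os
import stat
import time
from collections import Counter


def list_lock_dir(path="/var/lock/clumanager"):
    """Return one line per directory entry: mode, size, mtime, name."""
    lines = []
    for name in sorted(os.listdir(path)):
        st = os.lstat(os.path.join(path, name))
        mtime = time.strftime("%b %d %H:%M", time.localtime(st.st_mtime))
        lines.append(f"{stat.filemode(st.st_mode)} {st.st_size} {mtime} {name}")
    return lines


def count_by_prefix(path="/var/lock/clumanager"):
    """Count entries by the part of the name before the first dot,
    e.g. {'lock': 4, 'service': 3} for the first listing above."""
    return dict(Counter(name.split(".", 1)[0] for name in os.listdir(path)))
```

Calling `print("\n".join(list_lock_dir()))` followed by `print(count_by_prefix())` on each member would give output that can be pasted into a reply as-is.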
Kind regards,
Herta
Herta Van den Eynde wrote:
environment:
- Red Hat AS 3 (kernel-smp-2.4.21-37.EL - custom built to probe all LUNs
on each SCSI device)
- clumanager 1.2.28
The cluster consists of 2 members running three services which simply
nfs export a number of directories to five other systems.
The cluster has been operational since February.
Following the latest upgrade (from kernel-smp-2.4.21-32.0.1.EL custom
built and clumanager-1.2.26.1-1), all services are running on one
member. When I try to relocate the services, the operation fails, and the
following message pops up:
A Problem has occurred while changing ownership
of this service. Please check logs for details.
The cluster log reports the following:
==== begin log extract
Member arnebd trying to relocate lepustl to nihald...Oct 10 16:08:06
arnebd clusvcmgrd: [13627]: <notice> service notice: Stopping service
lepustl ...
Oct 10 16:08:06 arnebd clurmtabd[26429]: <debug> Signal 15 received;
exiting
Oct 10 16:08:12 arnebd clusvcmgrd: [13627]: <err> service error: 'umount
/dev/sdb2' failed (/usr/local/lepus-tl), error=1
Oct 10 16:08:12 arnebd clusvcmgrd: [13627]: <err> service error: umount:
/usr/local/lepus-tl: device is busy
Oct 10 16:08:12 arnebd clusvcmgrd: [13627]: <err> service error: umount:
/usr/local/lepus-tl: device is busy
Oct 10 16:08:12 arnebd clusvcmgrd: [13627]: <err> service error: Cannot
stop filesystems for lepustl
Oct 10 16:08:12 arnebd clusvcmgrd[13626]: <notice> Starting stopped
service lepustl
Oct 10 16:08:12 arnebd clusvcmgrd: [14083]: <notice> service notice:
Starting service lepustl ...
Oct 10 16:08:12 arnebd clurmtabd[14194]: <debug> Log level is now 7
Oct 10 16:08:12 arnebd clurmtabd[14194]: <debug> Polling interval is now
4 seconds
failed
Oct 10 16:08:12 arnebd clusvcmgrd: [14083]: <notice> service notice:
Started service lepustl ...
Oct 10 16:08:14 arnebd clurmtabd[6533]: <debug> Detected modified
/var/lib/nfs/rmtab
Oct 10 16:08:14 arnebd clurmtabd[9655]: <debug> Detected modified
/var/lib/nfs/rmtab
==== end log extract
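For longer logs, a small filter helps to pull out only the error entries. The following is a rough Python sketch, assuming the syslog-style layout seen in the extract above (timestamp, host, daemon, then a level in angle brackets) and assuming that wrapped lines are plain continuations of the previous entry. The names `unwrap` and `by_level` are hypothetical, and entries fused together in the middle of a line, like the first one in the extract, are not handled.

```python
import re

# A line that starts a new entry begins with a timestamp like "Oct 10 16:08:12 ".
STAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d ")
# timestamp, host, daemon name, then "<level> message"
ENTRY = re.compile(
    r"^(?P<time>\S+ +\d+ \S+) (?P<host>\S+) (?P<daemon>[^:\[]+)"
    r".*?<(?P<level>\w+)> (?P<msg>.*)$"
)


def unwrap(lines):
    """Join mail-wrapped continuation lines onto the entry they belong to."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if STAMP.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line.strip()
    return entries


def by_level(lines, level="err"):
    """Return (time, daemon, message) tuples for entries at the given level."""
    found = []
    for entry in unwrap(lines):
        m = ENTRY.match(entry)
        if m and m.group("level") == level:
            found.append((m.group("time"), m.group("daemon"), m.group("msg")))
    return found
```

Passing an open log file to `by_level(f)` returns the `<err>` entries only; passing `level="notice"` selects the start and stop notices instead.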
FWIW, no one was logged in but me, and my current directory was not on
this filesystem.
Neither fuser nor lsof returned any process using the filesystem.
I figured the clurmtabd process might be locking it, so I verified that
there is only one clurmtabd process for that filesystem.
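As a cross-check on fuser and lsof, the same question can be asked of /proc directly. The sketch below is a minimal Python version, assuming a Linux /proc layout with `cwd`, `root`, `exe` and `fd/` symlinks per process; the function name `holders` and its `proc_root` parameter are made up for illustration. It does not look at memory mappings, and kernel-side users of a filesystem (for example an in-kernel NFS server) would not show up, since they are not tied to a process's file descriptors.

```python
import os


def holders(mount_point, proc_root="/proc"):
    """Return {pid: [paths]} for processes whose cwd, root, exe or
    open file descriptors point at or below mount_point."""
    mount_point = os.path.realpath(mount_point)
    prefix = mount_point.rstrip("/") + "/"
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        links = [os.path.join(base, name) for name in ("cwd", "root", "exe")]
        fd_dir = os.path.join(base, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited, or we lack permission to read its fds
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue
            if target == mount_point or target.startswith(prefix):
                found.setdefault(int(entry), []).append(target)
    return found
```

Run as root, `holders("/usr/local/lepus-tl")` returning an empty dictionary would agree with what fuser and lsof reported, and would point away from an ordinary user-space process as the cause of the busy umount.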
Any ideas/suggestions?
Kind regards,
Herta
--
Herta Van den Eynde -=- Toledo system management
K.U. Leuven - Ludit -=- phone: +32 (0)16 322 166
-=- 50°51'27" N 004°40'39" E
"I wish I were two little cats. Then I could play together."
Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm