Re: Services timeout

Jordi Prats <jprats@xxxxxxxx> · Thu, 13 Sep 2007 09:10:12 +0200

Hi,
This is all the data I could collect. bpbkar is a backup process. The
failure happened while the server was indexing (which takes several
weeks) and running a backup at the same time.

best regards,

Sep 12 06:09:44 inf04 clurgmgrd[5964]: <notice> Stopping service padicat.dades
Sep 12 06:09:45 inf04 clurgmgrd: [5964]: <info> Executing /etc/init.d/add.nfs.padicat.dades stop
Sep 12 06:09:45 inf04 clurgmgrd: [5964]: <info> Removing IPv4 address 192.168.12.205 from bond0
Sep 12 06:09:54 inf04 clurgmgrd: [5964]: <info> Executing /etc/init.d/add.nfs.recercat status
Sep 12 06:09:55 inf04 clurgmgrd: [5964]: <info> unmounting /projectes/padicat/dades
Sep 12 06:09:55 inf04 clurgmgrd: [5964]: <notice> Forcefully unmounting /projectes/padicat/dades
Sep 12 06:09:56 inf04 clurgmgrd: [5964]: <warning> killing process 13266 (root bpbkar /projectes/padicat/dades)
Sep 12 06:09:56 inf04 clurgmgrd: [5964]: <warning> Dropping node-wide NFS locks
Sep 12 06:10:04 inf04 clurgmgrd: [5964]: <info> Executing /etc/init.d/add.nfs.padicat.web status
Sep 12 06:10:04 inf04 clurgmgrd: [5964]: <info> Executing /etc/init.d/add.nfs.local status
Sep 12 06:10:06 inf04 clurgmgrd: [5964]: <info> unmounting /projectes/padicat/dades
Sep 12 06:10:06 inf04 clurgmgrd: [5964]: <notice> Forcefully unmounting /projectes/padicat/dades
Sep 12 06:10:07 inf04 clurgmgrd: [5964]: <info> Sending reclaim notifications via inf04.cesca.es
Sep 12 06:10:07 inf04 rpc.statd[27045]: Version 1.0.6 Starting
Sep 12 06:10:07 inf04 rpc.statd[27045]: Flags: No-Daemon Notify-Only
Sep 12 06:10:10 inf04 rpc.statd[27045]: Caught signal 15, un-registering and exiting.
Sep 12 06:10:10 inf04 clurgmgrd: [5964]: <info> Sending reclaim notifications via nfstdx
Sep 12 06:10:10 inf04 rpc.statd[27067]: Version 1.0.6 Starting
Sep 12 06:10:10 inf04 rpc.statd[27067]: Flags: No-Daemon Notify-Only
Sep 12 06:10:13 inf04 rpc.statd[27067]: Caught signal 15, un-registering and exiting.
Sep 12 06:10:13 inf04 clurgmgrd: [5964]: <info> Sending reclaim notifications via nfspadicatweb
Sep 12 06:10:13 inf04 rpc.statd[27089]: Version 1.0.6 Starting
Sep 12 06:10:13 inf04 rpc.statd[27089]: Flags: No-Daemon Notify-Only
Sep 12 06:10:14 inf04 clurgmgrd: [5964]: <info> Executing /etc/init.d/add.nfs.tdx status
Sep 12 06:10:16 inf04 rpc.statd[27089]: Caught signal 15, un-registering and exiting.
Sep 12 06:10:16 inf04 clurgmgrd: [5964]: <info> Sending reclaim notifications via nfslocal
Sep 12 06:10:16 inf04 rpc.statd[27321]: Version 1.0.6 Starting
Sep 12 06:10:16 inf04 rpc.statd[27321]: Flags: No-Daemon Notify-Only
Sep 12 06:10:19 inf04 rpc.statd[27321]: Caught signal 15, un-registering and exiting.
Sep 12 06:10:19 inf04 clurgmgrd: [5964]: <info> Sending reclaim notifications via nfsrecercat
Sep 12 06:10:19 inf04 rpc.statd[27343]: Version 1.0.6 Starting
Sep 12 06:10:19 inf04 rpc.statd[27343]: Flags: No-Daemon Notify-Only
Sep 12 06:10:22 inf04 rpc.statd[27343]: Caught signal 15, un-registering and exiting.
Sep 12 06:10:22 inf04 clurgmgrd: [5964]: <err> 'umount /projectes/padicat/dades' failed, error=0
Sep 12 06:10:22 inf04 clurgmgrd[5964]: <notice> stop on fs "PADICAT.dades" returned 2 (invalid argument(s))
Sep 12 06:10:22 inf04 clurgmgrd[5964]: <crit> #12: RG padicat.dades failed to stop; intervention required
Sep 12 06:10:22 inf04 clurgmgrd[5964]: <notice> Service padicat.dades is failed
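
For anyone comparing against their own logs, here is a minimal Python
sketch that pulls the interesting events out of a log like the one above:
processes killed during a forced unmount, umount failures, and services
that failed to stop. It is only an illustration. The function name
summarize and the sample text are made up for this example, the regular
expressions are fitted to the messages in this one excerpt, and reading
the killed-process fields as (user, command, path) is my assumption.
It expects each log entry on a single, unwrapped line.

```python
import re

# Sample entries taken from the log above, one entry per line.
SAMPLE = """\
Sep 12 06:09:55 inf04 clurgmgrd: [5964]: <notice> Forcefully unmounting /projectes/padicat/dades
Sep 12 06:09:56 inf04 clurgmgrd: [5964]: <warning> killing process 13266 (root bpbkar /projectes/padicat/dades)
Sep 12 06:10:22 inf04 clurgmgrd: [5964]: <err> 'umount /projectes/padicat/dades' failed, error=0
Sep 12 06:10:22 inf04 clurgmgrd[5964]: <crit> #12: RG padicat.dades failed to stop; intervention required
"""

# Patterns fitted to the messages seen in this excerpt only (assumption:
# other rgmanager versions may word these messages differently).
KILL_RE = re.compile(r"<warning> killing process (\d+) \((\S+) (\S+) (\S+)\)")
UMOUNT_FAIL_RE = re.compile(r"<err> 'umount (\S+)' failed")
STOP_FAIL_RE = re.compile(r"<crit> #12: RG (\S+) failed to stop")


def summarize(log_text):
    """Collect killed processes, failed umounts and failed-to-stop services."""
    summary = {"killed": [], "umount_failed": [], "stop_failed": []}
    for line in log_text.splitlines():
        match = KILL_RE.search(line)
        if match:
            pid, user, command, path = match.groups()
            summary["killed"].append(
                {"pid": int(pid), "user": user, "command": command, "path": path}
            )
            continue
        match = UMOUNT_FAIL_RE.search(line)
        if match:
            summary["umount_failed"].append(match.group(1))
            continue
        match = STOP_FAIL_RE.search(line)
        if match:
            summary["stop_failed"].append(match.group(1))
    return summary


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Feeding it the full log instead of SAMPLE should list the same process,
mount point and service, which makes it easier to see which process was
holding the filesystem when the stop failed.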

Lon Hohberger wrote:
On Wed, Sep 12, 2007 at 09:14:04AM +0200, Jordi Prats wrote:

Hi,
I have an NFS server running Red Hat Cluster. Sometimes, when it is under
heavy load, it sets the service status to failed. There's no fs corruption
and no daemon is down. I suspect this is caused by a timeout while it is
checking that the fs is mounted. Is there any way to define the check
interval or the check timeout?

Load shouldn't matter - a service is only marked failed when it fails
to stop. Do you have any log messages from the incident?

--
......................................................................
        __
       / /          Jordi Prats
 C E / S / C A      Dept. de Sistemes
     /_/            Centre de Supercomputació de Catalunya

 Gran Capità, 2-4 (Edifici Nexus) · 08034 Barcelona
 T. 93 205 6464 · F.  93 205 6979 · jprats@xxxxxxxx
...................................................................... 

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster