Hi all,

we just hit this problem again:

Jun 18 08:03:08 lilr623a clurgmgrd[22152]: #48: Unable to obtain cluster lock: Connection timed out
Jun 18 08:03:35 lilr623f clurgmgrd: [21651]: Executing /usr/local/swadmin/caa/SAP/P06WD002 status
Jun 18 08:05:29 lilr623f clurgmgrd[21651]: #49: Failed getting status for RG P06WD002

Is there an open Bugzilla about this problem?

We also see that the crash may be related to the cron.daily entries. Could one of the cron jobs be triggering this dlm bug? As the crontab below shows, cron.daily starts at 08:02 and the cluster got stuck at 08:03. The last time it happened, it was at the same time of day as well.

root@lilr623a:/tmp# cat /etc/crontab
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/

# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 8 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly

root@lilr623a:/tmp# ls -l /etc/cron.daily
total 28
lrwxrwxrwx 1 root root   28 Oct  5  2006 00-logwatch -> ../log.d/scripts/logwatch.pl
-rwxr-xr-x 1 root root  418 Apr 14  2006 00-makewhatis.cron
-rwxr-xr-x 1 root root  276 Sep 28  2004 0anacron
-rwxr-xr-x 1 root root  180 Jul 13  2005 logrotate
-rwxr-xr-x 1 root root   48 Apr  9  2006 mcelog.cron
-rwxr-xr-x 1 root root 2133 Dec  1  2004 prelink
-rwxr-xr-x 1 root root  121 Aug  8  2005 slocate.cron

Thanks for your help
Mike

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Lon Hohberger
Sent: Friday, 11 May 2007 22:19
To: linux clustering
Subject: Re: clurgmgrd - <err> #48: Unable to obtain cluster lock: Connection timed out

On Mon, May 07, 2007 at 01:54:56PM -0400, rhurst@xxxxxxxxxxxxxxxxx wrote:
> What could cause clurgmgrd to fail like this? If clurgmgrd has a hiccup
> like this, is it supposed to shut down its services? Is there
> something in our implementation that could have prevented this from
> shutting down?
>
> For unexplained reasons, we just had our CS service (WATSON) go down
> on its own, and the syslog entry details the event as:
>
> May 7 13:18:39 db1 clurgmgrd[17888]: <err> #48: Unable to obtain cluster lock: Connection timed out
> May 7 13:18:41 db1 kernel: dlm: Magma: reply from 2 no lock
> May 7 13:18:41 db1 kernel: dlm: reply
> May 7 13:18:41 db1 kernel: rh_cmd 5
> May 7 13:18:41 db1 kernel: rh_lkid 200242
> May 7 13:18:41 db1 kernel: lockstate 2
> May 7 13:18:41 db1 kernel: nodeid 0
> May 7 13:18:41 db1 kernel: status 0
> May 7 13:18:41 db1 kernel: lkid ee0388
> May 7 13:18:41 db1 clurgmgrd[17888]: <notice> Stopping service WATSON

This usually is a dlm bug. Once the DLM gets into this state, rgmanager blows up. What rgmanager are you using?

(There's only one lock per service; the complexity of the service doesn't matter...)

--
Lon Hohberger - Software Engineer - Red Hat, Inc.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster