RE: clurgmgrd - <err> #48: Unable to obtain cluster lock: Connection timed out

"Robert Hurst" <rhurst@xxxxxxxxxxxxxxxxx> · Mon, 14 May 2007 15:46:50 -0400


Any new thoughts on this? Is it a new bug, or is it fixed in U5? I have a ticket open, but your insight into how likely this is to recur would be helpful. Thanks.

On Fri, 2007-05-11 at 19:54 -0400, rhurst@xxxxxxxxxxxxxxxxx wrote:

    We are using RHEL 4 U4 with the GFS/CS that works for that release:

    $ rpm -q rgmanager dlm dlm-kernel magma magma-plugins
    rgmanager-1.9.54-1
    dlm-1.0.1-1
    dlm-kernel-2.6.9-44.9
    magma-1.0.6-0
    magma-plugins-1.0.9-0

    Would the just-announced GFS/CS for U5 help at all?  It looks like a lot of issues were addressed.

Robert Hurst, Sr. Caché Administrator
Beth Israel Deaconess Medical Center
1135 Tremont Street, REN-7
Boston, Massachusetts   02120-2140
617-754-8754 · Fax: 617-754-8730 · Cell: 401-787-3154
Any technology distinguishable from magic is insufficiently advanced.

    From: linux-cluster-bounces@xxxxxxxxxx on behalf of Lon Hohberger
    Sent: Fri 5/11/2007 4:19 PM
    To: linux clustering
    Subject: Re:  clurgmgrd - <err> #48: Unable to obtain cluster lock: Connection timed out

    On Mon, May 07, 2007 at 01:54:56PM -0400, rhurst@xxxxxxxxxxxxxxxxx wrote:

    > What could cause clurgmgrd to fail like this?  If clurgmgrd has a hiccup
    > like this, is it supposed to shut down its services?  Is there something
    > in our implementation that could have prevented this from shutting down?
    >
    > For unexplained reasons, we just had our CS service (WATSON) go down on
    > its own, and the syslog entry details the event as:
    >
    > May  7 13:18:39 db1 clurgmgrd[17888]: <err> #48: Unable to obtain
    > cluster lock: Connection timed out
    > May  7 13:18:41 db1 kernel: dlm: Magma: reply from 2 no lock
    > May  7 13:18:41 db1 kernel: dlm: reply
    > May  7 13:18:41 db1 kernel: rh_cmd 5
    > May  7 13:18:41 db1 kernel: rh_lkid 200242
    > May  7 13:18:41 db1 kernel: lockstate 2
    > May  7 13:18:41 db1 kernel: nodeid 0
    > May  7 13:18:41 db1 kernel: status 0
    > May  7 13:18:41 db1 kernel: lkid ee0388
    > May  7 13:18:41 db1 clurgmgrd[17888]: <notice> Stopping service WATSON

    This is usually a dlm bug.  Once the DLM gets into this state,
    rgmanager blows up.  Which rgmanager version are you using?

    (There's only one lock per service; the complexity of the service
    doesn't matter...)
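
A minimal sketch for spotting this failure in syslog, assuming the message layout matches the excerpt quoted above. The regular expression and the helper name find_lock_errors are illustrative assumptions, not part of rgmanager or any Red Hat tooling, and other releases may format the message differently.

```python
import re

# Pattern built from the syslog excerpt in this thread (an assumption,
# not taken from rgmanager documentation).
LOCK_ERR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+"
    r"clurgmgrd\[(?P<pid>\d+)\]:\s+<err>\s+#48:\s+Unable to obtain cluster lock"
)

def find_lock_errors(lines):
    """Return (timestamp, host, pid) for each '#48' lock failure line."""
    hits = []
    for line in lines:
        m = LOCK_ERR.match(line)
        if m:
            hits.append((m.group("stamp"), m.group("host"), int(m.group("pid"))))
    return hits

# Sample lines copied from the excerpt, with the wrapped first line rejoined.
sample = [
    "May  7 13:18:39 db1 clurgmgrd[17888]: <err> #48: Unable to obtain cluster lock: Connection timed out",
    "May  7 13:18:41 db1 kernel: dlm: Magma: reply from 2 no lock",
    "May  7 13:18:41 db1 clurgmgrd[17888]: <notice> Stopping service WATSON",
]
print(find_lock_errors(sample))
```

Feeding it the lines of a real log file (for example, an open file handle on the syslog) would show how often the lock failure has occurred, which may help when judging whether the problem recurs.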

    --
    Lon Hohberger - Software Engineer - Red Hat, Inc.

    --
    Linux-cluster mailing list
    Linux-cluster@xxxxxxxxxx
    https://www.redhat.com/mailman/listinfo/linux-cluster
