Re: Re: rgmanager stuck, hung on futex

Lon Hohberger <lhh@xxxxxxxxxx> · Mon, 11 Dec 2006 14:49:56 -0500

On Mon, 2006-12-11 at 10:22 -0800, aberoham@xxxxxxxxx wrote:
> Another clue -- haldaemon crashed on this node, perhaps at the same
> time clurgmgrd started to hang? 
> 
> latest dmesg entry --
> hal[3509]: segfault at 0000000000000000 rip 0000000000400ec7 rsp 0000007fbfffd7e0 error 4
> 
> grep clurgmgrd /var/log/messages --
> [snip]
> Dec 11 06:39:43 bamf01 clurgmgrd: [7983]: <info> Executing /etc/init.d/rsyncd-tiger status
> Dec 11 06:39:44 bamf01 clurgmgrd: [7983]: <info> Executing /etc/init.d/httpd.cluster status
> Dec 11 06:39:44 bamf01 clurgmgrd: [7983]: <info> Executing /etc/init.d/rsyncd-hartigan status
> Dec 11 06:41:11 bamf01 clurgmgrd[7983]: <err> #48: Unable to obtain cluster lock: Connection timed out
> Dec 11 06:41:56 bamf01 clurgmgrd[7983]: <err> #50: Unable to obtain cluster lock: Connection timed out
> [snip]

Could you check /proc/slabinfo and post it from all nodes?  I think I
know what this is.
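
The raw file from each node is what matters, but if a quick summary
helps when comparing nodes, here is a minimal sketch in Python. It is
not part of rgmanager; parse_slabinfo() and report() are names made up
for illustration. It assumes the usual slabinfo 2.x column layout
(name, active_objs, num_objs, objsize, ...), and the byte totals are
only num_objs * objsize, so they ignore per-slab overhead.

```python
import socket


def parse_slabinfo(text):
    """Return (name, num_objs, objsize, approx_bytes) per cache, largest first.

    Assumes slabinfo 2.x layout: name active_objs num_objs objsize ...
    """
    rows = []
    for line in text.splitlines():
        # Skip blank lines, the version banner and the '# name ...' header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            num_objs, objsize = int(fields[2]), int(fields[3])
        except ValueError:
            continue  # not a data row in the expected layout
        rows.append((fields[0], num_objs, objsize, num_objs * objsize))
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows


def report(path="/proc/slabinfo", top=15):
    """Print the hostname and the largest caches by approximate size."""
    with open(path) as f:
        rows = parse_slabinfo(f.read())
    print("host: %s" % socket.gethostname())
    for name, num_objs, objsize, approx in rows[:top]:
        print("%-28s %10d objs x %6d B = %8.1f MiB"
              % (name, num_objs, objsize, approx / 2.0 ** 20))


if __name__ == "__main__":
    try:
        report()
    except OSError as exc:
        # Reading /proc/slabinfo may need root, depending on the kernel.
        print("could not read slabinfo: %s" % exc)
```

Run it on every node and compare which caches sit at the top; the
unmodified /proc/slabinfo contents are still the thing to post.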

-- Lon

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster