On Mon, 2006-12-11 at 10:22 -0800, aberoham@xxxxxxxxx wrote:
> Another clue -- haldaemon crashed on this node, perhaps at the same
> time clurgmgrd started to hang?
>
> lastest dmesg entry --
> hal[3509]: segfault at 0000000000000000 rip 0000000000400ec7 rsp
> 0000007fbfffd7e0 error 4
>
> grep clurgmgrd /var/log/messages --
> [snip]
> Dec 11 06:39:43 bamf01 clurgmgrd: [7983]: <info>
> Executing /etc/init.d/rsyncd-tiger status
> Dec 11 06:39:44 bamf01 clurgmgrd: [7983]: <info>
> Executing /etc/init.d/httpd.cluster status
> Dec 11 06:39:44 bamf01 clurgmgrd: [7983]: <info>
> Executing /etc/init.d/rsyncd-hartigan status
> Dec 11 06:41:11 bamf01 clurgmgrd[7983]: <err> #48: Unable to obtain
> cluster lock: Connection timed out
> Dec 11 06:41:56 bamf01 clurgmgrd[7983]: <err> #50: Unable to obtain
> cluster lock: Connection timed out
> [snip]

Could you check /proc/slabinfo and post it from all nodes? I think I
know what this is.

-- Lon
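
For anyone collecting the requested /proc/slabinfo output, the following is a minimal Python sketch that ranks slab caches by approximate memory use, which can make a large dump easier to compare across nodes. It assumes the usual slabinfo column order (name, active_objs, num_objs, objsize); the function names parse_slabinfo and top_caches are illustrative only, and the script makes no guess about which cache, if any, is relevant to this hang. Reading /proc/slabinfo may require root.

```python
def parse_slabinfo(text):
    """Parse /proc/slabinfo text into (name, active_objs, num_objs, objsize, approx_bytes) tuples.

    Assumes the first four columns are name, active_objs, num_objs, objsize.
    approx_bytes is num_objs * objsize, a rough estimate that ignores slab overhead.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the version banner, and the column-header comment.
        if not line or line.startswith("#") or line.startswith("slabinfo"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            active, total, size = int(fields[1]), int(fields[2]), int(fields[3])
        except ValueError:
            continue  # not a data row in the expected layout
        rows.append((fields[0], active, total, size, total * size))
    return rows


def top_caches(text, n=10):
    """Return the n caches with the largest approximate memory footprint."""
    return sorted(parse_slabinfo(text), key=lambda r: r[4], reverse=True)[:n]


if __name__ == "__main__":
    with open("/proc/slabinfo") as f:
        for name, active, total, size, approx in top_caches(f.read()):
            print("%-28s active=%-10d total=%-10d objsize=%-6d ~%d KiB"
                  % (name, active, total, size, approx // 1024))
```

Run it on each node and post the raw /proc/slabinfo alongside the summary, since the full file is what was asked for.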
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster