Re: Unkillable clurgmgrd

Lon Hohberger <lhh@xxxxxxxxxx> · Mon, 12 Nov 2007 15:36:07 -0500

On Sun, 2007-11-11 at 23:57 +0100, Jos Vos wrote:
> Hi,
> 
> I have a node that has an unkillable (kill -9 doesn't work) clurgmgrd
> running.  I have fenced it now for the third time, with the same
> result after startup...
> 
> Stracing clustat gives:
> 
> [...]
> socket(PF_FILE, SOCK_STREAM, 0)         = 5
> connect(5, {sa_family=AF_FILE, path="/var/run/cluster/rgmanager.sk"}, 110) = -1 ENOENT (No such file or directory)
> close(5)                                = 0
> dup(2)                                  = 5
> fcntl(5, F_GETFL)                       = 0x8002 (flags O_RDWR|O_LARGEFILE)
> fstat(5, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaaaac000
> lseek(5, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
> write(5, "msg_open: No such file or direct"..., 36msg_open: No such file or directory
> ) = 36
> close(5)                                = 0
> munmap(0x2aaaaaaac000, 4096)            = 0
> [...]
> 
> How to get this node back up again???
> 
> This is on a RHEL 5.0 clone.

If it's unkillable, it's most likely stuck in the kernel, in
uninterruptible sleep, waiting on something.  A task dump should show
where:

echo 1 > /proc/sys/kernel/sysrq
echo t > /proc/sysrq-trigger

dmesg > foo.out

Then reply and attach foo.out ;)
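
To narrow things down before sending the full dump, something like the
following rough sketch may help.  It assumes a procps-style ps that
accepts the pid, stat, wchan and comm output fields; the function names
are made up for illustration and are not part of rgmanager or any
cluster tool.  Tasks whose STAT starts with D are in uninterruptible
sleep, and WCHAN (when the kernel exposes it) names the function they
are sleeping in.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any cluster package.

# Keep only tasks in uninterruptible sleep (STAT begins with "D").
# Expects "PID STAT WCHAN COMMAND" lines on stdin; the header line is skipped.
filter_d_state() {
    awk 'NR > 1 && $2 ~ /^D/ { print }'
}

# List blocked tasks on this node using ps.
list_blocked() {
    ps -eo pid,stat,wchan:32,comm | filter_d_state
}

# Capture a full task dump, same steps as above (needs root).
# The output file defaults to foo.out.
capture_task_dump() {
    echo 1 > /proc/sys/kernel/sysrq
    echo t > /proc/sysrq-trigger
    dmesg > "${1:-foo.out}"
}
```

Running list_blocked as any user should show whether clurgmgrd is in D
state; capture_task_dump as root writes the dump.  On a busy node the
dump may be larger than the kernel log buffer, in which case the top of
foo.out could be cut off.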

-- Lon

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster