Two of the three nodes in my CS/GFS cluster just crashed, causing the cluster
to lose quorum, which finally allowed me to capture part of the kernel panic.
Here is what was displayed on the screen:
[<ffffffff885daea8>] :gfs:gfs_write+0x0/0x8
[<ffffffff885cb2a7>] :gfs:gfs_glock_d1+0x15c/0x16c
[<ffffffff885dc429>] :gfs:gfs_open+0x12c/0x15e
[<ffffffff8857a77d>] :nfsd:nfsd_vfs_write+0xf2/0x2e1
[<ffffffff885dc2fd>] :gfs:gfs_open+0x0/0x15e
[<ffffffff8001e115>] __dentry_open+0x101/0x1dc
[<ffffffff8857aff1>] :nfsd:nfsd_write+0xb5/0xd5
[<ffffffff88581c96>] :nfsd:nfsd3_proc_write+0xea/0x109
[<ffffffff885771c4>] :nfsd:nfsd_dispatch+0xd7/0x198
[<ffffffff883e1514>] :sunrpc:svc_process+0x44d/0x70b
[<ffffffff800625bf>] __down_read+0x12/0x92
[<ffffffff8857754d>] :nfsd:nfsd+0x0/0x2db
[<ffffffff885776fb>] :nfsd:nfsd+0x1ae/0x2db
[<ffffffff8005bfb1>] child_rip+0xa/0x11
[<ffffffff8857754d>] :nfsd:nfsd+0x0/0x2db
[<ffffffff8857754d>] :nfsd:nfsd+0x0/0x2db
[<ffffffff8005bfa7>] child_rip+0x0/0x11
Code: Bad RIP value.
RIP [<0000000000000000>] _stext+0x7fff000/0x1000
RSP <ffff81006ac9f6e8>
CR2: 0000000000000000
<0>Kernel panic - not syncing: Fatal exception
Is this enough to figure out what happened, and how can I prevent it from
happening again? I suspect that all of the instability I have seen with my
CS/GFS cluster is related to this sort of crash. I am running the following
packages on all three nodes:
cman-2.0.73-1.el5_1.1
openais-0.80.3-7.el5
rgmanager-2.0.31-1.el5.centos
lvm2-cluster-2.02.26-1.el5
luci-0.10.0-6.el5.centos.1
ricci-0.10.0-6.el5.centos.1
kernel-2.6.18-53.1.4.el5
gfs-utils-0.1.12-1.el5
kmod-gfs-0.1.19-7.el5_1.1
Thanks,
James
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster