Morning all. We've been experiencing regular cluster crashes on RHEL4u4.
The cluster has 5 nodes, and a few dozen client nodes mount shares from it
via NFS. Periodically a node will panic and get fenced, and everything else
carries on. This system does have some of the HP ProLiant Support Pack
installed (but not the HP bnx2 driver). Below is the relevant section from
the logs. It is hand-typed, but I am fairly sure it's accurate.
The machines are HP DL360 G5s. The NICs are Broadcom NetXtreme II 5708s.
Is anyone else seeing this?
Corey
===========================================================
Unable to handle kernel NULL pointer dereference at virtual address 000000ac
printing eip:
f8f339ae
*pde = 37038001
Oops: 0000 [#1]
SMP
Modules linked in: ipt_multiport iptable_nat ip_conntrack ip_tables
ip_vs_rr ip_vs cpqci(U) ipmi_devintf ipmi_si ipmi_msghandler xp(U)
mptctl mptbase sg autofs4 i2c_dev i2c_core lock_dlm(U) gfs(U)
lock_harness(U) dlm(U) cman(U) md5 ipv6 nfsd exportfs lockd nfs_acl
sunrpc joydev dm_mirror button battery ac ehci_hcd uhci_hcd bnx2 ext3
jbd dm_mod qla6312(U) qla2400(U) qla2300(U) qla2xxx_conf(U) qla2xxx(U)
cciss sd_mod scsi_mod
CPU: 0
EIP: 0060:[<f8f339ae>] Tainted: P VLI
EFLAGS: 00010202 (2.6.9-42.0.2.ELsmp)
EIP is at bnx2_tx_int+0x48/0x1d1 [bnx2]
eax: f70620dc ebx: 00000ad7 ecx: 00000002 edx: 00000037
esi: 00000a37 edi: 00000000 ebp: f6a0b200 esp: c03cefa0
ds: 007b es: 007b ss: 0068
Process dlm_recvd (pid: 3973, threadinfo=c03ce000 task=f71652f0)
Stack: f70620dc 00000037 f5c19000 00000000 f6a0b200 f6a0afc0 c03cefd4 f8f3431d
00000000 f6a0afc0 c201fd80 15a3182b c0280e24 000493dc 00000001 c0392c18
0000000a 00000000 c01269b8 f59d4dc4 00000046 c038b900 f59d4000 c010819f
Call trace:
[<f8f3431d>] bnx2_poll+0x4f/0x142 [bnx2]
[<c0280e24>] net_rx_action+0xae/0x160
[<c01269b8>] __do_softirq+0x4c/0xb1
[<c010819f>] do_softirq+0x4f/0x56
===========================================================
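In case anyone wants to compare this against panics on their own nodes, here is a rough Python sketch that pulls the faulting symbol, offset and module out of an "EIP is at ..." line like the one above. The function names parse_eip and crash_signature are made up for illustration, and the pattern only assumes the 2.6.9-style oops format shown in this log, so treat it as a starting point rather than a general parser.

```python
import re

# Matches lines such as "EIP is at bnx2_tx_int+0x48/0x1d1 [bnx2]".
# The trailing "[module]" part is optional (absent for built-in kernel code).
EIP_RE = re.compile(
    r"EIP is at (?P<symbol>\w+)\+0x(?P<offset>[0-9a-fA-F]+)"
    r"/0x(?P<size>[0-9a-fA-F]+)(?: \[(?P<module>\w+)\])?"
)


def parse_eip(line):
    """Return (symbol, offset, size, module), or None if the line does not match."""
    m = EIP_RE.search(line)
    if m is None:
        return None
    return (
        m.group("symbol"),
        int(m.group("offset"), 16),
        int(m.group("size"), 16),
        m.group("module"),
    )


def crash_signature(log_text):
    """Return a short 'module:symbol+offset' string for the first EIP line found."""
    for line in log_text.splitlines():
        parsed = parse_eip(line)
        if parsed is not None:
            symbol, offset, _size, module = parsed
            return f"{module or 'kernel'}:{symbol}+{offset:#x}"
    return None
```

Running crash_signature over the oops text from each fenced node gives one short string per crash, which makes it easy to see whether every panic lands at the same place in bnx2_tx_int.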
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster