I have a vmcore (from kdump). If the developers are interested, let me know where to upload it.
I used the crash command to do a backtrace.
I have managed to get machines on the later 5.4 and 5.5 releases to panic the same way, with both Broadcom and Intel NICs.
This is an NFS client using NFSv3, mounted with defaults,noatime.
The client was busy writing to NFS-mounted space while the NFS server was restarted several times.
So far, when I mount with the udp option, I have not managed to panic the machines.
The bad news is that NFSv4 is strictly TCP, if I were to go down that route.
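
When comparing machines that panic with ones that do not, it may help to confirm which transport each NFS mount is actually using. The following is only a rough sketch in Python, assuming the mount table is in /proc/mounts format and that the transport appears either as a bare option (udp, tcp) or as proto=udp / proto=tcp. The function name, server names, and paths are made up for illustration.

```python
def nfs_transports(mounts_text):
    """Map each NFS mount point to 'tcp', 'udp', or 'unknown'."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts layout: device mountpoint fstype options dump pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        transport = "unknown"
        for opt in fields[3].split(","):
            # Accept both the bare form ("udp") and the key=value form
            # ("proto=udp"); which one appears is an assumption here.
            value = opt[len("proto="):] if opt.startswith("proto=") else opt
            if value in ("tcp", "udp"):
                transport = value
        result[fields[1]] = transport
    return result


# Hypothetical sample in /proc/mounts format (not taken from the real machines).
SAMPLE = """\
/dev/sda1 / ext3 rw 0 0
filer1:/export/data /mnt/data nfs rw,noatime,vers=3,proto=tcp,addr=192.0.2.10 0 0
filer1:/export/scratch /mnt/scratch nfs rw,noatime,vers=3,udp,addr=192.0.2.10 0 0
"""

if __name__ == "__main__":
    for mountpoint, transport in sorted(nfs_transports(SAMPLE).items()):
        print(mountpoint, transport)
```

On a real client, the text would come from reading /proc/mounts instead of the SAMPLE string.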
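From the backtrace, the crash seems to be TCP-related: the faulting CPU was in the TCP retransmit path (tcp_write_timer -> tcp_retransmit_skb -> tcp_transmit_skb -> pskb_copy). I'll be trying a couple of Linux TCP settings changes.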
It's a possibility that the issues are with TCP in general (not NFS).
I would like to enlist the community's help in understanding this further and in finding potential workarounds for this TCP issue.
crash> sys
KERNEL: vmlinux
DUMPFILE: vmcore
CPUS: 4
DATE: Tue Apr 20 15:04:09 2010
UPTIME: 18:55:25
LOAD AVERAGE: 0.13, 0.09, 0.03
TASKS: 340
RELEASE: 2.6.18-164.el5
VERSION: #1 SMP Thu Sep 3 03:28:30 EDT 2009
MACHINE: x86_64 (2660 Mhz)
MEMORY: 23.6 GB
PANIC: "Oops: 0000 [1] SMP " (check log for details)
crash> bt -a
PID: 0 TASK: ffffffff802ffae0 CPU: 0 COMMAND: "swapper"
#0 [ffffffff8043ef20] crash_nmi_callback at ffffffff8007a3bf
#1 [ffffffff8043ef40] do_nmi at ffffffff8006585a
#2 [ffffffff8043ef50] nmi at ffffffff80064ebf
[exception RIP: acpi_processor_idle+579]
RIP: ffffffff8019765e RSP: ffffffff803f1f48 RFLAGS: 00000093
RAX: 000000000073111a RBX: 000000000073111a RCX: 0000000000000808
RDX: 0000000000000815 RSI: 0000000000000003 RDI: 0000000000000000
RBP: ffff81063e480100 R8: ffffffff803f0000 R9: ffffffff804b5e2c
R10: 0000000000000046 R11: 0000000000000046 R12: 0000000000000000
R13: ffff81063e480000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <exception stack> ---
#3 [ffffffff803f1f48] acpi_processor_idle at ffffffff8019765e
#4 [ffffffff803f1f90] cpu_idle at ffffffff8004939e
PID: 0 TASK: ffff810115f11100 CPU: 1 COMMAND: "swapper"
#0 [ffff810115f38f20] crash_nmi_callback at ffffffff8007a3bf
#1 [ffff810115f38f40] do_nmi at ffffffff8006585a
#2 [ffff810115f38f50] nmi at ffffffff80064ebf
[exception RIP: acpi_processor_idle+579]
RIP: ffffffff8019765e RSP: ffff810115f2fea8 RFLAGS: 00000093
RAX: 0000000000731145 RBX: 0000000000731145 RCX: 0000000000000808
RDX: 0000000000000815 RSI: 0000000000000003 RDI: 0000000000000000
RBP: ffff81063f173900 R8: ffff810115f2e000 R9: ffffffff804b5e2c
R10: 0000000000000046 R11: 0000000000000046 R12: 00000000000000ff
R13: ffff81063f173800 R14: 0000000000000100 R15: ffffffff803ea280
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <exception stack> ---
#3 [ffff810115f2fea8] acpi_processor_idle at ffffffff8019765e
#4 [ffff810115f2fef0] cpu_idle at ffffffff8004939e
PID: 0 TASK: ffff810115f20080 CPU: 2 COMMAND: "swapper"
#0 [ffff810115f6bbc0] crash_kexec at ffffffff800ac5b9
#1 [ffff810115f6bc80] __die at ffffffff80065127
#2 [ffff810115f6bcc0] do_page_fault at ffffffff80066da7
#3 [ffff810115f6bdb0] error_exit at ffffffff8005dde9
[exception RIP: pskb_copy+307]
RIP: ffffffff8022486b RSP: ffff810115f6be60 RFLAGS: 00010282
RAX: ffff81062cd5f540 RBX: ffff81062cac3980 RCX: ffff81046fb1e550
RDX: 0000000000000000 RSI: ffff81062cd5f550 RDI: 0000000000000004
RBP: ffff810466f54a80 R8: 00000000081f02b4 R9: 0000000000000000
R10: ffff81062cac3980 R11: 00000000000000c8 R12: 0000000000000220
R13: ffff810466f54a80 R14: 0000000000000002 R15: ffffffff803ea2a0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#4 [ffff810115f6be78] tcp_transmit_skb at ffffffff800217b7
#5 [ffff810115f6bec8] tcp_retransmit_skb at ffffffff80250ccd
#6 [ffff810115f6bf08] tcp_write_timer at ffffffff80252652
#7 [ffff810115f6bf28] run_timer_softirq at ffffffff800968be
#8 [ffff810115f6bf58] __do_softirq at ffffffff8001235a
#9 [ffff810115f6bf88] call_softirq at ffffffff8005e2fc
#10 [ffff810115f6bfa0] do_softirq at ffffffff8006cb14
#11 [ffff810115f6bfb0] apic_timer_interrupt at ffffffff8005dc8e
--- <IRQ stack> ---
#12 [ffff810115f67df8] apic_timer_interrupt at ffffffff8005dc8e
[exception RIP: acpi_processor_idle+628]
RIP: ffffffff8019768f RSP: ffff810115f67ea8 RFLAGS: 00000282
RAX: ffff810115f67fd8 RBX: ffff81063f173100 RCX: 0000000080184973
RDX: ffff81063f173000 RSI: 0000000000000082 RDI: ffffffff804b5e2c
RBP: ffff810115f67ee8 R8: ffff810115f66000 R9: ffff810115f67ecc
R10: 0000000000000046 R11: ffff810115f67ee8 R12: ffff81063f6e1180
R13: 0000000010008040 R14: ffff81063f6e1180 R15: ffff81063f6e1180
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#13 [ffff810115f67ea0] acpi_processor_idle at ffffffff80197685
#14 [ffff810115f67ef0] cpu_idle at ffffffff8004939e
PID: 0 TASK: ffff810115f94100 CPU: 3 COMMAND: "swapper"
#0 [ffff810115fbbf20] crash_nmi_callback at ffffffff8007a3bf
#1 [ffff810115fbbf40] do_nmi at ffffffff8006585a
#2 [ffff810115fbbf50] nmi at ffffffff80064ebf
[exception RIP: acpi_processor_idle+579]
RIP: ffffffff8019765e RSP: ffff810115fb9ea8 RFLAGS: 00000097
RAX: 0000000000731169 RBX: 0000000000731169 RCX: 0000000000000808
RDX: 0000000000000815 RSI: 0000000000000003 RDI: 0000000000000000
RBP: ffff81063f174900 R8: ffff810115fb8000 R9: ffff810115f942f0
R10: 0000000000000046 R11: 0000000000000046 R12: 00000000000000ff
R13: ffff81063f174800 R14: 0000000000000300 R15: ffffffff803ea2c0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <exception stack> ---
#3 [ffff810115fb9ea8] acpi_processor_idle at ffffffff8019765e
#4 [ffff810115fb9ef0] cpu_idle at ffffffff8004939e
crash> quit
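
For anyone who wants to compare several vmcores, here is a rough Python sketch of how the faulting CPU can be picked out of `bt -a` output like the above. It relies on a heuristic, not a crash feature: the CPU that oopsed has crash_kexec or __die in its stack, while the other CPUs were merely stopped by an NMI. The function names are made up for illustration, and the sample is a shortened excerpt of the backtrace above.

```python
import re

HEADER = re.compile(
    r'PID:\s*(\d+)\s+TASK:\s*(\w+)\s+CPU:\s*(\d+)\s+COMMAND:\s*"([^"]*)"')
FRAME = re.compile(r'#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+')
EXC_RIP = re.compile(r'\[exception RIP:\s*(\S+)\]')


def split_tasks(bt_text):
    """Split `bt -a` text into one dict per task (one per CPU here)."""
    tasks = []
    for line in bt_text.splitlines():
        header = HEADER.search(line)
        if header:
            tasks.append({"cpu": int(header.group(3)),
                          "command": header.group(4),
                          "frames": [],
                          "exception_rips": []})
            continue
        if not tasks:
            continue  # ignore anything before the first PID line
        rip = EXC_RIP.search(line)
        if rip:
            tasks[-1]["exception_rips"].append(rip.group(1))
            continue
        frame = FRAME.search(line)
        if frame:
            tasks[-1]["frames"].append(frame.group(2))
    return tasks


def panicking_tasks(tasks):
    """Heuristic: keep tasks whose stack went through the oops path."""
    return [t for t in tasks
            if "crash_kexec" in t["frames"] or "__die" in t["frames"]]


# Shortened excerpt of the backtrace in this post (CPU 0 and CPU 2 only).
SAMPLE = """\
PID: 0 TASK: ffffffff802ffae0 CPU: 0 COMMAND: "swapper"
#0 [ffffffff8043ef20] crash_nmi_callback at ffffffff8007a3bf
#1 [ffffffff8043ef40] do_nmi at ffffffff8006585a
PID: 0 TASK: ffff810115f20080 CPU: 2 COMMAND: "swapper"
#0 [ffff810115f6bbc0] crash_kexec at ffffffff800ac5b9
#1 [ffff810115f6bc80] __die at ffffffff80065127
#2 [ffff810115f6bcc0] do_page_fault at ffffffff80066da7
#3 [ffff810115f6bdb0] error_exit at ffffffff8005dde9
[exception RIP: pskb_copy+307]
#4 [ffff810115f6be78] tcp_transmit_skb at ffffffff800217b7
#5 [ffff810115f6bec8] tcp_retransmit_skb at ffffffff80250ccd
#6 [ffff810115f6bf08] tcp_write_timer at ffffffff80252652
"""

if __name__ == "__main__":
    for task in panicking_tasks(split_tasks(SAMPLE)):
        tcp_frames = [f for f in task["frames"] if f.startswith("tcp_")]
        print("CPU", task["cpu"], task["exception_rips"], tcp_frames)
```

Run against a saved copy of the full `bt -a` output, this should single out CPU 2 and list the tcp_* frames that lead into pskb_copy.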
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos