We have a knfs file server running ext3 on a 2.4.18 kernel with three filesystems:

<CLIP>
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda3             10080520   3609632   6368476  37% /
/dev/hda4             16437332  12295408   3974932  76% /home
/dev/md1            1024872060 409906136 614441636  41% /fs
</CLIP>

The /fs filesystem lives on a software RAID5 array (md1) made of 7 SCSI disks on two SCSI channels, attached to a QLogic SCSI controller (qla1280 driver). Quotas are turned on, but I have also had crashes with quotas off. Almost all of the load is on /fs.

The kernel is tainted by Intel's e1000 gigabit Ethernet driver. However, the problems have also occurred when Intel's driver was not in use.

I have tried to rule out motherboard/CPU/memory problems by running kernel compilations in a loop for 24 hours (trying to provoke a signal 11). No problems there. The machine has an Intel dual-CPU motherboard. Because of these stability problems we are already running a uniprocessor kernel, but that does not seem to have helped.

About once every two weeks the machine crashes, always with a filesystem-related stack trace (it has a serial console, so I get the trace even when it can no longer be saved to disk). Below are the latest oops and the panic that followed right after it. The system seems to have been badly messed up after the oops, so the last two traces might not be interesting.

Does anyone have any ideas about what the problem might be? I also have traces from when the system was still running SMP.
The oops:

Oops: 0000
CPU:    0
EIP:    0010:[<c0165e69>]    Tainted: P
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000   ebx: dbffe1a0   ecx: eb270ba0   edx: dbffe1a0
esi: 00000000   edi: f1f4ed90   ebp: f1f4edc0   esp: f630be70
ds: 0018   es: 0018   ss: 0018
Process kjournald (pid: 1274, stackpage=f630b000)
Stack: dbffe1a0 f1f4e970 c01614c3 dbffe1a0 f1f4e970 00000000 00000000 00000000
       00000023 eb270ba0 e3254940 005cb334 0100010a c01e93d3 f783dda0 00000001
       c351dda0 c351d260 cbc28c00 cbc28540 d882c260 c05e8ce0 da0c0180 da0c06c0
Call Trace: [<c01614c3>] [<c01e93d3>] [<c0164375>] [<c01641e0>] [<c0105726>]
   [<c0164200>]
Code: 8b 56 04 85 d2 79 23 68 c7 fd 22 c0 68 bc 06 00 00 68 19 fb

>>EIP; c0165e69 <__journal_remove_journal_head+9/e0>   <=====
Trace; c01614c3 <journal_commit_transaction+343/119a>
Trace; c01e93d3 <ip_rcv+313/3a0>
Trace; c0164375 <kjournald+175/2b0>
Trace; c01641e0 <commit_timeout+0/10>
Trace; c0105726 <kernel_thread+26/30>
Trace; c0164200 <kjournald+0/2b0>
Code;  c0165e69 <__journal_remove_journal_head+9/e0>
00000000 <_EIP>:
Code;  c0165e69 <__journal_remove_journal_head+9/e0>   <=====
   0:   8b 56 04                  mov    0x4(%esi),%edx   <=====
Code;  c0165e6c <__journal_remove_journal_head+c/e0>
   3:   85 d2                     test   %edx,%edx
Code;  c0165e6e <__journal_remove_journal_head+e/e0>
   5:   79 23                     jns    2a <_EIP+0x2a> c0165e93 <__journal_remove_journal_head+33/e0>
Code;  c0165e70 <__journal_remove_journal_head+10/e0>
   7:   68 c7 fd 22 c0            push   $0xc022fdc7
Code;  c0165e75 <__journal_remove_journal_head+15/e0>
   c:   68 bc 06 00 00            push   $0x6bc
Code;  c0165e7a <__journal_remove_journal_head+1a/e0>
  11:   68 19 fb 00 00            push   $0xfb19

Here is the second stack trace:

invalid operand: 0000
CPU:    0
EIP:    0010:[<c011ab96>]    Tainted: P
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000   ebx: 00000000   ecx: 00000000   edx: 00000000
esi: f630a000   edi: 0000000b   ebp: 00000004   esp: f630bd44
ds: 0018   es: 0018   ss: 0018
Process kjournald (pid: 0, stackpage=f630b000)
Stack: c022c170 f630be3c c0114b9e c023ff2a 00000000 c01074b6 00000000 c02406e6
       c0225f4f c022c170 00000000 00000001 00000000 c0165e69 c0165e69 c0114f87
       0000000b f630be3c 00000000 00000282 f630a000 00000000 f630a000 00000000
Call Trace: [<c0114b9e>] [<c01074b6>] [<c0165e69>] [<c0165e69>] [<c0114f87>]
   [<f899b662>] [<f8906195>] [<f8960a80>] [<c01a456a>] [<c010b30d>]
   [<c0135f05>] [<c0114be0>] [<c0107024>] [<c0160018>] [<c0165e69>]
   [<c01614c3>] [<c01e93d3>] [<c0164375>] [<c01641e0>] [<c0105726>]
   [<c0164200>]
Code: 0f 0b e9 a9 fe ff ff 8d 76 00 53 8b 44 24 08 8b 5c 24 0c 85

>>EIP; c011ab96 <do_exit+1b6/1c0>   <=====
Trace; c0114b9e <bust_spinlocks+3e/50>
Trace; c01074b6 <die+46/60>
Trace; c0165e69 <__journal_remove_journal_head+9/e0>
Trace; c0165e69 <__journal_remove_journal_head+9/e0>
Trace; c0114f87 <do_page_fault+3a7/4eb>
Trace; f899b662 <[nfsd]nfsd_proc_rename+52/110>
Trace; f8906195 <[md]md_make_request+35/70>
Trace; f8960a80 <[sd_mod]sd_template+0/0>
Trace; c01a456a <generic_make_request+13a/150>
Trace; c010b30d <call_apic_timer_interrupt+5/18>
Trace; c0135f05 <__refile_buffer+55/60>
Trace; c0114be0 <do_page_fault+0/4eb>
Trace; c0107024 <error_code+34/40>
Trace; c0160018 <journal_get_undo_access+68/110>
Trace; c0165e69 <__journal_remove_journal_head+9/e0>
Trace; c01614c3 <journal_commit_transaction+343/119a>
Trace; c01e93d3 <ip_rcv+313/3a0>
Trace; c0164375 <kjournald+175/2b0>
Trace; c01641e0 <commit_timeout+0/10>
Trace; c0105726 <kernel_thread+26/30>
Trace; c0164200 <kjournald+0/2b0>
Code;  c011ab96 <do_exit+1b6/1c0>
00000000 <_EIP>:
Code;  c011ab96 <do_exit+1b6/1c0>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c011ab98 <do_exit+1b8/1c0>
   2:   e9 a9 fe ff ff            jmp    fffffeb0 <_EIP+0xfffffeb0> c011aa46 <do_exit+66/1c0>
Code;  c011ab9d <do_exit+1bd/1c0>
   7:   8d 76 00                  lea    0x0(%esi),%esi
Code;  c011aba0 <complete_and_exit+0/20>
   a:   53                        push   %ebx
Code;  c011aba1 <complete_and_exit+1/20>
   b:   8b 44 24 08               mov    0x8(%esp,1),%eax
Code;  c011aba5 <complete_and_exit+5/20>
   f:   8b 5c 24 0c               mov    0xc(%esp,1),%ebx
Code;  c011aba9 <complete_and_exit+9/20>
  13:   85 00                     test   %eax,(%eax)

And here is the final crash:

<1>Unable to handle kernel paging request at virtual address fffffa50
c012e2d2
*pde = 00001063
Oops: 0000
CPU:    0
EIP:    0010:[<c012e2d2>]    Tainted: P
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: fffffa38   ebx: 00000002   ecx: fffffa38   edx: 00000000
esi: f26ff3c0   edi: f26ff3c0   ebp: f77b53e0   esp: f630bad8
ds: 0018   es: 0018   ss: 0018
Process kjournald (pid: 0, stackpage=f630b000)
Stack: c01d865e f26ff3c0 00000000 c01d869b f26ff3c0 00000000 c01d8809 f26ff3c0
       f26ff3c0 c020723c f26ff3c0 2030d680 00000006 00000000 f630bb28 f77b53e0
       f77b53e0 00010000 f630b038 f630b038 cc0ad680 2030d680 f26ff3c0 ed9da000
Call Trace: [<c01d865e>] [<c01d869b>] [<c01d8809>] [<c020723c>] [<c011ee90>]
   [<c01dc7da>] [<c010831a>] [<c011bca3>] [<c01084cc>] [<c010a4e8>]
   [<c0110018>] [<c0117c20>] [<c011aa21>] [<c01074c2>] [<c0107700>]
   [<c0107780>] [<c011ab96>] [<c01084bc>] [<c010a4e8>] [<c0107024>]
   [<c011ab96>] [<c0114b9e>] [<c01074b6>] [<c0165e69>] [<c0165e69>]
   [<c0114f87>] [<f899b662>] [<f8906195>] [<f8960a80>] [<c01a456a>]
   [<c010b30d>] [<c0135f05>] [<c0114be0>] [<c0107024>] [<c0160018>]
   [<c0165e69>] [<c01614c3>] [<c01e93d3>] [<c0164375>] [<c01641e0>]
   [<c0105726>] [<c0164200>]
Code: 8b 41 18 a9 00 40 00 00 75 14 ff 49 14 0f 94 c0 84 c0 74 0a

>>EIP; c012e2d2 <__free_pages+2/30>   <=====
Trace; c01d865e <skb_release_data+3e/70>
Trace; c01d869b <kfree_skbmem+b/70>
Trace; c01d8809 <__kfree_skb+109/110>
Trace; c020723c <arp_rcv+44c/460>
Trace; c011ee90 <update_process_times+20/b0>
Trace; c01dc7da <net_rx_action+12a/210>
Trace; c010831a <handle_IRQ_event+3a/70>
Trace; c011bca3 <do_softirq+53/a0>
Trace; c01084cc <do_IRQ+9c/b0>
Trace; c010a4e8 <call_do_IRQ+5/d>
Trace; c0110018 <centaur_get_mcr+18/90>
Trace; c0117c20 <panic+e0/f0>
Trace; c011aa21 <do_exit+41/1c0>
Trace; c01074c2 <die+52/60>
Trace; c0107700 <do_invalid_op+0/90>
Trace; c0107780 <do_invalid_op+80/90>
Trace; c011ab96 <do_exit+1b6/1c0>
Trace; c01084bc <do_IRQ+8c/b0>
Trace; c010a4e8 <call_do_IRQ+5/d>
Trace; c0107024 <error_code+34/40>
Trace; c011ab96 <do_exit+1b6/1c0>
Trace; c0114b9e <bust_spinlocks+3e/50>
Trace; c01074b6 <die+46/60>
Trace; c0165e69 <__journal_remove_journal_head+9/e0>
Trace; c0165e69 <__journal_remove_journal_head+9/e0>
Trace; c0114f87 <do_page_fault+3a7/4eb>
Trace; f899b662 <[nfsd]nfsd_proc_rename+52/110>
Trace; f8906195 <[md]md_make_request+35/70>
Trace; f8960a80 <[sd_mod]sd_template+0/0>
Trace; c01a456a <generic_make_request+13a/150>
Trace; c010b30d <call_apic_timer_interrupt+5/18>
Trace; c0135f05 <__refile_buffer+55/60>
Trace; c0114be0 <do_page_fault+0/4eb>
Trace; c0107024 <error_code+34/40>
Trace; c0160018 <journal_get_undo_access+68/110>
Trace; c0165e69 <__journal_remove_journal_head+9/e0>
Trace; c01614c3 <journal_commit_transaction+343/119a>
Trace; c01e93d3 <ip_rcv+313/3a0>
Trace; c0164375 <kjournald+175/2b0>
Trace; c01641e0 <commit_timeout+0/10>
Trace; c0105726 <kernel_thread+26/30>
Trace; c0164200 <kjournald+0/2b0>
Code;  c012e2d2 <__free_pages+2/30>
00000000 <_EIP>:
Code;  c012e2d2 <__free_pages+2/30>   <=====
   0:   8b 41 18                  mov    0x18(%ecx),%eax   <=====
Code;  c012e2d5 <__free_pages+5/30>
   3:   a9 00 40 00 00            test   $0x4000,%eax
Code;  c012e2da <__free_pages+a/30>
   8:   75 14                     jne    1e <_EIP+0x1e> c012e2f0 <__free_pages+20/30>
Code;  c012e2dc <__free_pages+c/30>
   a:   ff 49 14                  decl   0x14(%ecx)
Code;  c012e2df <__free_pages+f/30>
   d:   0f 94 c0                  sete   %al
Code;  c012e2e2 <__free_pages+12/30>
  10:   84 c0                     test   %al,%al
Code;  c012e2e4 <__free_pages+14/30>
  12:   74 0a                     je     1e <_EIP+0x1e> c012e2f0 <__free_pages+20/30>

- Jani