The system boots and starts the kernel, then crashes. I wasn't watching the first time. On subsequent boots it gets as far as the disk check that runs because the system was not shut down cleanly, and it now crashes and reboots at different points during that check. Thanks for any help you can provide.

lspci

00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Ethernet controller: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:05.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 11)

lsmod

Module                  Size  Used by
ipt_state               1985  1
ip_conntrack           41077  1 ipt_state
ipt_multiport           2113  3
ipt_LOG                 6593  1
iptable_filter          3009  1
ip_tables              17601  4 ipt_state,ipt_multiport,ipt_LOG,iptable_filter
parport_pc             24833  0
lp                     12333  0
parport                37513  2 parport_pc,lp
autofs4                25157  0
i2c_dev                11585  0
i2c_core               22337  1 i2c_dev
sunrpc                163237  1
dm_mirror              30893  0
dm_mod                 59989  1 dm_mirror
button                  6737  0
battery                 9029  0
ac                      4933  0
md5                     4161  1
ipv6                  235777  39
joydev                 10497  0
ohci_hcd               21841  0
ehci_hcd               31301  0
forcedeth              24001  0
tg3                   107077  0
ext3                  117193  3
jbd                    71385  1 ext3
sata_nv                 9541  4
libata                 66333  1 sata_nv
sd_mod                 17217  5
scsi_mod              122445  2 libata,sd_mod

-----Original Message-----
From: redhat-list-bounces@xxxxxxxxxx [mailto:redhat-list-bounces@xxxxxxxxxx] On Behalf Of George Magklaras
Sent: Friday, May 11, 2007 1:27 AM
To: General Red Hat Linux discussion list
Subject: Re: Kernel 2.6.9-55 issues

Troy,

What is your disk subsystem on the x2200? At what point does it fail to boot? Does it reach the bootloader and at least start the kernel? Also, could you run 'lspci' and 'lsmod' under your good kernel and post the output?

##The following is a guess##

I don't have that kind of Sun kit, but there are all sorts of references to stability problems with AMD-based chipsets. Also, FYI, there is a kernel panic report for that kernel here:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=239484

This bug report concerns the Error Detection And Correction (EDAC) modules (hence the request for lsmod output). The messages come from the edac kernel module thinking that there is something wrong with the bus or the memory. On your x2200 the system probably panics (any messages on the console during the boot failure?), as there is an option that makes the kernel panic when it detects EDAC parity errors.

On your x4100s, which are able to boot but give the EDAC messages, do an lsmod and grep -i for edac. The bug report seems to point to a 'noedac' boot option, but I am not sure. On those same machines, see whether /etc/modprobe.conf contains any references to the edac modules; you could try removing them to see if that makes a difference.
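The two checks suggested above could be sketched roughly as follows, assuming a POSIX shell with awk and grep. The helper names list_edac_modules and find_edac_config are made up for illustration, and which EDAC modules a given kernel actually loads is not confirmed here (the logs in this thread mention k8_edac).

```shell
#!/bin/sh
# Hypothetical helpers, not part of any distribution tooling.

# Print the names of modules whose name contains "edac".
# Reads lsmod-style text on stdin, so it also works on saved output.
list_edac_modules() {
    awk 'tolower($1) ~ /edac/ { print $1 }'
}

# Print the lines (with line numbers) of a modprobe-style config file
# that mention edac. Prints nothing if there are none.
find_edac_config() {
    grep -in 'edac' "$1" 2>/dev/null
}

# Only query the live system when the tool and the file are present.
if command -v lsmod >/dev/null 2>&1; then
    lsmod | list_edac_modules || true
fi
if [ -r /etc/modprobe.conf ]; then
    find_edac_config /etc/modprobe.conf || true
fi
```

If neither check prints anything, no EDAC module is loaded and none is referenced in /etc/modprobe.conf, so the messages would have to come from somewhere else.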
GM

Troy Knabe wrote:
> I upgraded from the 2.6.9-42 kernel to 2.6.9-55 over the weekend and have had issues with 3 servers: 1 server that won't boot (x2200, AMD 148 proc), and two x4100s, each with 2 Dual Core AMD Opteron(tm) Processor 285. The two x4100s are spewing the errors below, but if I reboot them with the old 2.6.9-42 kernel I don't get any of them. Anyone else experiencing issues with the new kernel?
>
> thanks
> -Troy
>
> May 9 16:25:43 hostname kernel: EDAC k8 MC0: general bus error: participating processor(local node response), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
> May 9 16:25:43 hostname kernel: MC0: CE page 0xc, offset 0x108, grain 8, syndrome 0x4b39, row 0, channel 1, label "": k8_edac
> May 9 16:25:43 hostname kernel: MC0: CE - no information available: k8_edac Error Overflow set
> May 9 16:25:43 hostname kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
> May 9 16:25:44 hostname kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
> May 9 16:25:44 hostname kernel: MC0: CE page 0x1f1, offset 0x0, grain 8, syndrome 0x28d8, row 3, channel 1, label "": k8_edac
> May 9 16:25:44 hostname kernel: MC0: CE - no information available: k8_edac Error Overflow set
> May 9 16:25:45 hostname kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
> May 9 16:25:46 hostname kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
> May 9 16:25:46 hostname kernel: MC0: CE page 0x1f1, offset 0x0, grain 8, syndrome 0x28d8, row 3, channel 1, label "": k8_edac
> May 9 16:25:46 hostname kernel: MC0: CE - no information available: k8_edac Error Overflow set
> May 9 16:25:46 hostname kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
> May 9 16:25:47 hostname kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
> May 9 16:25:47 hostname kernel: MC0: CE page 0x138, offset 0xac0, grain 8, syndrome 0xeeff, row 0, channel 1, label "": k8_edac
> May 9 16:25:47 hostname kernel: MC0: CE - no information available: k8_edac Error Overflow set
> May 9 16:25:47 hostname kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
>

--
George Magklaras
Senior Computer Systems Engineer/UNIX Systems Administrator
EMBnet Technical Management Board
The Biotechnology Centre of Oslo, University of Oslo
http://www.biotek.uio.no/
EMBnet Norway: http://www.no.embnet.org/

--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list