I can't reboot right now. I am not trying to emulate any kind of RAID. Its internal disk is just a 250 gig SATA drive. I have rebooted on the previous kernel and it is working perfectly well. Attached is the boot log (both on the clean kernel and the one with errors). The message file is too large to send to the Red Hat list.

Thanks
-Troy

-----Original Message-----
From: redhat-list-bounces@xxxxxxxxxx [mailto:redhat-list-bounces@xxxxxxxxxx] On Behalf Of George Magklaras
Sent: Sunday, May 13, 2007 11:02 PM
To: General Red Hat Linux discussion list
Subject: Re: Kernel 2.6.9-55 issues

Troy,

I assume you have a backup if this is a production system. Can you try to boot the system with the "nodmraid" option and see the outcome? It would help to tell me the disk config, as originally requested.

There are issues with some nVidia SATA controllers. If these work essentially as "fake RAID" devices (as far as I know the lspci output below does not suggest a real hardware RAID controller), the dmraid module could create hiccups and kernel panics. Disabling this with the nodmraid option in the kernel boot line (from your bootloader) could have varying results, depending on what type of RAID you are trying to emulate. That is the only thing I can suspect, if your hardware works perfectly well on the previous kernel.

Any chance of capturing the boot log and your dmesg when your system boots properly (previous kernel)?

GM

Troy Knabe wrote:
> The system boots and starts the kernel, then crashes. I wasn't watching the first time, so on a subsequent boot it gets to the point where it does a disk check because the system was not shut down cleanly. It now crashes and reboots at different points in the disk check. Thanks for any help you can provide.
>
> lspci
> 00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
> 00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
> 00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
> 00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
> 00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
> 00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2)
> 00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
> 00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
> 00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
> 00:0a.0 Ethernet controller: nVidia Corporation CK804 Ethernet Controller (rev a3)
> 00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
> 00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
> 00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
> 00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
> 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
> 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
> 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
> 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
> 01:05.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
> 04:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 11)
>
> lsmod
> Module                  Size  Used by
> ipt_state               1985  1
> ip_conntrack           41077  1 ipt_state
> ipt_multiport           2113  3
> ipt_LOG                 6593  1
> iptable_filter          3009  1
> ip_tables              17601  4 ipt_state,ipt_multiport,ipt_LOG,iptable_filter
> parport_pc             24833  0
> lp                     12333  0
> parport                37513  2 parport_pc,lp
> autofs4                25157  0
> i2c_dev                11585  0
> i2c_core               22337  1 i2c_dev
> sunrpc                163237  1
> dm_mirror              30893  0
> dm_mod                 59989  1 dm_mirror
> button                  6737  0
> battery                 9029  0
> ac                      4933  0
> md5                     4161  1
> ipv6                  235777  39
> joydev                 10497  0
> ohci_hcd               21841  0
> ehci_hcd               31301  0
> forcedeth              24001  0
> tg3                   107077  0
> ext3                  117193  3
> jbd                    71385  1 ext3
> sata_nv                 9541  4
> libata                 66333  1 sata_nv
> sd_mod                 17217  5
> scsi_mod              122445  2 libata,sd_mod
>
> -----Original Message-----
> From: redhat-list-bounces@xxxxxxxxxx [mailto:redhat-list-bounces@xxxxxxxxxx] On Behalf Of George Magklaras
> Sent: Friday, May 11, 2007 1:27 AM
> To: General Red Hat Linux discussion list
> Subject: Re: Kernel 2.6.9-55 issues
>
> Troy, what is your disk subsystem on the x2200? At what point does it fail to boot? Does it reach the bootloader and at least start the kernel? Also, could you run 'lspci' and 'lsmod' and show the output from your good kernel?
>
> ##The following is a guess##
> I don't have that kind of Sun kit, but there are all sorts of references to stability problems with AMD based chipsets. Also, FYI there is a kernel panic report for that kernel here:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=239484
>
> This bug report concerns the Error Detection And Correction (EDAC) modules (hence the lsmod prompt). The panic comes from the edac kernel module thinking that there is something wrong with the bus or the memory. Your x2200 probably panics (any messages from the console during the boot failure?), as there is an option that makes the kernel panic when it detects EDAC parity errors. On your x4100s, which are able to boot but give the EDAC messages, do an lsmod and grep -i for edac. The report seems to point to a 'noedac' boot option, but I am not sure.
>
> On the x4100s that spawn the edac messages, see if /etc/modprobe.conf contains any references to the edac modules; you could try removing them to see if that makes a difference.
>
> GM
>
> Troy Knabe wrote:
>> I upgraded from 2.6.9-42 to 2.6.9-55 kernel over the weekend. I have had issues with 3 servers.
>> One server wouldn't boot (x2200, AMD 148 processor). The other two are x4100's, each with 2 Dual Core AMD Opteron(tm) Processor 285. The two x4100's are spewing these errors, but if I reboot them with the old 2.6.9-42 kernel then I don't get any of them. Anyone else experiencing issues with the new kernel?
>>
>> thanks
>> -Troy
>>
>> May 9 16:25:43 hostname kernel: EDAC k8 MC0: general bus error: participating processor(local node response), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
>> May 9 16:25:43 hostname kernel: MC0: CE page 0xc, offset 0x108, grain 8, syndrome 0x4b39, row 0, channel 1, label "": k8_edac
>> May 9 16:25:43 hostname kernel: MC0: CE - no information available: k8_edac Error Overflow set
>> May 9 16:25:43 hostname kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
>> May 9 16:25:44 hostname kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
>> May 9 16:25:44 hostname kernel: MC0: CE page 0x1f1, offset 0x0, grain 8, syndrome 0x28d8, row 3, channel 1, label "": k8_edac
>> May 9 16:25:44 hostname kernel: MC0: CE - no information available: k8_edac Error Overflow set
>> May 9 16:25:45 hostname kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
>> May 9 16:25:46 hostname kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
>> May 9 16:25:46 hostname kernel: MC0: CE page 0x1f1, offset 0x0, grain 8, syndrome 0x28d8, row 3, channel 1, label "": k8_edac
>> May 9 16:25:46 hostname kernel: MC0: CE - no information available: k8_edac Error Overflow set
>> May 9 16:25:46 hostname kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
>> May 9 16:25:47 hostname kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
>> May 9 16:25:47 hostname kernel: MC0: CE page 0x138, offset 0xac0, grain 8, syndrome 0xeeff, row 0, channel 1, label "": k8_edac
>> May 9 16:25:47 hostname kernel: MC0: CE - no information available: k8_edac Error Overflow set
>> May 9 16:25:47 hostname kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
>>
>
> --
> George Magklaras
>
> Senior Computer Systems Engineer/UNIX Systems Administrator
> EMBnet Technical Management Board
> The Biotechnology Centre of Oslo, University of Oslo
> http://www.biotek.uio.no/
>
> EMBnet Norway: http://www.no.embnet.org/
>
> --
> redhat-list mailing list
> unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
> https://www.redhat.com/mailman/listinfo/redhat-list

--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list
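
For anyone trying to make sense of the EDAC messages quoted in this thread, the corrected-error (CE) lines can be grouped by memory controller, row and channel to see whether they cluster in one place. The following is only a minimal Python sketch, not something posted by either participant: the function name `tally_corrected_errors` and the regular expression are assumptions based on the line format shown in the quoted log, and other kernels may format these lines differently.

```python
import re
from collections import Counter

# Assumed format, taken from the CE lines quoted in this thread, e.g.
# "MC0: CE page 0xc, offset 0x108, grain 8, syndrome 0x4b39, row 0, channel 1"
CE_PATTERN = re.compile(
    r"MC(?P<mc>\d+): CE page (?P<page>0x[0-9a-f]+), "
    r"offset (?P<offset>0x[0-9a-f]+), grain \d+, "
    r"syndrome (?P<syndrome>0x[0-9a-f]+), "
    r"row (?P<row>\d+), channel (?P<channel>\d+)"
)


def tally_corrected_errors(log_text):
    """Count CE reports per (memory controller, row, channel).

    Hypothetical helper for illustration only.
    """
    # Quoted mail re-wraps log lines, so collapse quote markers and
    # runs of whitespace into single spaces before matching.
    flat = re.sub(r"[\s>]+", " ", log_text)
    counts = Counter()
    for match in CE_PATTERN.finditer(flat):
        key = (int(match["mc"]), int(match["row"]), int(match["channel"]))
        counts[key] += 1
    return counts


# The four CE lines from the log excerpt in this thread.
SAMPLE_LOG = """
May 9 16:25:43 hostname kernel: MC0: CE page 0xc, offset 0x108, grain 8, syndrome 0x4b39, row 0, channel 1, label "": k8_edac
May 9 16:25:44 hostname kernel: MC0: CE page 0x1f1, offset 0x0, grain 8, syndrome 0x28d8, row 3, channel 1, label "": k8_edac
May 9 16:25:46 hostname kernel: MC0: CE page 0x1f1, offset 0x0, grain 8, syndrome 0x28d8, row 3, channel 1, label "": k8_edac
May 9 16:25:47 hostname kernel: MC0: CE page 0x138, offset 0xac0, grain 8, syndrome 0xeeff, row 0, channel 1, label "": k8_edac
"""

for (mc, row, channel), count in sorted(tally_corrected_errors(SAMPLE_LOG).items()):
    print(f"MC{mc} row {row} channel {channel}: {count} corrected error(s)")
```

In practice the function would be fed the contents of the system log rather than a pasted sample. The "CE - no information available" lines are deliberately not counted, since they carry no row or channel.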