Re: Kernel 2.6.9-55 issues

Troy,

I assume you have a backup if this is a production system. Can you try booting the system with the "nodmraid" option and report the outcome? It would also help to tell me the disk configuration, as originally requested. There are issues with some nVidia SATA controllers. If these work essentially as "fake RAID" devices (as far as I can tell, the lspci output below does not suggest a real hardware RAID controller), the dmraid module could cause hiccups and kernel panics. Disabling it with the nodmraid option on the kernel boot line (from your bootloader) could have varying results, depending on what type of RAID you are trying to emulate. That is the only thing I can suspect if your hardware works perfectly well on the previous kernel. Any chance of capturing the boot log and your dmesg when your system boots properly (previous kernel)?
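
For the capture, something along these lines should be enough when run under the kernel that boots properly. This is only a sketch: the function name save_boot_info and the output file names are made up for illustration, and /tmp is just a default.

```shell
# Save dmesg, lspci and lsmod output for the running kernel into one
# directory so the files can be attached to a reply.
# "save_boot_info" and the file naming are illustrative, not a standard tool.
save_boot_info() {
    outdir="${1:-/tmp}"
    tag=$(uname -r)
    for cmd in dmesg lspci lsmod; do
        # A missing tool or a permission error leaves an empty file
        # instead of aborting the whole capture.
        "$cmd" > "$outdir/$cmd-$tag.txt" 2>/dev/null || true
    done
    # Print the directory so the caller knows where the files went.
    echo "$outdir"
}
```

Calling save_boot_info with no argument writes the three files under /tmp, each tagged with the running kernel version.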

GM


Troy Knabe wrote:
The system boots and starts the kernel, then crashes. I wasn't watching the first time, so on a subsequent boot it gets to the point where it does a disk check because the system was not shut down cleanly. It now crashes and reboots at different points during the disk check. Thanks for any help you can provide.
lspci
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Ethernet controller: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:05.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 11)

lsmod
Module                  Size  Used by
ipt_state               1985  1
ip_conntrack           41077  1 ipt_state
ipt_multiport           2113  3
ipt_LOG                 6593  1
iptable_filter          3009  1
ip_tables              17601  4 ipt_state,ipt_multiport,ipt_LOG,iptable_filter
parport_pc             24833  0
lp                     12333  0
parport                37513  2 parport_pc,lp
autofs4                25157  0
i2c_dev                11585  0
i2c_core               22337  1 i2c_dev
sunrpc                163237  1
dm_mirror              30893  0
dm_mod                 59989  1 dm_mirror
button                  6737  0
battery                 9029  0
ac                      4933  0
md5                     4161  1
ipv6                  235777  39
joydev                 10497  0
ohci_hcd               21841  0
ehci_hcd               31301  0
forcedeth              24001  0
tg3                   107077  0
ext3                  117193  3
jbd                    71385  1 ext3
sata_nv                 9541  4
libata                 66333  1 sata_nv
sd_mod                 17217  5
scsi_mod              122445  2 libata,sd_mod

-----Original Message-----
From: redhat-list-bounces@xxxxxxxxxx [mailto:redhat-list-bounces@xxxxxxxxxx] On Behalf Of George Magklaras
Sent: Friday, May 11, 2007 1:27 AM
To: General Red Hat Linux discussion list
Subject: Re: Kernel 2.6.9-55 issues

Troy, what is your disk subsystem on the x2200? At what point does it fail to boot? Does it reach the bootloader and at least start the kernel? Also, could you run 'lspci' and 'lsmod' under your good kernel and post the output?


##The following is a guess##
I don't have that kind of Sun kit, but there are all sorts of references to stability problems with AMD-based chipsets. Also, FYI, there is a kernel panic report for that kernel here:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=239484

This bug report concerns the Error Detection And Correction (EDAC) modules (hence the lsmod request). The messages come from the edac kernel module deciding that something is wrong with the bus or the memory. Your x2200 probably panics (any messages on the console during the boot failure?), as there is an option that makes the kernel panic when EDAC detects parity errors. On your x4100s, which are able to boot but produce the EDAC messages, do an lsmod and grep -i for edac. The bug report seems to point to a 'noedac' boot option, but I am not sure.
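
The lsmod check can be wrapped in a small helper so it also works on a saved listing. A minimal sketch; the function name edac_modules is my own invention:

```shell
# Print the names of loaded modules that mention "edac" (case-insensitive).
# Reads lsmod-style output on stdin, so it works on a live system or on a
# saved listing. "edac_modules" is an illustrative name.
edac_modules() {
    grep -i edac | awk '{ print $1 }'
}
```

On the live system that would be run as: lsmod | edac_modules. Empty output means no EDAC module is currently loaded.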

On the x4100s that produce the EDAC messages, check whether /etc/modprobe.conf contains any references to the edac modules. You could try removing them to see if that makes a difference.
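
A quick way to look for such references, as a sketch only (edac_refs is a made-up helper name; take a copy of the file before editing anything):

```shell
# Show, with line numbers, any lines in a modprobe config that mention edac.
# Defaults to /etc/modprobe.conf; pass another path to inspect a copy.
# "edac_refs" is an illustrative name, not an existing tool.
edac_refs() {
    conf="${1:-/etc/modprobe.conf}"
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
    grep -in edac "$conf" || echo "no edac references in $conf"
}
```

Any lines it prints are the candidates to remove or comment out before rebooting into the new kernel.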

GM


Troy Knabe wrote:
I upgraded from the 2.6.9-42 to the 2.6.9-55 kernel over the weekend and have had issues with 3 servers. One server wouldn't boot (x2200, AMD 148 proc). The other two are x4100s, each with 2 x Dual Core AMD Opteron(tm) Processor 285. The two x4100s are spewing these errors, but if I reboot them with the old 2.6.9-42 kernel I don't get any of them. Anyone else experiencing issues with the new kernel?
thanks
-Troy
May  9 16:25:43 hostname kernel: EDAC k8 MC0: general bus error: participating processor(local node response), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
May  9 16:25:43 hostname kernel: MC0: CE page 0xc, offset 0x108, grain 8, syndrome 0x4b39, row 0, channel 1, label "": k8_edac
May  9 16:25:43 hostname kernel: MC0: CE - no information available: k8_edac Error Overflow set
May  9 16:25:43 hostname kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
May  9 16:25:44 hostname kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
May  9 16:25:44 hostname kernel: MC0: CE page 0x1f1, offset 0x0, grain 8, syndrome 0x28d8, row 3, channel 1, label "": k8_edac
May  9 16:25:44 hostname kernel: MC0: CE - no information available: k8_edac Error Overflow set
May  9 16:25:45 hostname kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
May  9 16:25:46 hostname kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
May  9 16:25:46 hostname kernel: MC0: CE page 0x1f1, offset 0x0, grain 8, syndrome 0x28d8, row 3, channel 1, label "": k8_edac
May  9 16:25:46 hostname kernel: MC0: CE - no information available: k8_edac Error Overflow set
May  9 16:25:46 hostname kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
May  9 16:25:47 hostname kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)
May  9 16:25:47 hostname kernel: MC0: CE page 0x138, offset 0xac0, grain 8, syndrome 0xeeff, row 0, channel 1, label "": k8_edac
May  9 16:25:47 hostname kernel: MC0: CE - no information available: k8_edac Error Overflow set
May  9 16:25:47 hostname kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error
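
To see whether the corrected errors (CE) cluster on one memory row or channel, the log can be tallied roughly as follows. This is a sketch that assumes the messages keep the "row N, channel M" wording shown above; edac_ce_summary is a made-up name.

```shell
# Count EDAC corrected-error (CE) lines per memory row/channel.
# Reads syslog text on stdin and relies on the "CE page ... row N, channel M"
# wording of the k8_edac messages; other formats are silently skipped.
# "edac_ce_summary" is an illustrative name.
edac_ce_summary() {
    grep 'CE page' |
        sed -n 's/.*row \([0-9]*\), channel \([0-9]*\).*/row \1 channel \2/p' |
        sort | uniq -c | sort -rn
}
```

Feeding it the system log (for example: edac_ce_summary < /var/log/messages, assuming that is where kernel messages are written) prints one line per row/channel pair, most frequent first.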

--
George Magklaras

Senior Computer Systems Engineer/UNIX Systems Administrator
EMBnet Technical Management Board
The Biotechnology Centre of Oslo, University of Oslo
http://www.biotek.uio.no/

EMBnet Norway:	http://www.no.embnet.org/


--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list




