We have a rack of almost identical Supermicro P4SCE machines (differing in disks, RAM and CPU). On these we have recently seen the following problems.

1. The screen goes blank during the boot menu. After that there is massive on-screen image corruption until the framebuffer is reset by init at some point. At first you can make out the text, but there are outlined squares all over the monitor, and as more text appears it makes a mess of the whole screen.
   Kernels affected: all FC2 kernels, since at least test1.

2. One of the computers will not boot any kernel newer than kernel-smp-2.6.5-1.358. The following kernels fail with what look like IDE problems (VERY hard to make out because of problem 1):
   kernel-smp-2.6.6-1.435
   kernel-smp-2.6.6-1.435.2.3
   kernel-smp-2.6.7-1.494.2.2
   kernel-smp-2.6.6-1.427
   It seems that one of the disks (2 SATA disks in RAID 1) is reported as failed, and that something is being rescheduled. Then the boot stops.

3. All of our FC2 servers recently (3-4 days ago) stopped serving HTTP pages to Mac computers. I have no idea why, but customers from all over are complaining that their Macs cannot open the web pages. Windows machines on the same network work. The Mac users have tested with IE and the newest Safari. It seems that an FC2 yum update triggered this, but we have no idea how, and we cannot test it ourselves as we don't have any Macs.

4. One of our servers has recently been having BIG problems at uneven intervals. In the last week it has happened about 5-6 times, I think. Very (VERY) suddenly the load increases to 700+ and https gets OOM-killed rapidly. Network connectivity is lost immediately and console response is _BAD_. I once waited 5 minutes for a password prompt before I had to reboot the machine because of customer complaints. I saw the load climb to 700-900 once when I had left top running on the console. This has occurred only since Monday.

Hardware info below is from the server that won't boot new kernels; for more info please ask and I will provide it.
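For problem 4, where the console is unusable once the load spikes, it might help to leave a small logger running so there is something to read after the reboot. The following is only a minimal sketch, assuming /proc/loadavg has the usual five-field layout; the function names, the threshold, the interval and the log path are all made up for illustration and are not anything FC2 ships.

```python
import time


def parse_loadavg(text):
    """Parse one line in /proc/loadavg format, e.g. '0.20 0.18 0.12 1/80 11206'."""
    fields = text.split()
    running, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),    # 1-minute load average
        "load5": float(fields[1]),    # 5-minute load average
        "load15": float(fields[2]),   # 15-minute load average
        "running": int(running),      # currently runnable tasks
        "total": int(total),          # total tasks on the system
        "last_pid": int(fields[4]),   # most recently created PID
    }


def watch(path="/proc/loadavg", log_path="loadwatch.log",
          threshold=20.0, interval=5.0, samples=None):
    """Append a timestamped line to log_path whenever the 1-minute load
    is at or above threshold. All defaults are hypothetical placeholders.
    With samples=None the loop runs until interrupted."""
    taken = 0
    while samples is None or taken < samples:
        with open(path) as f:
            sample = parse_loadavg(f.read())
        if sample["load1"] >= threshold:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            with open(log_path, "a") as log:
                log.write("%s %r\n" % (stamp, sample))
        taken += 1
        time.sleep(interval)
```

Calling watch() from a root shell or an init script would keep it running. A low threshold is probably the safer choice, since by the time the load reaches 700 the logger itself may no longer get scheduled.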
Sincerely,
Hans Kristian Rosbach

# lspci
00:00.0 Host bridge: Intel Corp. 82875P Memory Controller Hub (rev 02)
00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corp. 82801EB/ER (ICH5/ICH5R) LPC Bridge (rev 02)
00:1f.2 IDE interface: Intel Corp. 82801EB (ICH5) Serial ATA 150 Storage Controller (rev 02)
00:1f.3 SMBus: Intel Corp. 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
01:09.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
01:0a.0 Ethernet controller: Intel Corp. 82541EI Gigabit Ethernet Controller (Copper)
01:0b.0 Ethernet controller: Intel Corp. 82541EI Gigabit Ethernet Controller (Copper)

## Hyperthreading enabled:
# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 3
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 3
cpu MHz : 2796.078
cache size : 1024 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl cid
bogomips : 5537.79

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 3
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 3
cpu MHz : 2796.078
cache size : 1024 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl cid
bogomips : 5586.94

# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb2[1] sda2[0]
      52942080 blocks [2/2] [UU]
md2 : active raid1 sdb3[1] sda3[0]
      15357952 blocks [2/2] [UU]
md1 : active raid1 sdb5[1] sda5[0]
      10241280 blocks [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0]
      104192 blocks [2/2] [UU]
unused devices: <none>

# dmesg
1] enabled)
Processor #1 15:3 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] global_irq_base[0x0])
IOAPIC[0]: Assigned apic_id 2
IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
Enabling APIC mode: Flat. Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Built 1 zonelists
Kernel command line: ro root=/dev/md1 rhgb quiet
mapped 4G/4G trampoline to fffeb000.
Initializing CPU#0
CPU 0 irqstacks, hard=023ae000 soft=0238e000
PID hash table entries: 2048 (order 11: 16384 bytes)
Detected 2796.078 MHz processor.
Using pmtmr for high-res timesource
Console: colour VGA+ 80x25
Memory: 514748k/524224k available (1676k kernel code, 8732k reserved, 708k data, 180k init, 0k highmem)
Calibrating delay loop... 5537.79 BogoMIPS
Security Scaffold v1.0.0 initialized
SELinux: Initializing.
SELinux: Starting in permissive mode
There is already a security framework initialized, register_security failed.
Failure registering capabilities with the kernel
selinux_register_security: Registering secondary module capability
Capability LSM initialized
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebf3ff 00000000 00000000 00000080
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU#0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU#0: Thermal monitoring enabled
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
CPU0: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 03
per-CPU timeslice cutoff: 2925.21 usecs.
task migration cache decay timeout: 3 msecs.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Booting processor 1/1 eip 2000
CPU 1 irqstacks, hard=023af000 soft=0238f000
Initializing CPU#1
masked ExtINT on CPU#1
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 5586.94 BogoMIPS
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000
monitor/mwait feature present.
CPU: Trace cache: 12K uops
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebf3ff 00000000 00000000 00000080
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU#1: Intel P4/Xeon Extended MCE MSRs (12) available
CPU#1: Thermal monitoring enabled
CPU1: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 03
Total of 2 processors activated (11124.73 BogoMIPS).
cpu_sibling_map[0] = 1
cpu_sibling_map[1] = 0
ENABLING IO-APIC IRQs
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=0x31 pin1=2 pin2=-1
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 2794.0867 MHz.
..... host bus clock speed is 199.0633 MHz.
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
zapping low mappings.
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfb820, last bus=1
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20040326
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.2
Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 *7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 *4 5 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs *3 4 5 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK0] (IRQs 3 4 5 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 *5 7 9 10 11 12 14 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
usbcore: registered new driver usbfs
usbcore: registered new driver hub
IOAPIC[0]: Set PCI routing entry (2-16 -> 0xa9 -> IRQ 16 Mode:1 Active:1)
00:00:1f[C] -> 2-16 -> IRQ 169
IOAPIC[0]: Set PCI routing entry (2-18 -> 0xb1 -> IRQ 18 Mode:1 Active:1)
00:00:1f[A] -> 2-18 -> IRQ 177
IOAPIC[0]: Set PCI routing entry (2-17 -> 0xb9 -> IRQ 17 Mode:1 Active:1)
00:00:1f[B] -> 2-17 -> IRQ 185
IOAPIC[0]: Set PCI routing entry (2-19 -> 0xc1 -> IRQ 19 Mode:1 Active:1)
00:00:1d[B] -> 2-19 -> IRQ 193
IOAPIC[0]: Set PCI routing entry (2-23 -> 0xc9 -> IRQ 23 Mode:1 Active:1)
00:00:1d[D] -> 2-23 -> IRQ 201
IOAPIC[0]: Set PCI routing entry (2-21 -> 0xd1 -> IRQ 21 Mode:1 Active:1)
00:01:08[A] -> 2-21 -> IRQ 209
IOAPIC[0]: Set PCI routing entry (2-22 -> 0xd9 -> IRQ 22 Mode:1 Active:1)
00:01:08[B] -> 2-22 -> IRQ 217
IOAPIC[0]: Set PCI routing entry (2-20 -> 0xe1 -> IRQ 20 Mode:1 Active:1)
00:01:08[D] -> 2-20 -> IRQ 225
number of MP IRQ sources: 15.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................
IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 00178020
....... : max redirection entries: 0017
....... : PRQ implemented: 1
....... : IO APIC version: 0020
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
 00 000 00  1    0    0   0   0    0    0    00
 01 0FF 0F  0    0    0   0   0    1    1    39
 02 0FF 0F  0    0    0   0   0    1    1    31
 03 0FF 0F  0    0    0   0   0    1    1    41
 04 0FF 0F  0    0    0   0   0    1    1    49
 05 0FF 0F  0    0    0   0   0    1    1    51
 06 0FF 0F  0    0    0   0   0    1    1    59
 07 0FF 0F  0    0    0   0   0    1    1    61
 08 0FF 0F  0    0    0   0   0    1    1    69
 09 0FF 0F  0    1    0   0   0    1    1    71
 0a 0FF 0F  0    0    0   0   0    1    1    79
 0b 0FF 0F  0    0    0   0   0    1    1    81
 0c 0FF 0F  0    0    0   0   0    1    1    89
 0d 0FF 0F  0    0    0   0   0    1    1    91
 0e 0FF 0F  0    0    0   0   0    1    1    99
 0f 0FF 0F  0    0    0   0   0    1    1    A1
 10 003 03  1    1    0   1   0    1    1    A9
 11 003 03  1    1    0   1   0    1    1    B9
 12 003 03  1    1    0   1   0    1    1    B1
 13 003 03  1    1    0   1   0    1    1    C1
 14 003 03  1    1    0   1   0    1    1    E1
 15 003 03  1    1    0   1   0    1    1    D1
 16 003 03  1    1    0   1   0    1    1    D9
 17 003 03  1    1    0   1   0    1    1    C9
IRQ to pin mappings:
IRQ0 -> 0:2
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ10 -> 0:10
IRQ11 -> 0:11
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ16 -> 0:16
IRQ17 -> 0:17
IRQ18 -> 0:18
IRQ19 -> 0:19
IRQ20 -> 0:20
IRQ21 -> 0:21
IRQ22 -> 0:22
IRQ23 -> 0:23
.................................... done.
PCI: Using ACPI for IRQ routing
PCI: if you experience problems, try using option 'pci=noacpi' or even 'acpi=off'
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16ac)
apm: disabled - APM is not SMP safe.
audit: initializing netlink socket (disabled)
audit(1092192020.349:0): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
SELinux: Registering netfilter hooks
Initializing Cryptographic API
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
ACPI: Processor [CPU0] (supports C1)
ACPI: Processor [CPU1] (supports C1)
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Real Time Clock Driver v1.12
Linux agpgart interface v0.100 (c) Dave Jones
agpgart: Detected an Intel i875 Chipset.
agpgart: Maximum main memory to use for agp memory: 439M
agpgart: AGP aperture is 256M @ 0xe0000000
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
divert: not allocating divert_blk for non-ethernet device lo
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ide0: I/O resource 0x1F0-0x1F7 not free.
ide0: ports already in use, skipping probe
hdc: CD-224E, ATAPI CD/DVD-ROM drive
Using cfq io scheduler
ide1 at 0x170-0x177,0x376 on irq 15
hdc: ATAPI 24X CD-ROM drive, 128kB Cache
Uniform CD-ROM driver Revision: 3.20
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver hid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
mice: PS/2 mouse device common for all mice
serio: i8042 AUX port at 0x60,0x64 irq 12
input: PS/2 Logitech Mouse on isa0060/serio1
serio: i8042 KBD port at 0x60,0x64 irq 1
input: AT Translated Set 2 keyboard on isa0060/serio0
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
NET: Registered protocol family 2
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP: Hash tables configured (established 32768 bind 32768)
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
ACPI: (supports S0 S1 S4 S5)
checking if image is initramfs...it isn't (no cpio magic); looks like an initrd
Freeing initrd memory: 270k freed
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
RAMDISK: Compressed image found at block 0
VFS: Mounted root (ext2 filesystem).
SCSI subsystem initialized
libata version 1.02 loaded.
ata_piix version 1.02
ata_piix: combined mode detected
ata: 0x170 IDE port busy
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ata1: SATA max UDMA/133 cmd 0x1F0 ctl 0x3F6 bmdma 0xF000 irq 14
ata1: dev 0 cfg 49:2f00 82:7c69 83:7f09 84:4003 85:7c69 86:3e01 87:4003 88:407f
ata1: dev 0 ATA, max UDMA/133, 160086528 sectors (lba48)
ata1: dev 1 cfg 49:2f00 82:7c6b 83:7b09 84:4003 85:7c69 86:3a01 87:4003 88:407f
ata1: dev 1 ATA, max UDMA/133, 160086528 sectors
ata1: dev 0 configured for UDMA/133
ata1: dev 1 configured for UDMA/133
scsi0 : ata_piix
  Vendor: ATA Model: Maxtor 6Y080M0 Rev: 1.02
  Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 160086528 512-byte hdwr sectors (81964 MB)
SCSI device sda: drive cache: write through
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
  Vendor: ATA Model: Maxtor 6Y080M0 Rev: 1.02
  Type: Direct-Access ANSI SCSI revision: 05
SCSI device sdb: 160086528 512-byte hdwr sectors (81964 MB)
SCSI device sdb: drive cache: write through
 sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 >
Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
md: raid1 personality registered as nr 3
md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdb5 ...
md: adding sdb5 ...
md: sdb3 has different UUID to sdb5
md: sdb2 has different UUID to sdb5
md: sdb1 has different UUID to sdb5
md: adding sda5 ...
md: sda3 has different UUID to sdb5
md: sda2 has different UUID to sdb5
md: sda1 has different UUID to sdb5
md: created md1
md: bind<sda5>
md: bind<sdb5>
md: running: <sdb5><sda5>
raid1: raid set md1 active with 2 out of 2 mirrors
md: considering sdb3 ...
md: adding sdb3 ...
md: sdb2 has different UUID to sdb3
md: sdb1 has different UUID to sdb3
md: adding sda3 ...
md: sda2 has different UUID to sdb3
md: sda1 has different UUID to sdb3
md: created md2
md: bind<sda3>
md: bind<sdb3>
md: running: <sdb3><sda3>
raid1: raid set md2 active with 2 out of 2 mirrors
md: considering sdb2 ...
md: adding sdb2 ...
md: sdb1 has different UUID to sdb2
md: adding sda2 ...
md: sda1 has different UUID to sdb2
md: created md3
md: bind<sda2>
md: bind<sdb2>
md: running: <sdb2><sda2>
raid1: raid set md3 active with 2 out of 2 mirrors
md: considering sdb1 ...
md: adding sdb1 ...
md: adding sda1 ...
md: created md0
md: bind<sda1>
md: bind<sdb1>
md: running: <sdb1><sda1>
raid1: raid set md0 active with 2 out of 2 mirrors
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 180k freed
SELinux: Disabled at runtime.
SELinux: Unregistering netfilter hooks
ACPI: Power Button (FF) [PWRF]
EXT3 FS on md1, internal journal
device-mapper: 4.1.0-ioctl (2003-12-10) initialised: dm@xxxxxxxxxxxxxx
hdc: packet command error: status=0x51 { DriveReady SeekComplete Error }
hdc: packet command error: error=0x54
cdrom: open failed.
Adding 1389580k swap on /dev/sdb6. Priority:-1 extents:1
Adding 1389580k swap on /dev/sda6. Priority:-2 extents:1
hdc: packet command error: status=0x51 { DriveReady SeekComplete Error }
hdc: packet command error: error=0x54
cdrom: open failed.
kjournald starting. Commit interval 5 seconds
EXT3 FS on md3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on md2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
IA-32 Microcode Update Driver: v1.13 <tigran@xxxxxxxxxxx>
microcode: No suitable data for cpu 0
microcode: No suitable data for cpu 1
inserting floppy driver for 2.6.5-1.358smp
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
Intel(R) PRO/1000 Network Driver - version 5.2.39-k2
Copyright (c) 1999-2004 Intel Corporation.
divert: allocating divert_blk for eth0
eth0: Intel(R) PRO/1000 Network Connection
divert: allocating divert_blk for eth1
eth1: Intel(R) PRO/1000 Network Connection
divert: freeing divert_blk for eth0
divert: freeing divert_blk for eth1
ip_tables: (C) 2000-2002 Netfilter core team
Intel(R) PRO/1000 Network Driver - version 5.2.39-k2
Copyright (c) 1999-2004 Intel Corporation.
divert: allocating divert_blk for eth0
eth0: Intel(R) PRO/1000 Network Connection
divert: allocating divert_blk for eth1
eth1: Intel(R) PRO/1000 Network Connection
ip_tables: (C) 2000-2002 Netfilter core team
e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
NET: Registered protocol family 10
Disabled Privacy Extensions on device 02307a60(lo)
IPv6 over IPv4 tunneling driver
divert: not allocating divert_blk for non-ethernet device sit0
eth0: no IPv6 routers present
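
Since problem 2 looks like one RAID 1 member being marked as failed, a quick check of /proc/mdstat on each machine could show whether any array is already running degraded under the working kernel. This is only a rough sketch, assuming the two-line-per-array layout seen in the /proc/mdstat output earlier in this mail (a device line followed by a "blocks [n/m] [UU]" line); the function names are made up, and inactive or resyncing arrays are not handled.

```python
import re

# Device line, e.g. "md3 : active raid1 sdb2[1] sda2[0]"
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(\S+)\s+(\S+)\s+(.*)$")
# Status line, e.g. "52942080 blocks [2/2] [UU]"
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_mdstat(text):
    """Return {array name: info dict} for the arrays found in mdstat text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        m = ARRAY_RE.match(line.strip())
        if m:
            current = m.group(1)
            arrays[current] = {
                "state": m.group(2),
                "level": m.group(3),
                "members": m.group(4).split(),
                "expected": None,   # number of devices the array should have
                "working": None,    # number of devices currently in use
                "flags": None,      # e.g. "UU" or "U_"
            }
            continue
        s = STATUS_RE.search(line)
        if s and current is not None:
            arrays[current].update(
                expected=int(s.group(1)),
                working=int(s.group(2)),
                flags=s.group(3),
            )
    return arrays


def degraded(arrays):
    """Names of arrays with fewer working members than expected."""
    return [
        name for name, a in arrays.items()
        if a["working"] is not None
        and (a["working"] < a["expected"] or "_" in a["flags"])
    ]
```

On the server above, degraded(parse_mdstat(open("/proc/mdstat").read())) should come back empty under 2.6.5-1.358, since all four arrays show [2/2] [UU]. If a newer kernel ever gets far enough to run it, a non-empty list would at least name the array that lost a member.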