Hello all (especially the very technical),
I have been experiencing hardware lockups and crashes under Linux
(Fedora Core 5, latest kernel version 2.6.17-1.2174_FC5smp). The crashes
occur under what appears to be very heavy disk access, possibly with
multiple concurrent accesses (i.e. multiple threads).
I experience crashes using MySQL (MySQL-server-4.1.21-0.glibc23), the
latest 4.1 stable release. In this case we also have multiple threads
generating a database of approx. 13-30 GB in size over a period of about
18 hours.
I have also experienced crashes using rsync for local_disk to local_disk
copies. This creates multiple threads (unlike a simple cp command,
which is a single thread).
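For anyone who wants to try reproducing the load pattern, here is a minimal sketch of several concurrent writers hitting one directory. It is not the actual MySQL or rsync workload: the function names (write_file, stress), the worker count, and the file sizes are placeholders chosen for illustration, and the target directory would need to be pointed at the array under test and the sizes raised considerably.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

MB = 1024 * 1024


def write_file(path, size_mb):
    """Write size_mb megabytes to path in 1 MB chunks, then fsync."""
    chunk = os.urandom(MB)  # one random 1 MB buffer, reused for every write
    written = 0
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # force the data out to the disk
    return written


def stress(directory, workers=4, size_mb=8):
    """Run `workers` writers at the same time; return {path: bytes written}."""
    paths = [os.path.join(directory, "stress_%d.bin" % i) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sizes = list(pool.map(lambda p: write_file(p, size_mb), paths))
    return dict(zip(paths, sizes))


if __name__ == "__main__":
    # Hypothetical defaults: a throwaway temp directory and tiny files.
    with tempfile.TemporaryDirectory() as scratch:
        for path, size in sorted(stress(scratch).items()):
            print(os.path.basename(path), size)
```

File writes release Python's interpreter lock, so the writers do overlap at the disk level, but a real test would run for hours with far larger files.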
The servers are 10 x:
Woodcrest 5160 3 GHz (Dual Core + Dual Xeon) (1333 FSB)
Supermicro servers
http://www.supermicro.com/products/system/1U/6015/SYS-6015P-8R.cfm
Motherboard
http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBP-8.cfm
(BIOS 1.1c)
16 GB FB-DIMM RAM 667 MHz - Approved and personally tested by
Supermicro USA
3ware 9550SX-4
4 x 500 GB SATA Seagate drives / 16 MB cache.
HINTS
====
The crashes ONLY happen if we enable all 4 cores in the BIOS (Dual Core
= enabled).
Our tests run 100% perfectly if we disable the second core of each Xeon!
(i.e. one core from each Xeon)
My questions
=========
Are there any "known" problems with Dual Core Xeons under load, e.g.
microcode issues or kernel bugs?
From the kernel perspective, is there any difference in operating code
(i.e. ignoring any superficial stuff like /proc/cpuinfo) for Dual
Core Xeons?
I assumed that Dual Core would use the exact same code as the SMP
kernel. Is this correct? I'm told it's not.
Are there any specific patches for Dual Core? (I did notice a change
log in RH AS 4 that stated something like "improved scheduling for
Dual Core".)
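On the /proc/cpuinfo side, a minimal sketch of how the logical CPUs can be grouped by package and core, assuming the kernel exposes the usual "processor", "physical id" and "core id" fields. The SAMPLE text is a made-up illustration, not output from these servers, and parse_cpuinfo is a hypothetical helper name.

```python
import os
from collections import defaultdict

# Illustrative input only: NOT captured from the machines described here.
SAMPLE = (
    "processor\t: 0\nphysical id\t: 0\ncore id\t\t: 0\n\n"
    "processor\t: 1\nphysical id\t: 0\ncore id\t\t: 1\n\n"
    "processor\t: 2\nphysical id\t: 1\ncore id\t\t: 0\n\n"
    "processor\t: 3\nphysical id\t: 1\ncore id\t\t: 1\n"
)


def parse_cpuinfo(text):
    """Return {physical_id: sorted list of (core_id, logical_cpu)}."""
    topology = defaultdict(list)
    # Each logical CPU is described by one blank-line-separated block.
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        # Missing topology fields are treated as package 0 / core 0.
        package = int(fields.get("physical id", 0))
        core = int(fields.get("core id", 0))
        topology[package].append((core, int(fields["processor"])))
    return {pkg: sorted(cores) for pkg, cores in topology.items()}


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            text = f.read()
    else:
        text = SAMPLE
    for package, cores in sorted(parse_cpuinfo(text).items()):
        print("package", package, "->", cores)
```

With all four cores enabled, two packages with two cores each would be expected; with the second core disabled in the BIOS, one core per package.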
Things I've tried
===========
I have tried most combinations of BIOS settings, e.g. ACPI disabled in
the BIOS, and kernel parameters such as acpi=off, noacpi, noapic, etc.,
all of which make no difference: the machines all crash unless I
disable Dual Core.
I've had extensive contact with Supermicro, 3ware and now Intel, all
of whom are blaming each other.
I've also recompiled the FC5 source RPM with exact same results.
I'm told that AMD had a similar problem with one of their dual cores,
but this was fixed long ago and I assume that fix was specific to AMD
chips and would not apply to Intel due to differences in architecture.
Any suggestions for helping me solve these crash problems would be very
much appreciated.
Thanks in advance.
Albert.
BIOS Output on boot:
Phoenix TrustedCore(tm) Server
Copyright 1985-2005 Phoenix Technologies Ltd.
All Rights Reserved
Supermicro X7DBP-8/X7DBP-I BIOS Rev 1.1b
CPU = 2 Processors Detected, Cores per Processor = 2
Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
DRAM Type : DDR2-667, FSB at 1333MHz
16384M System RAM Passed
4096 KB L2 Cache
System BIOS shadowed
Video BIOS shadowed
I will post some crash traces from our serial console server as a reply to this message shortly.