Re: FC4 crashes repeatedly on Supermicro AS1020A-T dual-core Opterons, SMP

Michal Szymanski wrote:

On Fri, May 05, 2006 at 10:18:36AM -0500, Robert M. Hyatt wrote:
One note. I am running on a quad 875 system, but am using Suse rather than FC4. It is running perfectly reliably (this is a 4-CPU, dual-core, 2.2 GHz box, 8 processors total). I had problems with FC4 myself, although it runs perfectly on my normal dual Xeon boxes...

On Fri, 5 May 2006, Bill Davidsen wrote:

Michal Szymanski wrote:

Hi all,

I have recently purchased three Supermicro AS1020A-T servers equipped
with two dual-core Opterons 280 each. H8DAR-T motherboards, 8 or 12 GB
RAM. The systems carry FC4 x86_64 with proprietary driver (made by
Adaptec) for the onboard Marvell 88SX6041 SATA Controller. Original
(install) kernel 2.6.11-1.1369_FC4smp - unfortunately not upgradable due
to the lack of the SATA driver for other kernel versions.

All systems crash (either hang with some "machine check exception"
kernel messages or reset) when loaded with repeated runs of 1.3 GB jobs,
CPU intensive with some I/O. I run 2 or 4 jobs simultaneously and they
have never survived more than a few hours.
...
2. I ran a non-SMP 2.6.11 kernel (with the Adaptec driver) on another machine.
Two repeating 1.3 GB test jobs have been running on it (each getting 50%
of the single CPU used by the system) for over 50 hours now, with no crashes.
Also, a single test job running on the SMP kernel gave no crashes in 24 hours.

What happens if you use only one CPU? You can do that either with a uni kernel (you should have gotten one) or with "maxcpus=1" in the boot commands. You are running a custom kernel with custom drivers, so you really should be asking the supplier. All we can do is suggest things which might provide extra information.

Hi all,

I got 3 copies of Robert's message but none of Bill's :-)

Still, I don't quite understand Bill's question ("What happens if you
use only one CPU?"). The answer is quoted just above this question!
There were no crashes with the system running on non-SMP kernel.

It's a great answer, but not to my question. I wasn't asking what happens with a different kernel, but what happens when you run the SMP kernel and ==>use<== only one CPU by setting the max CPUs to one. The uni kernel lacks a lot of the code that is in an SMP kernel, so it hides a lot of possible problems.
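
For anyone repeating this test, it helps to confirm that the SMP kernel really came up with one CPU before starting the jobs. Below is a minimal sketch in Python, assuming the usual /proc/cmdline and /proc/cpuinfo files and the x86 "processor : N" line format; the helper names are made up for illustration and are not part of any tool mentioned in this thread.

```python
def maxcpus_from_cmdline(cmdline):
    """Return the integer given by maxcpus=N on a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("maxcpus="):
            value = token.split("=", 1)[1]
            if value.isdigit():
                return int(value)
    return None


def count_processors(cpuinfo_text):
    """Count 'processor : N' entries in /proc/cpuinfo-style text (x86 layout assumed)."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key = line.split(":", 1)[0].strip()
        if key == "processor":
            count += 1
    return count


def read_text(path):
    """Read a whole file, returning None if it cannot be opened."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_text("/proc/cmdline")
    cpuinfo = read_text("/proc/cpuinfo")
    if cmdline is None or cpuinfo is None:
        print("could not read /proc/cmdline or /proc/cpuinfo")
    else:
        print("maxcpus requested:", maxcpus_from_cmdline(cmdline))
        print("processors visible:", count_processors(cpuinfo))
```

If the second number is still greater than one after booting with maxcpus=1, the parameter did not take effect and the run says nothing about the single-CPU case.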

In the meantime I got Kingston 1GB modules from my dealer, for testing.
Strange as it seems, I could not crash the machine with Kingston
memory running tests as long as 72 hours. It seems, then, that it is a
memory issue although I do not understand why the same memory crashes
the machine in SMP and does not in non-SMP, under similar load. Also,
the Patriot 2GB memory modules (which seem to crash the machines) are on
the Supermicro's list of memory recommended for H8DAR-T mobo.

One of the machines went back to the dealer (actually to their memory
supplier) for tests. The memory guys seem not to trust our crashing
experience. We'll see what happens. I am afraid, however, that they will
say "the memory is OK".

The memory may be operating within spec, the timing setup in the BIOS may be incorrect, etc, etc. Unfortunately it is possible to get a case where everything is right but it doesn't work. Depending on the BIOS capabilities, adding 0.05 V or 0.1 V to the memory voltage (can you do that?) might solve the problem, or I guess make it worse.

--
bill davidsen <davidsen@xxxxxxx>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979
