athlon64/opteron 8GB per CPU




> -----Original Message-----
> From: centos-bounces@xxxxxxxxxx 
> [mailto:centos-bounces@xxxxxxxxxx] On Behalf Of Jay Lee
> Sent: Monday, April 10, 2006 1:51 PM
> To: CentOS mailing list; thomas@xxxxxxxxxxx
> Subject: Re:  athlon64/opteron 8GB per CPU
> 
> David Thompson wrote:

   SNIP

> > Folks with deep(er) knowledge of x86_64 architecture, I'd also be
> > interested in the trade-offs between 1 CPU w/8GB and 2 CPUs each
> > with 4GB.  (E.g. what's the real cost of shipping data back and
> > forth between the memory spaces of the two CPUs?  How does the 2.6
> > kernel deal with the split memory space?  etc...)  The memory is
> > all going to be used up in kernel buffers keeping bits of files in
> > kernel memory.
> >
> Here's where the Opteron with onboard memory controllers will kick
> Xeon's rear.  Also, throwing more CPUs into the mix is not going to
> cut down on the amount of memory available; a system with 2 CPUs and
> 8 GB total RAM will always be faster than a system with 1 CPU and
> 8 GB of RAM (except in really strange workloads).
> > Also, trade-offs between Opteron and Xeon architectures, i686 vs.
> > x86_64, bounce buffers, etc., for an application like this would be
> > helpful.
> >
> Opteron rules.

To those interested in Opteron with big memory on CentOS 4.3:

Yes, Opteron rules by a large margin.

We have completed significant testing with dual-core Opteron 280s on a
Tyan 2882 and on a 2895 with CentOS 4.3 x86_64 (kernels 2.6.9-37.ELsmp
and 2.6.12.6) for a Linux cluster.

Each machine had 4 GB of DRAM, which should present the same problems
as 8 GB or 16 GB of RAM.

These problems can be summed up as:

A) Where will the PCI and I/O space be mapped? (See the sketch below.)
B) What about the IOMMU?
C) NUMA? Do I need / want it?
D) How do all of the above interact with I/O cards?
E) Which I/O cards will be used, and how does this affect the drivers /
   kernel?
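
For (A), a quick check after each BIOS change is to look at the
top level of /proc/iomem, which shows where the kernel mapped PCI
space and how much "System RAM" survives below 4 GB. A minimal
sketch in C, using nothing beyond the standard /proc interface on
a 2.6 kernel:

/*
 * Dump the top-level physical address map the kernel built from the
 * BIOS e820 data: where PCI space landed and what RAM remains below
 * 4 GB.  Build: gcc -O2 iomap.c
 */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/iomem", "r");
    char line[256];

    if (!f) {
        perror("/proc/iomem");
        return 1;
    }
    /* Top-level entries start in column 0 ("System RAM", "PCI Bus",
     * "reserved", ...); children are indented, so skip them. */
    while (fgets(line, sizeof(line), f))
        if (line[0] != ' ')
            fputs(line, stdout);
    fclose(f);
    return 0;
}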


Machines (4) tested:

Tyan 2882 and 2895 motherboards
Opteron 280s or 285s
4 x 1 GB DDR-400 RECC memory (Kingston or equivalent)
PSU > 400 watts with > 20 amps on +12V
Significant HSF and case cooling


Master node ->

	Added a 3ware 7506-4 with 4 x 80 GB IDE disks.
	This was added to allow for faster compiles of
	the kernel, the SilverStorm InfiniBand source code,
	application code and MPI code.
	(Kernel compiles took less than 2.5 minutes.)


Slave node -> added an 80 GB SATA drive for booting.

All machines were loaded with a complete install of CentOS 4.3 x86_64.

Problems encountered:

1) The motherboard BIOS had choices that affected functionality when
   the SilverStorm and 3ware cards were installed:

	Choices were:

		Software PCI hole remap -> enable or disable
		Hardware PCI hole remap -> enable or disable
		IOMMU -> enable or disable
		ACPI SRAT table -> enable or disable (enable for NUMA)
		Node interleave -> must disable for NUMA to be enabled

	Wrong choices either ate up some of the below-4GB memory,
	caused the system to not function properly, or caused a panic
	with numa=on.

	We also tried PnP on and off; it only seemed to change which
	interrupt a given I/O card was assigned at boot time, but both
	settings worked.
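
   To put a number on the "ate up some of the below-4GB memory" case,
   compare MemTotal against the installed DRAM after each BIOS change.
   A minimal sketch: the 4 GB constant below assumes the 4 x 1 GB
   DIMMs in these machines, and the difference also includes normal
   kernel reservations, so treat it as an upper bound on what the PCI
   hole ate.

/*
 * Report how much of the installed DRAM the kernel actually sees.
 * INSTALLED_KB is an assumption matching the test machines here;
 * adjust it for your own configuration.  Build: gcc -O2 memchk.c
 */
#include <stdio.h>

#define INSTALLED_KB (4096UL * 1024UL)   /* 4 GB of DIMMs, in kB */

int main(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    unsigned long total_kb = 0;
    char line[128];

    if (!f) {
        perror("/proc/meminfo");
        return 1;
    }
    while (fgets(line, sizeof(line), f))
        if (sscanf(line, "MemTotal: %lu kB", &total_kb) == 1)
            break;
    fclose(f);

    printf("MemTotal: %lu kB, unseen vs. installed: %lu kB\n",
           total_kb, INSTALLED_KB - total_kb);
    return 0;
}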

2) NUMA was very important to us, to keep processes locked to a given
   NUMA processor / set of cores and a memory node

	-> boot command line addition -> numa=on

   This increased performance by 5 to 20 percent depending on the
   application.

	What this does is: at the start-up of an application, processor
	affinity is soft but memory affinity is hard; i.e., the
	processor running a given task can be switched, but where that
	task's memory is assigned stays fixed once the task requests
	memory.

	If NUMA is turned on, the scheduler tries to keep a task
	located on the same memory and CPU node.
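
   If soft processor affinity is not tight enough for a job, a task
   can bind itself explicitly. A minimal sketch, assuming libnuma (the
   library that ships with numactl) is installed; it pins the calling
   task to node 0 and places its working buffer there.
   Build: gcc -O2 pin.c -lnuma

/*
 * Pin the calling task to node 0 and place its working buffer on
 * node 0's memory, so CPU and memory stay on the same node.
 */
#include <stdio.h>
#include <string.h>
#include <numa.h>

int main(void)
{
    size_t len = 64UL * 1024 * 1024;    /* 64 MB working buffer */
    void *buf;

    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support (booted without numa=on?)\n");
        return 1;
    }
    if (numa_run_on_node(0) < 0) {      /* hard CPU binding to node 0 */
        perror("numa_run_on_node");
        return 1;
    }
    buf = numa_alloc_onnode(len, 0);    /* pages placed on node 0 */
    if (!buf) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return 1;
    }
    memset(buf, 0, len);                /* touch pages so they are placed */
    printf("task and buffer bound to node 0 (max node: %d)\n",
           numa_max_node());
    numa_free(buf, len);
    return 0;
}

   The numactl(8) tool gives the same binding without code changes,
   e.g. numactl --cpubind=0 --membind=0 ./app (option spellings vary
   between numactl versions; newer ones use --cpunodebind).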

3) Compiling the kernel for the K8 as opposed to the generic x86_64
   only provided a small increase in performance, BUT using the
   2.6.12.6 kernel, compiling it with the latest gcc 4.0.2 compiler,
   and then compiling MPI and the application with the same compiler
   provided a 15 percent application performance increase.

   It seems that gcc 4.0.2 received some significant changes that
   helped the K8.

   BTW, 2.6.12.6 worked fine on top of CentOS 4.3.
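
   To confirm that MPI and the application really were built for the
   K8, a compile-time check helps. A minimal sketch, assuming gcc's
   usual target macros (-march=k8 predefines __k8__):

/*
 * Confirm at compile time what target gcc built for.
 *
 *   gcc -march=k8 check.c && ./a.out   -> K8 build
 *   gcc check.c && ./a.out             -> generic build
 */
#include <stdio.h>

int main(void)
{
#ifdef __k8__
    puts("built with -march=k8 (Opteron / Athlon 64 tuning)");
#else
    puts("generic build; no -march=k8");
#endif
    return 0;
}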


Hope this helps answer some of the questions about Opteron / large
memory /
NUMA / system setup.

Seth Bardash

Integrated Solutions and Systems

719-495-5866

Failure can not cope with knowledge and perseverance!

 


