Brian McGrew wrote:
> Top of the morning to y'all!
>
> I'm a bit curious as to what optimization flags are in the newest compilers
> and how they'd work with the newest CPUs versus the last generation of
> CPUs.
>
> Our older systems were Dell Precision T5400 workstations with dual Intel Xeon
> 5420 CPUs at 2.33GHz with 6MB of cache per core. The cache breaks out to
> be 32k of L1 cache and 6MB of L2 cache.
>
> Now, we're getting Dell Precision T5500 workstations with dual Intel Xeon
> 5506 CPUs at 2.21GHz with 4MB of cache per core. But, the cache breaks
> down as 32k L1 cache, 256k L2 cache and 4096k L3 cache.
>
> Our application is very processor and disk I/O intensive and it runs about
> 6x slower on the newer hardware versus the old. We're currently compiling
> with gcc-4.1.1 using the following optimization flags on Fedora Core 5 using
> a 2.6.16.16 kernel. As it happens, the code runs seamlessly on CentOS 5.2
> with a 2.6.18 kernel as well. Upgrading compilers, if there is a compelling
> reason, is an option for us. Upgrading kernels, at this time, is not an
> option because of 3rd party hardware support.
>

Basically, you seem to be saying that the old scheduler doesn't work well for you, while the newer one is OK. That's not under the control of gcc.

You didn't say whether you set the NUMA option in BIOS. If you wish to run with NUMA, so as to get the advantage of local memory access, and want high level affinity control from gcc, you might upgrade to a gcc version which supports libgomp, use OpenMP directives in the important code sections, and set the GOMP_CPU_AFFINITY environment variable.

Your description of the cache is contradictory. The last level cache on the older CPU is shared between 2 cores, and on the newer one it's shared among 4 cores, unless you have the rare entry level model with only 2 cores. Do you really have such an extremely small mid level cache?
That seems like a handicap; you seem to be comparing a top-of-the-line CPU of 2 years ago against a new bottom-of-the-line one. I don't know why anyone would choose a dual-socket dual-core over a single-socket quad-core with the newer model.

For the newer CPU model, if auto-vectorization is useful for your application, -mtune=barcelona would be useful. -msse4.1 would likely be useful on the older one as well (the older CPU does not support SSE4.2, which plain -msse4 also enables). You would need a current gcc for these features, and you may need a more current binutils.
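
To make the OpenMP suggestion concrete, here is a minimal sketch of what a directive on a hot loop might look like. The function name sum_squares and the loop itself are made up for illustration; they stand in for whatever your important code sections actually do.

```c
/* Hypothetical hot loop: returns 0^2 + 1^2 + ... + (n-1)^2.
   With -fopenmp the iterations are divided among the OpenMP threads and
   the per-thread partial sums are combined by the reduction clause.
   Without -fopenmp the pragma is ignored and the loop runs serially. */
double sum_squares(long n)
{
    double sum = 0.0;
    long i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += (double)i * (double)i;

    return sum;
}
```

You would build with -fopenmp and then pin the threads at run time, for example with GOMP_CPU_AFFINITY="0-7" in the environment. The CPU list 0-7 is only an assumption for an 8-core box; the right list depends on how your kernel numbers the cores on each socket.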
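
As for auto-vectorization, whether it helps depends on having loops the compiler can prove are safe to vectorize. A sketch of the kind of loop that usually qualifies, assuming C99 (for restrict) and a made-up function name saxpy:

```c
/* Hypothetical example loop: y[i] = y[i] + a * x[i] for i in [0, n).
   The restrict qualifiers promise the compiler that x and y do not
   overlap, which removes one common obstacle to vectorizing the loop. */
void saxpy(long n, float a, const float *restrict x, float *restrict y)
{
    long i;

    for (i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Something like gcc -std=c99 -O3 -mtune=barcelona -c saxpy.c would be a starting point on the newer machine; the file name is again just an assumption. Whether the loop actually gets vectorized depends on the gcc version, so it is worth checking the vectorizer's report rather than taking it on faith.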