Re: Starting an OpenMP parallel section is extremely slow on a hyper-threaded Nehalem

On 2/11/2010 6:35 AM, Edwin Bennink wrote:
Thanks, Tim. I thought the gcc list was the most appropriate one for questions about the gomp implementation, but I'll post this question on the gcc-help list.

By the way, Ubuntu 9.10 is the latest version (released Oct. 2009). HTT works fine for daily use, but massively parallel applications show some odd behaviour: depending on the structure of the algorithm, some pieces of code run significantly faster (about 10%) with HTT enabled, while other pieces run slower (some by more than 50%). This slowdown is due to parallel sections inside loops...

Edwin
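
A minimal sketch (not part of the original messages) of the pattern Edwin describes, assuming the slowdown comes from re-creating an OpenMP parallel region on every iteration of an outer loop, so the thread-team fork/join cost is paid each time; hoisting the region outside the loop pays it once. The file name, array size, and iteration counts are made up for illustration.

/* Hypothetical illustration: a parallel region re-created inside an
 * outer loop versus one hoisted outside it.
 * Build: gcc -O2 -fopenmp region_in_loop.c -o region_in_loop */
#include <omp.h>
#include <stdio.h>

#define OUTER 10000
#define N     4096

static double a[N];

/* Parallel region opened (and joined) once per outer iteration:
 * the thread-team startup/synchronization cost is paid OUTER times. */
static void region_inside_loop(void)
{
    for (int it = 0; it < OUTER; it++) {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = a[i] * 0.5 + 1.0;
    }
}

/* Parallel region opened once; only the work-sharing loop repeats,
 * so the same team is reused and the per-iteration overhead is smaller. */
static void region_hoisted(void)
{
    #pragma omp parallel
    {
        for (int it = 0; it < OUTER; it++) {
            #pragma omp for
            for (int i = 0; i < N; i++)
                a[i] = a[i] * 0.5 + 1.0;
        }
    }
}

int main(void)
{
    double t0 = omp_get_wtime();
    region_inside_loop();
    double t1 = omp_get_wtime();
    region_hoisted();
    double t2 = omp_get_wtime();
    printf("region inside loop: %.3f s\n", t1 - t0);
    printf("region hoisted:     %.3f s\n", t2 - t1);
    return 0;
}

Timing both variants with HTT enabled and disabled is one way to reproduce the kind of difference described above.
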
Sorry for getting confused about Ubuntu version dates.
Needing to set GOMP_CPU_AFFINITY to get good performance with HT is expected. Adjacent threads can be expected to touch some of the same cache lines, so they should be run on sibling logical processors, which share the same cache. The only OpenMP library I have seen that makes affinity setting a default is the one from PGI, and that tactic is inflexible; Intel compilers have a seldom-used option to set such a default. If you have libiomp, the Intel OpenMP runtime, running with it in place of libgomp might be an interesting comparison.

Even with appropriate affinity, HT can still run slowly when cache or TLB capacity runs short, when all of the hot code sections depend on a shared resource such as the FPU, or when cache locality is poor (inner loops that are not stride 1). The Intel MKL library tries to detect HT and, by default, uses at most one thread per core.

--
Tim Prince
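
As a hedged illustration of the affinity point (a sketch, not from the original exchange): the program below prints which logical CPU each OpenMP thread lands on, so placement with and without GOMP_CPU_AFFINITY can be compared. The CPU list shown in the comment is only an assumption; the actual sibling pairing on a given machine is reported in /sys/devices/system/cpu/cpu*/topology/thread_siblings_list.

/* Hypothetical check of thread placement under libgomp (Linux/glibc).
 * Build: gcc -O2 -fopenmp affinity_check.c -o affinity_check
 * Run:   ./affinity_check
 *        GOMP_CPU_AFFINITY="0 1 2 3 4 5 6 7" ./affinity_check
 * (The CPU list above is only an example; on many Nehalem systems
 *  logical CPUs 0-3 are one hardware thread of each core and 4-7 their
 *  siblings, but the numbering is kernel/BIOS dependent -- check
 *  /sys/devices/system/cpu/cpu*/topology/thread_siblings_list.) */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        /* sched_getcpu() reports the logical CPU this thread is
         * currently running on (glibc, Linux-specific). */
        printf("OpenMP thread %d of %d on logical CPU %d\n",
               omp_get_thread_num(), omp_get_num_threads(),
               sched_getcpu());
    }
    return 0;
}
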

