Re: how to analyze and measure the GCC OpenMP performance and overhead

Tim Prince <n8tm@xxxxxxx> · Fri, 12 Jun 2015 08:24:11 -0400

On 6/11/2015 11:13 PM, xingjing lu wrote:
>   I am writing OpenMP programs with the GCC compiler, and I want to know
> the details of the overhead of GCC-OpenMP. My concerns are given below.
> 1) What is a good way to optimize my OpenMP program? There are many
> aspects that affect performance, such as load balancing, locality,
> scheduling overhead, synchronization, and so on. In which order should I
> check these aspects?
>
> 2) I want to know how to measure the load balance of my application under
> GCC-OpenMP. How can I instrument my application and the OpenMP runtime to
> extract load-balance information?
>
> 3) I guess OpenMP spends some time on scheduling. Which runtime APIs
> should I instrument to measure the scheduling overhead?
>
> 4) Can I measure the time that an OpenMP program spends on synchronization,
> critical sections, locks, and atomic operations?
>
I guess something here caused my original reply to be sent as something
other than plain text.

If you're looking for a simple way to get some of the facilities of the
Intel icc/VTune combination, remember that the Intel Linux OpenMP
library supports the gcc OpenMP function calls.

If you are on some other OS, like Windows, a profiler such as VTune (or
oprofile, which is Linux-only) will show you where time is spent in the
OpenMP library wait loops, so you could attempt to infer answers to some
of your questions. libgomp for Windows doesn't appear to be built by
default with the right choice of debug symbols to permit capturing
source line numbers.

Poor locality can give rise to non-repeatable load imbalance.
Scheduling overhead is affected by chunk size, and dynamic scheduling
inherently gives non-repeatable results, although it can compensate
somewhat for imbalance. You may have to experiment to see whether
explicit balancing of work among parallel for iterations does a better
job with your application on your platform. These effects interact, so
it is unlikely that these performance issues can be dealt with one at a
time.
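
As a starting point for that kind of experiment, here is a minimal
sketch (not anything libgomp provides) that times each thread's share
of a parallel for with omp_get_wtime() and reports max/mean busy time.
The names work(), imbalance_ratio(), MAX_THREADS and sink are made up
for illustration, and the workload is a deliberately uneven toy loop;
substitute your own loop body. The non-OpenMP fallback is there only so
the file still builds without -fopenmp.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch also builds without -fopenmp. */
static double omp_get_wtime(void) { return (double)clock() / CLOCKS_PER_SEC; }
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define MAX_THREADS 256      /* assumed upper bound on the team size */

static volatile double sink; /* keeps the compiler from deleting work() */

/* Hypothetical workload: cost grows with i, so equal-sized static
   blocks of iterations give the last thread the most work. */
double work(int i)
{
    double s = 0.0;
    for (int k = 0; k < i; k++)
        s += k * 1e-9;
    return s;
}

/* Runs n iterations under schedule(runtime), stores each thread's time
   spent in work() in busy[], and returns max(busy) / mean(busy).
   1.0 means perfectly balanced; larger values mean more imbalance. */
double imbalance_ratio(int n, double busy[MAX_THREADS], int *team_size)
{
    double total = 0.0;
    int nthreads = 1;

    for (int t = 0; t < MAX_THREADS; t++)
        busy[t] = 0.0;

    #pragma omp parallel reduction(+:total)
    {
        double mine = 0.0;   /* private accumulator, avoids false sharing */
        int tid = omp_get_thread_num();

        #pragma omp single
        nthreads = omp_get_num_threads();

        /* Timing every iteration adds overhead of its own; for cheap
           iterations, time larger blocks of work instead. */
        #pragma omp for schedule(runtime) nowait
        for (int i = 0; i < n; i++) {
            double t0 = omp_get_wtime();
            total += work(i);
            mine += omp_get_wtime() - t0;
        }

        if (tid < MAX_THREADS)
            busy[tid] = mine;
    }
    sink = total;
    assert(nthreads >= 1);
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    double max = 0.0, sum = 0.0;
    for (int t = 0; t < nthreads; t++) {
        sum += busy[t];
        if (busy[t] > max)
            max = busy[t];
    }
    *team_size = nthreads;
    /* If the timer was too coarse to register anything, report "balanced". */
    return sum > 0.0 ? max * nthreads / sum : 1.0;
}
```

Built with gcc -fopenmp and called from your own main(), this can be
run with, for example, OMP_SCHEDULE=static and then
OMP_SCHEDULE=dynamic,16 to compare schedules and chunk sizes without
recompiling. Expect the dynamic numbers to vary from run to run, for
the reasons above.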

-- 
Tim Prince