Hi all,

I am not sure whether this is the right place for my question, but any replies are appreciated.

I am writing OpenMP programs with the GCC compiler, and I want to understand the overhead of GCC's OpenMP implementation in detail. My questions are below.

1) What is a good way to optimize my OpenMP program? Many aspects affect performance, such as load balancing, locality, scheduling overhead, synchronization, and so on. In which order should I check them?

2) How can I measure the load balance of my application under GCC-OpenMP? How should I instrument my application and the OpenMP runtime to extract this information?

3) I assume OpenMP spends some time on scheduling. Which runtime APIs should I instrument to measure the scheduling overhead?

4) Can I measure the time that an OpenMP program spends on synchronization: critical sections, locks, and atomic operations?

Thanks a lot.

Best regards,
Eric Lew
Xingjing Lu
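
P.S. To make question 2 concrete, here is a rough sketch of the kind of measurement I have in mind. It only times the application side with the standard `omp_get_wtime()` call and does not touch the runtime internals. The names `work`, `measure_imbalance`, and `MAX_THREADS` are hypothetical, and the workload is artificial (the cost grows with the loop index so that a static schedule is skewed). I am not sure this is the right approach, which is part of what I am asking.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch still builds without -fopenmp. */
#include <time.h>
static double omp_get_wtime(void) { return (double)clock() / CLOCKS_PER_SEC; }
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
static int omp_get_max_threads(void) { return 1; }
#endif

#define MAX_THREADS 256            /* hypothetical cap on the team size */

static volatile double g_sink;     /* keeps the compiler from removing the work */

/* Hypothetical workload: iteration i costs roughly i * 10 additions. */
static double work(int i) {
    double s = 0.0;
    for (int k = 0; k < i * 10; k++) s += k * 1e-9;
    return s;
}

/* Runs a statically scheduled loop of n iterations.
 * busy[t] receives the seconds thread t spent before reaching the barrier;
 * busy must have room for MAX_THREADS entries.
 * Returns max(busy) / mean(busy); a value near 1.0 means a balanced loop. */
double measure_imbalance(int n, double *busy, int *team_out) {
    assert(busy != NULL && team_out != NULL);

    int want = omp_get_max_threads();
    if (want > MAX_THREADS) want = MAX_THREADS;
    for (int t = 0; t < want; t++) busy[t] = 0.0;

    int team = 1;
    double sink = 0.0;

    #pragma omp parallel num_threads(want) reduction(+:sink)
    {
        int t = omp_get_thread_num();
        if (t == 0) team = omp_get_num_threads();   /* actual team size */

        double t0 = omp_get_wtime();
        /* nowait: stop the clock before the thread waits for the others */
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; i++) sink += work(i);
        busy[t] = omp_get_wtime() - t0;
    }
    g_sink = sink;

    double max = 0.0, sum = 0.0;
    for (int t = 0; t < team; t++) {
        if (busy[t] > max) max = busy[t];
        sum += busy[t];
    }
    *team_out = team;
    if (sum <= 0.0) return 1.0;    /* work too short for the timer to resolve */
    return max / (sum / team);
}
```

I would build this with `gcc -O2 -fopenmp` and call `measure_imbalance()` from `main`, then compare `schedule(static)` against `schedule(dynamic)`. My assumption is that `max(busy) - busy[t]` roughly approximates the time thread `t` waits at the closing barrier, but I do not know how much of the remaining time is scheduling overhead inside the runtime.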