Re: OpenMP 4.0

Tim Prince <n8tm@xxxxxxx> · Sun, 01 Sep 2013 13:37:21 -0400

On 08/29/2013 09:40 PM, Tim Prince wrote:
On 8/29/2013 2:20 PM, Tobias Burnus wrote:
José Luis García Pallero wrote:
I don't know if this is the correct place for this question, but I
haven't found any mailing list on the GOMP webpage.
The OpenMP 4.0 specification was released ten days ago. The new
standard includes several interesting features, such as SIMD and
accelerator directives and error-handling facilities. Is there a plan to
add this new version of OpenMP to libgomp and then to GCC 4.9?

Well, it takes a while until features are implemented - and the 
implementation work can only start after a specification/standard is 
sufficiently finished to not change in a major way.

Having said that, there is a GCC branch called gomp-4_0-branch (see
http://gcc.gnu.org/svn.html), which is used for the on-going 
implementation. I think SIMD already partially works (with C/C++, not 
yet with Fortran).

I believe that it is planned to support OpenMP 4 in GCC 4.9.

Tim Prince wrote:
OpenMP 4.0 simd facilities are related to Cilk(tm) Plus pragmas, for 
which there is a gcc branch on git (although I haven't figured out 
that stuff).

As far as I gathered, Cilk+'s pragmas and OpenMP's pragmas are supposed 
to be handled identically. (I think there were some differences, but 
they were resolved by changing Cilk+.) There are some Cilk+ branches 
that aim at consolidating the effort with OpenMP. In fact, some 
Cilk+ patches have been submitted for inclusion, so expect more 
of them. (The submitted patches do not include SIMD as far as I 
know. The branches do support it.)

There are distinctions in Intel compilers between Cilk(tm) Plus and 
OpenMP 4.0.  For example, Cilk(tm) Plus expects use of simd 
firstprivate lastprivate where appropriate, while OpenMP 4.0 doesn't 
support those clauses, and depends on the compiler recognizing those 
cases of omp simd private.
Intel once talked of reconciling terminology (it seems unsatisfactory 
to market Fortran directives as Cilk(tm) Plus).
Intel takes Cilk(tm) Plus simd to require in-line simd instructions 
rather than automatic replacement by the special memset/memcpy library 
function calls, while the corresponding omp simd construct doesn't 
inhibit those automatic replacements.  I guess gfortran et al. aren't 
so likely to introduce these substitutions, so don't need a means to 
control them.
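
To make the memset/memcpy point concrete, here is a minimal sketch of the kind of loop in question. The function name copy_simd is hypothetical and does not come from the thread. Without the pragma, an optimizer may recognize the loop as a copy idiom and replace it with a memcpy call; whether the pragma changes that is compiler-specific, as described above.

```c
/* Hypothetical helper (not from the thread): element-wise copy.
   An optimizer may turn this loop into a memcpy call; the thread
   reports that Intel's Cilk Plus simd keeps in-line SIMD code here,
   while omp simd does not inhibit the replacement. */
void copy_simd(float *restrict dst, const float *restrict src, int n)
{
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        dst[i] = src[i];
}
```

With GCC, the pragma takes effect when compiling with -fopenmp or -fopenmp-simd; without either flag it is ignored and the loop is compiled as ordinary C.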

For example, I know of no one planning to implement user defined 
reduction. Some talk about proposing a specific standard on indexed 
min/max before deciding about user defined reductions.

I think the gomp-4_0-branch has supported min/max for quite some 
time. (For C/C++; Fortran has supported it since older OpenMP 
specs.) Additionally, I believe that Jakub intends to implement 
user-defined reductions (UDR) and that he has already done some prep 
work on the branch. Ignoring "omp target", UDR seems to be the 
biggest new feature.
C omp parallel reduction(min|max: ) was introduced in OpenMP 3.1 but I 
didn't find any tests for it in the gcc 4.9 testsuite. Corresponding 
omp simd reduction would not be so important for C++ if g++ could 
optimize min/max with maxp[sd]/minp[sd] as gfortran and icpc do.  No  
omp max|min reductions are likely in the Intel icc/icpc 14.0 releases 
in a week or so, regardless of claims to support OpenMP 4.0.
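
For reference, a minimal sketch of the C max reduction mentioned above, assuming a compiler that implements OpenMP 3.1 or later. The function name array_max is hypothetical and the array is assumed to be non-empty.

```c
/* Hypothetical helper (not from the thread): largest element of a
   non-empty array using the max reduction added for C/C++ in
   OpenMP 3.1. Each thread keeps a private maximum, and the private
   values are combined with the initial value of m at the end. */
double array_max(const double *a, int n)
{
    double m = a[0];
    #pragma omp parallel for reduction(max: m)
    for (int i = 1; i < n; ++i)
        if (a[i] > m)
            m = a[i];
    return m;
}
```

The same reduction(max: m) clause is accepted on #pragma omp simd in OpenMP 4.0, which is the simd variant the paragraph above refers to. Compiled without OpenMP support, the pragma is ignored and the function still returns the correct result serially.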

(Regarding "omp target" and other accelerator/GPU/hybrid-system 
support: I think there is considerable interest in getting it working 
with GCC; however, it will probably take until 4.10 or longer.)

Among my ulterior motives for asking is my attempt to write a book 
centered on HPC development topics.

That sounds interesting!

Tobias

PS: Regarding SIMD, some basic support was merged into GCC 4.9 itself 
a few days ago. However, it is not yet accessible from user code (no 
front-end support), and I have the impression the information is not 
yet used for optimization. But expect some support soon (possibly 
something like #pragma simd, #pragma vector for C/C++ and usage for 
DO CONCURRENT in Fortran); I don't know which pragma will be added or 
when.
DO CONCURRENT needs a more satisfactory way to invoke omp parallel. A 
limited facility (beyond current auto-parallelization) would not 
appear in ifort until next year.  It seems too difficult to cover all 
possibilities.  Auto-vectorization works well already (in gfortran, 
for example).

I grabbed the gomp-4_0-branch and had to configure with 
--disable-werror to build it. I found 2 cases in the netlib vectors 
benchmark where #pragma omp simd brings gcc performance up to at least 
match icc. I suppose it could do the same for gfortran when the omp 
simd directives become available.
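
As an illustration only (the post does not say which benchmark loops benefited), here is a sketch of one loop shape where #pragma omp simd can plausibly help. A floating-point sum is normally not vectorized unless the compiler is allowed to reorder the additions, for example with -ffast-math; the reduction clause grants that permission for this loop alone. The function name dot_simd is hypothetical.

```c
/* Hypothetical kernel (not taken from the netlib vectors benchmark):
   dot product with an explicit simd reduction. The reduction clause
   lets the compiler accumulate partial sums in vector lanes, which
   reorders the floating-point additions. */
float dot_simd(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    #pragma omp simd reduction(+: sum)
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Because the additions may be reordered, results can differ in the last bits from a strictly serial sum for general inputs.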
Tim

--
Tim Prince