On 8/29/2013 2:20 PM, Tobias Burnus wrote:
José Luis García Pallero wrote:
I don't know if this is the correct place for this question, but I
haven't found any mailing list on the GOMP webpage.
The OpenMP 4.0 specification was released ten days ago. This new
standard includes several interesting features, such as SIMD and
accelerator directives and error-handling facilities. Is it planned to
add this new version of OpenMP to libgomp and, then, to GCC 4.9?
Well, it takes a while until features are implemented - and the
implementation work can only start after a specification/standard is
sufficiently finished to not change in a major way.
Having said that, there is a GCC branch called gomp-4_0-branch (see
http://gcc.gnu.org/svn.html), which is used for the on-going
implementation. I think SIMD already partially works (with C/C++, not
yet with Fortran).
I believe that it is planned to support OpenMP 4 in GCC 4.9.
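
For readers who have not seen the OpenMP 4.0 SIMD construct mentioned above, here is a minimal sketch in C. The function name saxpy and its parameters are made up for illustration and are not taken from the branch or its testsuite. The pragma only asserts that the iterations are independent; whether a given compiler actually vectorizes the loop is a separate matter, and a compiler without OpenMP support simply ignores the pragma.

```c
/* Hypothetical example: y[i] += a * x[i] for i in [0, n).
   "#pragma omp simd" tells the compiler the iterations may be
   executed with SIMD instructions. The caller must make sure
   x and y do not overlap. */
void saxpy(int n, float a, const float *x, float *y)
{
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

The result is the same with and without OpenMP enabled, so the loop can be checked serially first.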
Tim Prince wrote:
OpenMP 4.0 simd facilities are related to Cilk(tm) Plus pragmas, for
which there is a gcc branch on git (although I haven't figured out
that stuff).
As far as I gathered, Cilk+'s pragmas and OpenMP's pragmas are supposed
to be handled identically. (I think there were some differences but
they got resolved by changing Cilk+.) There are some Cilk+ branches,
which aim at consolidating the effort with OpenMP. Actually, some
Cilk+ patches have been submitted for inclusion, so expect more of
this. (The submitted patches do not include SIMD as far as I know. The
branches do support it.)
There are distinctions in Intel compilers between Cilk(tm) Plus and
OpenMP 4.0. For example, Cilk(tm) Plus expects use of simd firstprivate
lastprivate where appropriate, while OpenMP 4.0 doesn't support those
clauses, and depends on the compiler recognizing those cases of omp simd
private.
Intel once talked of reconciling terminology (it seems unsatisfactory to
market Fortran directives as Cilk(tm) Plus).
Intel takes Cilk(tm) Plus simd to require in-line simd instructions
rather than automatic replacement by the special memset/memcpy library
function calls, while the corresponding omp simd construct doesn't
inhibit those automatic replacements. I guess gfortran et al. aren't so
likely to introduce these substitutions, so don't need a means to
control them.
For example, I know of no one planning to implement user defined
reduction. Some talk about proposing a specific standard on indexed
min/max before deciding about user defined reductions.
I think the gomp-4_0-branch has already supported min/max for quite some
time. (For C/C++; Fortran has supported it since older OpenMP
specs.) Additionally, I believe that Jakub intends to implement
user-defined reductions (UDR) and that he has already done some prep
work on the branch. Ignoring "omp target", UDR seems to be the biggest
new feature.
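
To make the UDR discussion concrete, here is a rough sketch of what an indexed maximum could look like with the OpenMP 4.0 declare reduction directive. The type maxloc_t, the reduction identifier maxloc and the function find_max are invented for this illustration; this is not code from the branch. Ties between equal values are not resolved deterministically in this sketch.

```c
#include <float.h>

/* Hypothetical type: a value together with the index where it occurs. */
typedef struct { double value; int index; } maxloc_t;

/* User-defined reduction: keep whichever partial result has the
   larger value. Each private copy starts at the identity
   { -DBL_MAX, -1 }. */
#pragma omp declare reduction(maxloc : maxloc_t : \
    omp_out = (omp_in.value > omp_out.value) ? omp_in : omp_out) \
    initializer(omp_priv = { -DBL_MAX, -1 })

/* Returns the largest element of a[0..n-1] and its index
   (index -1 if n <= 0). */
maxloc_t find_max(const double *a, int n)
{
    maxloc_t best = { -DBL_MAX, -1 };
    #pragma omp parallel for reduction(maxloc : best)
    for (int i = 0; i < n; ++i) {
        if (a[i] > best.value) {
            best.value = a[i];
            best.index = i;
        }
    }
    return best;
}
```

A compiler without OpenMP support ignores both pragmas and runs the loop serially with the same result.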
C omp parallel reduction(min|max: ) was introduced in OpenMP 3.1 but I
didn't find any tests for it in the gcc 4.9 testsuite. Corresponding omp
simd reduction would not be so important for C++ if g++ could optimize
min/max with maxp[sd]/minp[sd] as gfortran and icpc do. No omp max|min
reductions are likely in the Intel icc/icpc 14.0 releases in a week or
so, regardless of claims to support OpenMP 4.0.
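
For reference, the C form of the max reduction discussed above looks roughly like the following. The function name array_max is made up for this sketch, and it assumes n >= 1.

```c
/* Hypothetical example: largest element of a[0..n-1], n >= 1.
   reduction(max : m) gives each thread a private m and combines
   the private copies (and the original value a[0]) with max at
   the end of the loop. */
double array_max(const double *a, int n)
{
    double m = a[0];
    #pragma omp parallel for reduction(max : m)
    for (int i = 1; i < n; ++i) {
        if (a[i] > m)
            m = a[i];
    }
    return m;
}
```

Replacing "parallel for" with "simd" would give the corresponding omp simd reduction.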
(Regarding "omp target" and other accelerator/GPU/hybrid-system
support: I think there is quite some interest in getting it working with
GCC; however, it will probably take until 4.10 or longer.)
Among my ulterior motives for asking is my attempt to write a book
centered on HPC development topics.
That sounds interesting!
Tobias
PS: Regarding SIMD, in GCC 4.9 itself, some basic support was
merged a few days ago. However, it is not yet accessible from
user code (no front-end support) and I have the impression the
information is not yet used for optimization. But expect some
support soon (possibly something like #pragma simd, #pragma vector for
C/C++ and usage for DO CONCURRENT in Fortran) - but I don't know which
pragma and when the support will be added.
DO CONCURRENT needs a more satisfactory way to invoke omp parallel. A
limited facility (beyond current auto-parallelization) would not appear
in ifort until next year. It seems too difficult to cover all
possibilities. Auto-vectorization works well already (in gfortran, for
example).
--
Tim Prince