On 08/29/2013 09:40 PM, Tim Prince wrote:
On 8/29/2013 2:20 PM, Tobias Burnus wrote:
José Luis García Pallero wrote:
I don't know if this is the correct place for this question, but I
haven't found any mailing list on the GOMP webpage.
The OpenMP 4.0 specification was released ten days ago. This new
standard includes several interesting features, such as SIMD and
accelerator directives and error-handling facilities. Is it planned to
add this new version of OpenMP to libgomp and, then, to GCC 4.9?
Well, it takes a while until features are implemented - and the
implementation work can only start after a specification/standard is
sufficiently finished to not change in a major way.
Having said that, there is a GCC branch called gomp-4_0-branch (see
http://gcc.gnu.org/svn.html), which is used for the on-going
implementation. I think SIMD already partially works (with C/C++, not
yet with Fortran).
I believe that it is planned to support OpenMP 4 in GCC 4.9.
Tim Prince wrote:
OpenMP 4.0 simd facilities are related to Cilk(tm) Plus pragmas, for
which there is a gcc branch on git (although I haven't figured out
that stuff).
As far as I gathered, Cilk+'s pragmas and OpenMP's pragmas are supposed
to be handled identically. (I think there were some differences but
they got resolved by changing Cilk+.) There are some Cilk+ branches,
which aim at consolidating the effort with OpenMP. Actually, some
Cilk+ patches have been submitted for inclusion - thus, expect more
on this. (The submitted patches do not include SIMD as far as I
know. The branches do support it.)
There are distinctions in Intel compilers between Cilk(tm) Plus and
OpenMP 4.0. For example, Cilk(tm) Plus expects use of simd
firstprivate lastprivate where appropriate, while OpenMP 4.0 doesn't
support those clauses, and depends on the compiler recognizing those
cases of omp simd private.
Intel once talked of reconciling terminology (it seems unsatisfactory
to market Fortran directives as Cilk(tm) Plus).
Intel takes Cilk(tm) Plus simd to require in-line simd instructions
rather than automatic replacement by the special memset/memcpy library
function calls, while the corresponding omp simd construct doesn't
inhibit those automatic replacements. I guess gfortran et al. aren't
so likely to introduce these substitutions, so don't need a means to
control them.
For example, I know of no one planning to implement user defined
reduction. Some talk about proposing a specific standard on indexed
min/max before deciding about user defined reductions.
I think the gomp-4_0-branch has already supported min/max for quite some
time. (For C/C++; Fortran has supported it since older OpenMP
specs.) Additionally, I believe that Jakub intends to implement
user-defined reductions (UDR) and that he has already done some prep
work on the branch. Ignoring "omp target", UDR seems to be the
biggest new feature.
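
[Editorial sketch] To illustrate how the indexed min/max discussed above
could be written as an OpenMP 4.0 user-defined reduction, here is a
minimal example. The names maxloc, maxloc_pick, maxloc_red and
find_maxloc are hypothetical and chosen only for illustration; they do
not come from the thread or from any GCC branch. It assumes a compiler
with "omp declare reduction" support; without OpenMP enabled the
pragmas are ignored and the loop runs serially with the same result.

```c
#include <float.h>
#include <limits.h>

/* Value/index pair carried through the reduction (hypothetical name). */
struct maxloc {
    double val;
    long   idx;
};

/* Return the better of two candidates: larger value wins,
   ties go to the smaller index. */
static struct maxloc maxloc_pick(struct maxloc a, struct maxloc b)
{
    if (b.val > a.val || (b.val == a.val && b.idx < a.idx))
        return b;
    return a;
}

/* User-defined reduction: combine two partial results with maxloc_pick.
   Each private copy starts as "no element seen yet". */
#pragma omp declare reduction(maxloc_red : struct maxloc : \
        omp_out = maxloc_pick(omp_out, omp_in)) \
        initializer(omp_priv = { -DBL_MAX, LONG_MAX })

/* Largest element of a[0..n-1] and the index of its first occurrence.
   For n <= 0 the result is { -DBL_MAX, LONG_MAX }. */
struct maxloc find_maxloc(const double *a, long n)
{
    struct maxloc best = { -DBL_MAX, LONG_MAX };

    #pragma omp parallel for reduction(maxloc_red : best)
    for (long i = 0; i < n; i++) {
        struct maxloc cand = { a[i], i };
        best = maxloc_pick(best, cand);
    }
    return best;
}
```

Because ties are broken on the index inside the combiner, the result
does not depend on how iterations are distributed among threads.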
C omp parallel reduction(min|max: ) was introduced in OpenMP 3.1 but I
didn't find any tests for it in the gcc 4.9 testsuite. Corresponding
omp simd reduction would not be so important for C++ if g++ could
optimize min/max with maxp[sd]/minp[sd] as gfortran and icpc do. No
omp max|min reductions are likely in the Intel icc/icpc 14.0 releases
in a week or so, regardless of claims to support OpenMP 4.0.
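
[Editorial sketch] For reference, the OpenMP 3.1 C form of the max
reduction mentioned above looks roughly like this. The function name
array_max is hypothetical. Built without OpenMP, the pragma is ignored
and the loop runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Largest element of a[0..n-1]; requires n > 0. */
double array_max(const double *a, size_t n)
{
    assert(n > 0);
    double m = a[0];

    /* Each thread keeps a private maximum; the private copies are
       combined with the original value of m at the end of the loop. */
    #pragma omp parallel for reduction(max : m)
    for (long i = 1; i < (long)n; i++) {
        if (a[i] > m)
            m = a[i];
    }
    return m;
}
```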
(Regarding "omp target" and other accelerator/GPU/hybrid-system
support: I think there is quite some interest in getting it working with
GCC; however, it will probably take until 4.10 or longer.)
Among my ulterior motives for asking is my attempt to write a book
centered on HPC development topics.
That sounds interesting!
Tobias
PS: Regarding SIMD, in GCC 4.9 itself, some basic support was
merged a few days ago. However, it is not yet accessible from
user code (no front-end support) and I have the impression the
information is not yet used for optimization. But expect some
support soon (possibly something like #pragma simd, #pragma vector for
C/C++ and usage for DO CONCURRENT in Fortran) - but I don't know
which pragma will be supported or when the support will be added.
DO CONCURRENT needs a more satisfactory way to invoke omp parallel. A
limited facility (beyond current auto-parallelization) would not
appear in ifort until next year. It seems too difficult to cover all
possibilities. Auto-vectorization works well already (in gfortran,
for example).
I grabbed the gomp-4_0-branch and had to set --disable-werror to build
it. I found 2 cases in the netlib vectors benchmark where #pragma omp simd
brings gcc performance up to at least match icc. I suppose it could do
the same for gfortran when the omp simd directives become available.
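
[Editorial sketch] The loops in question are not shown in this thread,
so the following is only a generic illustration of how #pragma omp simd
is applied in C. The function names saxpy and dot are hypothetical.
With GCC the pragma takes effect under -fopenmp (releases from 4.9 on
also accept -fopenmp-simd); otherwise it is ignored and the loops are
compiled as ordinary scalar or auto-vectorized code.

```c
#include <stddef.h>

/* y[i] += alpha * x[i]; restrict tells the compiler that x and y
   do not overlap. */
void saxpy(float *restrict y, const float *restrict x,
           float alpha, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Dot product; the reduction clause lets the compiler keep partial
   sums in vector lanes, which may reorder the additions. */
float dot(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;

    #pragma omp simd reduction(+ : sum)
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Note that the simd reduction may change the order of floating-point
additions, so results can differ in the last bits from a strictly
sequential loop.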
Tim
--
Tim Prince