portable performance engineering (was: Fedora 32 System-Wide Change proposal: x86-64 micro-architecture update)

Dave Love <loveshack@xxxxxxxxxxxxxxxxx> · Tue, 23 Jul 2019 17:00:58 +0100

I'm afraid this turned into a bit of an essay on more useful things
Fedora could do for portable performance engineering, should anyone
care.

I actually have no interest in Fedora except as a requirement to work on
packaging for research software around EPEL, specifically for HPC and so
performance-oriented.  I'm not sure how long it's worth persisting in
view of how difficult it is to contribute now, but these points are
mostly general.

The x86 change is clearly a non-starter, and I'm surprised to see where
it came from, but I don't see anyone saying much about the rationale or
higher-level aspects, apart from some better things to do.  Strikingly,
there's no quantification of expected performance benefits.  Anyway,
they'd be rather limited by the compiler options we're supposed to use,
which don't include vectorization, so you don't even get the benefit you
could from SSE2.  (I've been told off in review for turning that on,
though an FPC member has approved it.)

I've seen lack of support for things that would help, and plenty of
comments from people who clearly haven't worked in this area.  At least
one sensible portable performance-oriented change has been blocked in
committee (interchangeable BLAS implementations), and what I've seen
from Red Hat and Fedora makes it increasingly hard to justify a RHEL-ish
basis for HPC.  However, things could be done for computational
performance where it matters.

SIMD hwcaps have been mentioned, and I'm baffled why they haven't been
implemented generally.  That and similar changes are actually more
important for non-x86 architectures less likely to have
dynamically-dispatched SIMD-specific implementations.  [Here "SIMD"
includes things like FMA, and I know it isn't just for floating point.]

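To make the hwcaps idea concrete, here is a rough sketch of the same
decision made by hand: pick a library variant by directory according to
what the running CPU supports.  It is only an illustration and not how
glibc's loader does it; the subdirectory names, the prefix, and the
functions simd_subdir() and variant_path() are all hypothetical.  The
x86 checks use the GCC/Clang builtin __builtin_cpu_supports(); on other
architectures the sketch simply falls back to a baseline directory.

```c
#include <stdio.h>

/* Return a HYPOTHETICAL subdirectory name for the best SIMD level the
   running CPU supports.  These names are illustrative only; they are
   not the directory names glibc's hwcaps mechanism uses. */
const char *simd_subdir(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  /* harmless if already initialised */
    if (__builtin_cpu_supports("avx512f"))
        return "avx512";
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        return "avx2";
    if (__builtin_cpu_supports("avx"))
        return "avx";
#endif
    return "baseline";     /* non-x86, or nothing better detected */
}

/* Build "<prefix>/<subdir>/<name>" into buf.
   Returns 0 on success, -1 if the result did not fit. */
int variant_path(char *buf, size_t len, const char *prefix, const char *name)
{
    int n = snprintf(buf, len, "%s/%s/%s", prefix, simd_subdir(), name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a made-up prefix like /opt/example/lib, variant_path() would give
something like /opt/example/lib/avx2/libfoo.so on an AVX2+FMA machine.
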
However, hwcaps won't help for programs with no separate library
performance component; Gromacs is an example.  On a heterogeneous HPC
system you need multiple parallel-installable versions with a convention
for the paths they'll be on.  Other than that, maintainers could look at
function multi-versioning for performance-critical code where that's
possible.  It isn't always, specifically not for Fortran, and I'd
probably look at that first if I got back to GCC maintenance.
(Actually, adding FMV to BLIS wasn't effective for some reason I haven't
had time to chase.)
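
For anyone who hasn't seen function multi-versioning, a minimal sketch
in C using GCC's target_clones attribute follows.  The kernel (a
daxpy-style loop called axpy_kernel here) and the choice of clone
targets are just assumptions for illustration, not taken from any
package.  The attribute is only applied on x86-64 GNU/Linux, where I'm
assuming ifunc support is available; elsewhere the macro expands to
nothing and you get a single plain version.  Note that the clones only
pay off if the loop is actually vectorized, e.g. with -O3 or
-ftree-vectorize.

```c
#include <stddef.h>

/* Apply multi-versioning only where it is assumed to work: a GCC-
   compatible compiler on x86-64 Linux with ifunc support (glibc). */
#if defined(__GNUC__) && defined(__x86_64__) && defined(__linux__)
#define KERNEL_CLONES \
    __attribute__((target_clones("avx2", "sse4.2", "default")))
#else
#define KERNEL_CLONES  /* single version on other platforms */
#endif

/* y[i] += a * x[i] for i in [0, n).  With KERNEL_CLONES active the
   compiler emits one copy per listed target plus a resolver that picks
   a copy at load time according to the CPU it is running on. */
KERNEL_CLONES
void axpy_kernel(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Callers just call axpy_kernel() as usual; the dispatch is invisible to
them, which is the attraction for performance-critical kernels.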

The "Clear Linux" stuff mentioned is unconvincing.  The only worked
example I've seen is for FFTW, where it actually has no effect, and I've
seen no numbers.  Using FMV outside performance-critical kernels -- just
on something GCC says is vectorizable -- is probably not a good idea, and
any changes ought to be contributed explicitly to the source and should
support non-x86.  (By the way, doesn't even Intel assume AVX as a
baseline, not AVX2?)

There's already multi-SIMD support for ATLAS -- though I know no good
reason to use ATLAS -- and at least one package (libxsmm) has a minimum
requirement of SSE3 without complaint.  (I got that down from SSE4 for
the benefit of systems we had, though you wouldn't use them for anything
CPU-bound.)