On 2019-01-16 12:26:51 +0300, Alexander Monakov wrote:
> On Tue, 15 Jan 2019, Vincent Lefevre wrote:
>
> > I would like to know how to get a vector FMA with GCC in a portable
> > way.
> >
> > By "portable way", I mean that the behavior must not depend on the
> > compilation options (e.g., if FP contraction is disabled, I still
> > want a true FMA) and that the code must not depend on the architecture
> > (thus intrinsics should not be used... even when restricting to x86,
> > one reason is FMA3 vs FMA4 issues).
> >
> > For instance, for addition, one can write "a + b". But for FMA?
>
> In the context of autovectorized code or when using generic vector types?

It could be either (or both, see below). But it appears that I need
to use vector types due to ABI issues:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65847#c1

(inlining might improve things, but I prefer to avoid ABI issues in
every case).

But if I use fma() (from either <math.h> or <tgmath.h>), it must be
done on scalar types, thus this means that autovectorized code must
also work with decomposed vector types. Unfortunately, while this
works with structures (which are affected by ABI issues), this
doesn't with vectors: on x86_64, I get 2 vfmadd132sd (with unpack
instructions) instead of a single vfmadd132pd!

I've just reported the following bug:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88873

> When the source is supposed to be autovectorized and operates on scalar
> variables, using fma function works (GCC recognizes it as a builtin;
> __FP_FAST_FMA is predefined when the fma instruction is available).
>
> For generic vector types I'm afraid GCC does not provide such a facility.
> I think it would make a reasonable feature request.

I've just done it here:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88874

-- 
Vincent Lefèvre <vincent@xxxxxxxxxx> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)