On 21 August 2018 at 20:34, Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote:
> On Tue, Aug 21, 2018 at 11:19 AM Ard Biesheuvel
> <ard.biesheuvel@xxxxxxxxxx> wrote:
>>
>> On 21 August 2018 at 20:04, Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote:
>> > On Tue, Aug 21, 2018 at 9:46 AM Ard Biesheuvel
>> > <ard.biesheuvel@xxxxxxxxxx> wrote:
>> >>
>> >> Replace the literal load of the addend vector with a sequence that
>> >> composes it using immediates. While at it, tweak the code that refers
>> >> to it so it does not clobber the register, so we can take the load
>> >> out of the loop as well.
>> >>
>> >> This results in generally better code, but also works around a Clang
>> >> issue, whose integrated assembler does not implement the GNU ARM asm
>> >> syntax completely, and does not support the =literal notation for
>> >> FP registers.
>> >
>> > Would you mind linking to the issue tracker for:
>> > https://bugs.llvm.org/show_bug.cgi?id=38642
>> >
>> > And maybe the comment from the binutils source? (or arm32 reference
>> > manual you mentioned in https://lkml.org/lkml/2018/8/21/589)
>> > https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gas/testsuite/gas/aarch64/programmer-friendly.s;h=6254c6476efdc848648b05068be0574e7addc85d;hb=HEAD#l11
>> >
>> > They can help provide more context to future travelers.
>> >
>>
>> Sure, if it helps.
>
> Robin linked to the arm documentation and the gas documentation, maybe
> those would be better than the source level comment? Or simply the
> llvm bug since I've posted those links there, too?
>
>> To clarify, these are the consecutive values of each of the registers,
>> using 16-bit elements:
>>
>> v7 := { 1, 1, 1, 1, 0, 0, 0, 0 }
>> v8 := { 2, 2, 2, 2, 0, 0, 0, 0 }
>> v6 := { 3, 0, 3, 0, 3, 0, 3, 0 }
>> v8 := { 1, 2, 1, 2, 1, 2, 1, 2 }
>> v8 := { 1, 2, 3, 0, 1, 2, 3, 0 }
>> v8 := { 1, 0, 2, 0, 3, 0, 0, 0 }
>
> Beautiful, thank you for this. Can this go in the patch as a comment/ascii art?
>

Sure, although I realized the following works just as well, and is
also 6 instructions.

        mov     x0, #1
        mov     x1, #2
        mov     x2, #3
        ins     v8.s[0], w0
        ins     v8.s[1], w1
        ins     v8.d[1], x2

I generally try to stay away from the element accessors if I can, but
this is not on a hot path anyway, so there is no need for code that
requires comments to understand.

> With that...
>
> Reviewed-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>

Thanks,