Hi Ard,

On Mon, Oct 08, 2018 at 11:15:53PM +0200, Ard Biesheuvel wrote:
> On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> because the ordinary load/store instructions (ldr, ldrh, ldrb) can
> tolerate any misalignment of the memory address. However, load/store
> double and load/store multiple instructions (ldrd, ldm) may still only
> be used on memory addresses that are 32-bit aligned, and so we have to
> use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we
> may end up with a severe performance hit due to alignment traps that
> require fixups by the kernel.
>
> Fortunately, the get_unaligned() accessors do the right thing: when
> building for ARMv6 or later, the compiler will emit unaligned accesses
> using the ordinary load/store instructions (but avoid the ones that
> require 32-bit alignment). When building for older ARM, those accessors
> will emit the appropriate sequence of ldrb/mov/orr instructions. And on
> architectures that can truly tolerate any kind of misalignment, the
> get_unaligned() accessors resolve to the leXX_to_cpup accessors that
> operate on aligned addresses.
>
> So switch to the unaligned accessors for the aligned fast path. This
> will create the exact same code on architectures that can really
> tolerate any kind of misalignment, and generate code for ARMv6+ that
> avoids load/store instructions that trigger alignment faults.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> ---
>  crypto/algapi.c         |  7 +++----
>  include/crypto/algapi.h | 11 +++++++++--
>  2 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/crypto/algapi.c b/crypto/algapi.c
> index 2545c5f89c4c..52ce3c5a0499 100644
> --- a/crypto/algapi.c
> +++ b/crypto/algapi.c
> @@ -988,11 +988,10 @@ void crypto_inc(u8 *a, unsigned int size)
>          __be32 *b = (__be32 *)(a + size);
>          u32 c;
>
> -        if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
> -            IS_ALIGNED((unsigned long)b, __alignof__(*b)))
> +        if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
>                  for (; size >= 4; size -= 4) {
> -                        c = be32_to_cpu(*--b) + 1;
> -                        *b = cpu_to_be32(c);
> +                        c = get_unaligned_be32(--b) + 1;
> +                        put_unaligned_be32(c, b);
>                          if (likely(c))
>                                  return;
>                  }
> diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h
> index 4a5ad10e75f0..86267c232f34 100644
> --- a/include/crypto/algapi.h
> +++ b/include/crypto/algapi.h
> @@ -17,6 +17,8 @@
>  #include <linux/kernel.h>
>  #include <linux/skbuff.h>
>
> +#include <asm/unaligned.h>
> +
>  /*
>   * Maximum values for blocksize and alignmask, used to allocate
>   * static buffers that are big enough for any combination of
> @@ -212,7 +214,9 @@ static inline void crypto_xor(u8 *dst, const u8 *src, unsigned int size)
>                  unsigned long *s = (unsigned long *)src;
>
>                  while (size > 0) {
> -                        *d++ ^= *s++;
> +                        put_unaligned(get_unaligned(d) ^ get_unaligned(s), d);
> +                        d++;
> +                        s++;
>                          size -= sizeof(unsigned long);
>                  }
>          } else {
> @@ -231,7 +235,10 @@ static inline void crypto_xor_cpy(u8 *dst, const u8 *src1, const u8 *src2,
>                  unsigned long *s2 = (unsigned long *)src2;
>
>                  while (size > 0) {
> -                        *d++ = *s1++ ^ *s2++;
> +                        put_unaligned(get_unaligned(s1) ^ get_unaligned(s2), d);
> +                        d++;
> +                        s1++;
> +                        s2++;
>                          size -= sizeof(unsigned long);
>                  }
>          } else {
> --
> 2.11.0
>

Doesn't __crypto_xor() have the same problem too?

- Eric
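
P.S. To make the question concrete, here is a rough sketch of the kind of
word-at-a-time loop I mean -- illustrative only, not the actual
__crypto_xor() code, and the function and variable names are made up:

#include <linux/kernel.h>
#include <asm/unaligned.h>

/*
 * Sketch of a word-at-a-time XOR fast path.  On the
 * CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS branch it dereferences
 * unsigned long pointers directly, which is the same pattern your
 * patch replaces in crypto_xor(): on ARMv6+ the compiler may turn
 * the plain dereferences into ldrd/ldm and trap on unaligned
 * addresses.
 */
static void xor_words_sketch(u8 *dst, const u8 *src1, const u8 *src2,
                             unsigned int len)
{
        if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) {
                while (len >= sizeof(unsigned long)) {
                        /* plain dereferences of possibly misaligned pointers */
                        *(unsigned long *)dst = *(const unsigned long *)src1 ^
                                                *(const unsigned long *)src2;
                        /*
                         * The unaligned-accessor version would be:
                         *
                         *   put_unaligned(get_unaligned((const unsigned long *)src1) ^
                         *                 get_unaligned((const unsigned long *)src2),
                         *                 (unsigned long *)dst);
                         */
                        dst += sizeof(unsigned long);
                        src1 += sizeof(unsigned long);
                        src2 += sizeof(unsigned long);
                        len -= sizeof(unsigned long);
                }
        }

        /* byte-at-a-time tail (and the non-fast-path case) */
        while (len--)
                *dst++ = *src1++ ^ *src2++;
}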