On Fri, Oct 13, 2017 at 3:09 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Oct 13, 2017 at 6:56 AM, Andrey Ryabinin
> <aryabinin@xxxxxxxxxxxxx> wrote:
>>
>> This could be fixed by s/vmovdqa/vmovdqu change like below, but maybe the right fix
>> would be to align the data properly?
>
> I suspect anything that has the SHA extensions should also do
> unaligned loads efficiently. The whole "aligned only" model is broken.
> It's just doing two loads from the state pointer, there's likely no
> point in trying to align it.

+1, good engineering.

AVX2 requires 32-byte buffer alignment in some places. It is trickier
than this use case because __BIGGEST_ALIGNMENT__ doubled, but a lot of
code still assumes 16 bytes.

Jeff
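
To make the 16-byte versus 32-byte point concrete, here is a minimal sketch in portable C. It is not taken from the patch under discussion. The helper names is_aligned() and load_u32() are hypothetical, and the memcpy()-based load only stands in for the idea behind an unaligned load such as vmovdqu (or the _mm_loadu_si128 intrinsic): it works at any address, whereas an aligned-only load such as vmovdqa faults when the address is not suitably aligned.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns nonzero if p is aligned to n bytes.
 * n must be a power of two (16 for SSE-style loads, 32 for AVX2-style).
 */
static int is_aligned(const void *p, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return ((uintptr_t)p & (n - 1)) == 0;
}

/*
 * Hypothetical helper: load 32 bits from any address, aligned or not.
 * memcpy() makes no alignment assumption, so this is the portable
 * analogue of choosing an unaligned load instruction.
 */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

A buffer declared with alignas(32) passes is_aligned(buf, 32), but buf + 16 only passes the 16-byte check. That is the kind of offset that code written with 16-byte assumptions can hand to a 32-byte aligned load.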