On Wed, Jan 12, 2022 at 7:32 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> How about unrolling the inner loop but not the outer one? Wouldn't that give
> most of the benefit, without hurting performance as much?
>
> If you stay with this approach and don't unroll either loop, can you use 'r' and
> 'i' instead of 'i' and 'j', to match the naming in G()?

All this might work, sure. But as mentioned earlier, I've abandoned this
entirely, as I don't think this patch is necessary. See the v3 patchset
instead:

https://lore.kernel.org/linux-crypto/20220111220506.742067-1-Jason@xxxxxxxxx/