On Tue, Oct 20, 2020 at 04:39:56PM -0400, Arvind Sankar wrote:
> Unrolling the LOAD and BLEND loops improves performance by ~8% on x86_64
> (tested on Broadwell Xeon) while not increasing code size too much.
>
> Signed-off-by: Arvind Sankar <nivedita@xxxxxxxxxxxx>
> ---

Looks good,

Reviewed-by: Eric Biggers <ebiggers@xxxxxxxxxx>