On Sun, Dec 11, 2016 at 7:48 PM, Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
> +	switch (left) {
> +	case 7: b |= ((u64)data[6]) << 48;
> +	case 6: b |= ((u64)data[5]) << 40;
> +	case 5: b |= ((u64)data[4]) << 32;
> +	case 4: b |= ((u64)data[3]) << 24;
> +	case 3: b |= ((u64)data[2]) << 16;
> +	case 2: b |= ((u64)data[1]) << 8;
> +	case 1: b |= ((u64)data[0]); break;
> +	case 0: break;
> +	}

The above is extremely inefficient. Considering that most kernel data
would be expected to be smallish, that matters (ie the usual benchmark
would not be about hashing megabytes of data, but instead millions of
hashes of small data).

I think this could be rewritten (at least for 64-bit architectures) as

  #ifdef CONFIG_DCACHE_WORD_ACCESS

	if (left)
		b |= le64_to_cpu(load_unaligned_zeropad(data) &
				 bytemask_from_count(left));

  #else

	.. do the duff's device thing with the switch() ..

  #endif

which should give you basically perfect code generation (ie a single
64-bit load and a byte mask).

Totally untested, just looking at the code and trying to make sense of it.

... and obviously, it requires an actual high-performance use-case to
make any difference.

               Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html