On Thu, Aug 09, 2018 at 08:38:56PM +0200, Stephan Müller wrote: > The function extract_crng invokes the ChaCha20 block operation directly > on the user-provided buffer. The block operation operates on u32 words. > Thus the extract_crng function expects the buffer to be aligned to u32 > as it is visible with the parameter type of extract_crng. However, > get_random_bytes uses a void pointer which may or may not be aligned. > Thus, an alignment check is necessary and the temporary buffer must be > used if the alignment to u32 is not ensured. I'm wondering whether we have kernel code that actually tries to extract more than 64 bytes, so I'm not sure how often we enter the while loop at all. Out of curiosity, did you find this from code inspection or via a kernel fault on some architecture that doesn't allow unaligned u32 memory accesses? > while (nbytes >= CHACHA20_BLOCK_SIZE) { > - extract_crng(buf); > - buf += CHACHA20_BLOCK_SIZE; > + if (likely((unsigned long)buf & (sizeof(tmp[0]) - 1))) { > + extract_crng(buf); > + buf += CHACHA20_BLOCK_SIZE; > + } else { > + extract_crng(tmp); > + memcpy(buf, tmp, CHACHA20_BLOCK_SIZE); > + } > + That should be if (likely(((unsigned long)buf & (sizeof(tmp[0]) - 1)) == 0) { shouldn't it? If we *did* have callers who were calling get_random_bytes with nbytes significantly larger than 64 bytes (CHACHA20_BLOCK_SIZE), I wonder if we should be doing something like this: while (nbytes >= CHACHA20_BLOCK_SIZE) { int adjust = (unsigned long)buf & (sizeof(tmp[0]) - 1); extract_crng(buf); buf += CHACHA20_BLOCK_SIZE; if (likely(adjust == 0)) { extract_crng(buf); buf += CHACHA20_BLOCK_SIZE; nbytes -= CHACHA20_BLOCK_SIZE; } else { extract_crng(tmp); memcpy(buf, tmp, CHACHA20_BLOCK_SIZE - adjust); buf += CHACHA20_BLOCK_SIZE - adjust; nbytes -= CHACHA20_BLOCK_SIZE - adjust; } } This may be a hyper optimization, though --- it's not clear how often, say the kernel would be calling get_random_bytes with size >> 64 at all, never mind with an unaligned buffer. - Ted