On Fri, Jul 21, 2017 at 09:12:01AM +0200, Oliver Mangold wrote:
> Hi,
>
> I was wondering why reading from /dev/urandom is much slower on
> Ryzen than on Intel, and did some analysis. It turns out that the
> RDRAND instruction is at fault, which takes much longer on AMD.
>
> if I read this correctly:
>
> --- drivers/char/random.c ---
>         spin_lock_irqsave(&crng->lock, flags);
>         if (arch_get_random_long(&v))
>                 crng->state[14] ^= v;
>         chacha20_block(&crng->state[0], out);
>
> one call to RDRAND (with 64-bit operand) is issued per computation
> of a chacha20 block. According to the measurements I did, it seems
> on Ryzen this dominates the time usage:
>
> On Broadwell E5-2650 v4:
>
> ---
> # dd if=/dev/urandom of=/dev/null bs=1M status=progress
> 28827451392 bytes (29 GB) copied, 143.290349 s, 201 MB/s
> # perf top
>   49.88%  [kernel]  [k] chacha20_block
>   31.22%  [kernel]  [k] _extract_crng
> ---
>
> On Ryzen 1800X:
>
> ---
> # dd if=/dev/urandom of=/dev/null bs=1M status=progress
> 3169845248 bytes (3,2 GB, 3,0 GiB) copied, 42,0106 s, 75,5 MB/s
> # perf top
>   76,40%  [kernel]  [k] _extract_crng
>   13,05%  [kernel]  [k] chacha20_block
> ---
>
> An easy improvement might be to replace the usage of
> arch_get_random_long() by arch_get_random_int(), as the state array
> contains just 32-bit elements, and (contrary to Intel) on Ryzen
> 32-bit RDRAND is supposed to be faster by roughly a factor of 2.

Nice catch. How much does the performance improve on Ryzen when you
use arch_get_random_int()?

--Jan

> Best regards,
>
> OM
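
For reference, a minimal user-space sketch of why the 32-bit variant should
lose nothing: the state words are 32 bits wide, so XORing a 64-bit value into
crng->state[14] keeps only its low 32 bits. The helpers
fake_get_random_long() and fake_get_random_int() are hypothetical stand-ins
that return fixed values; they are not kernel functions and do not issue
RDRAND, so this says nothing about speed, only about which bits end up in
the state.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for arch_get_random_long() and
 * arch_get_random_int(). They return fixed values (an assumption made
 * for illustration) whose low 32 bits are identical.
 */
static int fake_get_random_long(uint64_t *v)
{
	*v = 0x1122334455667788ULL;
	return 1;
}

static int fake_get_random_int(uint32_t *v)
{
	*v = 0x55667788U;
	return 1;
}

/* Mirrors the quoted random.c snippet: 64-bit value into a 32-bit word. */
static void mix_long(uint32_t state[16])
{
	uint64_t v;

	if (fake_get_random_long(&v))
		state[14] ^= (uint32_t)v;	/* upper 32 bits are discarded */
}

/* The proposed variant: fetch only the 32 bits that are actually used. */
static void mix_int(uint32_t state[16])
{
	uint32_t v;

	if (fake_get_random_int(&v))
		state[14] ^= v;
}

/* Both variants leave the same value in state[14] for these inputs. */
static void demo(void)
{
	uint32_t a[16] = {0};
	uint32_t b[16] = {0};

	mix_long(a);
	mix_int(b);
	assert(a[14] == b[14]);
}
```

Calling demo() passes its assertion because the truncating cast in
mix_long() drops the upper half of the 64-bit value, which is the half the
32-bit variant never requests in the first place.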