Hi,
I was wondering why reading from /dev/urandom is so much slower on Ryzen
than on Intel, and did some analysis. It turns out that the RDRAND
instruction is at fault: it takes much longer to execute on AMD.
If I read the code correctly:
--- drivers/char/random.c, _extract_crng() (lines 862-865) ---
	spin_lock_irqsave(&crng->lock, flags);
	if (arch_get_random_long(&v))	/* 64-bit RDRAND on x86_64 */
		crng->state[14] ^= v;	/* state[] holds 32-bit words */
	chacha20_block(&crng->state[0], out);
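That is, one call to RDRAND (with a 64-bit operand) is issued for every
ChaCha20 block computed. According to my measurements, on Ryzen this
seems to dominate the run time (arch_get_random_long() is an inline
function, so the cost of RDRAND is attributed to _extract_crng in perf):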
On a Broadwell Xeon E5-2650 v4:
---
# dd if=/dev/urandom of=/dev/null bs=1M status=progress
28827451392 bytes (29 GB) copied, 143.290349 s, 201 MB/s
# perf top
49.88% [kernel] [k] chacha20_block
31.22% [kernel] [k] _extract_crng
---
On a Ryzen 7 1800X:
---
# dd if=/dev/urandom of=/dev/null bs=1M status=progress
3169845248 bytes (3,2 GB, 3,0 GiB) copied, 42,0106 s, 75,5 MB/s
# perf top
76,40% [kernel] [k] _extract_crng
13,05% [kernel] [k] chacha20_block
---
An easy improvement might be to replace arch_get_random_long() with
arch_get_random_int(): the state array contains only 32-bit elements, so
the upper half of the 64-bit value is discarded anyway, and on Ryzen
(unlike on Intel) 32-bit RDRAND is supposed to be roughly twice as fast.
Best regards,
OM