Re: Poor RNG performance on Ryzen

Jan Glauber <jglauber@xxxxxxxxxx> · Fri, 21 Jul 2017 11:26:57 +0200

On Fri, Jul 21, 2017 at 09:12:01AM +0200, Oliver Mangold wrote:
> Hi,
> 
> I was wondering why reading from /dev/urandom is much slower on
> Ryzen than on Intel, and did some analysis. It turns out that the
> RDRAND instruction is at fault, which takes much longer on AMD.
> 
> If I read this correctly:
> 
> --- drivers/char/random.c ---
>         spin_lock_irqsave(&crng->lock, flags);
>         if (arch_get_random_long(&v))
>                 crng->state[14] ^= v;
>         chacha20_block(&crng->state[0], out);
> 
> one call to RDRAND (with 64-bit operand) is issued per computation
> of a chacha20 block. According to the measurements I did, it seems
> on Ryzen this dominates the time usage:
> 
> On Broadwell E5-2650 v4:
> 
> ---
> # dd if=/dev/urandom of=/dev/null bs=1M status=progress
> 28827451392 bytes (29 GB) copied, 143.290349 s, 201 MB/s
> # perf top
>   49.88%  [kernel]            [k] chacha20_block
>   31.22%  [kernel]            [k] _extract_crng
> ---
> 
> On Ryzen 1800X:
> 
> ---
> # dd if=/dev/urandom of=/dev/null bs=1M status=progress
> 3169845248 bytes (3,2 GB, 3,0 GiB) copied, 42,0106 s, 75,5 MB/s
> # perf top
>   76,40%  [kernel]                       [k] _extract_crng
>   13,05%  [kernel]                       [k] chacha20_block
> ---
> 
> An easy improvement might be to replace the usage of
> arch_get_random_long() by arch_get_random_int(), as the state array
> contains just 32-bit elements, and (contrary to Intel) on Ryzen
> 32-bit RDRAND is supposed to be faster by roughly a factor of 2.
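
To illustrate the point about the 32-bit state elements, here is a minimal
user-space sketch. It is not kernel code: struct toy_crng, mix_long() and
mix_int() are hypothetical names made up for this example, and the only
assumption carried over from the quoted snippet is that state[] holds 32-bit
words. Under that assumption, XORing a 64-bit value into state[14] keeps only
the low 32 bits, so a 32-bit input would carry the same amount of data into
the state.

```c
#include <stdint.h>

/* Hypothetical stand-in for the CRNG state: sixteen 32-bit words. */
struct toy_crng {
	uint32_t state[16];
};

/*
 * Mirrors "crng->state[14] ^= v" with a 64-bit v. The cast spells out
 * the truncation that the assignment to a 32-bit element implies.
 */
static void mix_long(struct toy_crng *c, uint64_t v)
{
	c->state[14] ^= (uint32_t)v;
}

/* Same mixing step fed by a 32-bit value. */
static void mix_int(struct toy_crng *c, uint32_t v)
{
	c->state[14] ^= v;
}
```

Calling mix_long() with some 64-bit value and mix_int() with just its low
32 bits leaves state[14] identical in both cases; the upper half of the
64-bit value never reaches the state.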

Nice catch. How much does the performance improve on Ryzen when you
use arch_get_random_int()?
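
In case it helps with measuring, below is a rough user-space sketch for
timing the raw instruction at both operand widths. It is only a sketch under
some assumptions: GCC or clang on x86-64, CPU time taken from clock(), and
helper names (rdrand_supported(), rdrand32(), rdrand64(),
rdrand_ns_per_call()) that are invented for this example. It times RDRAND
alone, so it says nothing definitive about /dev/urandom throughput with the
patched kernel.

```c
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <cpuid.h>
#define HAVE_RDRAND_ASM 1
#else
#define HAVE_RDRAND_ASM 0
#endif

/* Returns 1 if CPUID leaf 1 reports RDRAND (ECX bit 30), else 0. */
static int rdrand_supported(void)
{
#if HAVE_RDRAND_ASM
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ecx >> 30) & 1;
#else
	return 0;
#endif
}

#if HAVE_RDRAND_ASM
/* Keeps the results observable so the loop is not optimized away. */
static volatile uint64_t rdrand_sink;

/* 32-bit RDRAND; returns 1 if the carry flag signalled success. */
static inline int rdrand32(uint32_t *v)
{
	unsigned char ok;

	__asm__ volatile("rdrand %0; setc %1"
			 : "=r"(*v), "=qm"(ok) : : "cc");
	return ok;
}

/* 64-bit RDRAND; returns 1 if the carry flag signalled success. */
static inline int rdrand64(uint64_t *v)
{
	unsigned char ok;

	__asm__ volatile("rdrand %0; setc %1"
			 : "=r"(*v), "=qm"(ok) : : "cc");
	return ok;
}
#endif

/*
 * Average CPU nanoseconds per RDRAND of the given width (32 or 64).
 * Returns -1.0 if RDRAND is unavailable or the arguments are invalid.
 */
static double rdrand_ns_per_call(int width, unsigned long iters)
{
#if HAVE_RDRAND_ASM
	uint64_t acc = 0;
	unsigned long i;
	clock_t t0, t1;

	if (!rdrand_supported() || iters == 0 ||
	    (width != 32 && width != 64))
		return -1.0;

	t0 = clock();
	for (i = 0; i < iters; i++) {
		if (width == 32) {
			uint32_t v;
			if (rdrand32(&v))
				acc ^= v;
		} else {
			uint64_t v;
			if (rdrand64(&v))
				acc ^= v;
		}
	}
	t1 = clock();
	rdrand_sink = acc;

	return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)iters;
#else
	(void)width;
	(void)iters;
	return -1.0;
#endif
}
```

Calling rdrand_ns_per_call(32, n) and rdrand_ns_per_call(64, n) with a large
n (several million, say) on both machines and comparing the two results
should show whether the operand width matters there. The dd run from your
mail on a kernel using arch_get_random_int() would still be the number that
counts.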

--Jan

> Best regards,
> 
> OM