Hi All,

On 4/22/22 14:20, Jason A. Donenfeld wrote:
> Currently, we do the jitter dance if two consecutive reads to the cycle
> counter return different values. If they do, then we consider the cycle
> counter to be fast enough that one trip through the scheduler will yield
> one "bit" of credited entropy. If those two reads return the same value,
> then we assume the cycle counter is too slow to show meaningful
> differences.
>
> This methodology is flawed for a variety of reasons, one of which Eric
> posted a patch to fix in [1]. The issue that patch solves is that on a
> system with a slow counter, you might be [un]lucky and read the counter
> _just_ before it changes, so that the second cycle counter you read
> differs from the first, even though there's usually quite a large period
> of time in between the two. For example:
>
> | real time | cycle counter |
> | --------- | ------------- |
> | 3         | 5             |
> | 4         | 5             |
> | 5         | 5             |
> | 6         | 5             |
> | 7         | 5             | <--- a
> | 8         | 6             | <--- b
> | 9         | 6             | <--- c
>
> If we read the counter at (a) and compare it to (b), we might be fooled
> into thinking that it's a fast counter, when in reality it is not. The
> solution in [1] is to also compare counter (b) to counter (c), on the
> theory that if the counter is _actually_ slow, and (a)!=(b), then
> certainly (b)==(c).
>
> This helps solve this particular issue, in one sense, but in another
> sense, it mostly functions to disallow jitter entropy on these systems,
> rather than simply taking more samples in that case.
>
> Instead, this patch takes a different approach. Right now we assume that
> a difference in one set of consecutive samples means one "bit" of
> credited entropy per scheduler trip. We can extend this so that a
> difference in two sets of consecutive samples means one "bit" of
> credited entropy per /two/ scheduler trips, and three for three, and
> four for four. In other words, we can increase the amount of jitter
> "work" we require for each "bit", depending on how slow the cycle
> counter is.
>
> So this patch takes a whole bunch of samples, sees how many of them are
> different, and divides to find the amount of work required per "bit",
> and also requires that at least some minimum of them are different in
> order to attempt any jitter entropy.
>
> Note that this approach is still far from perfect. It's not a real
> statistical estimate of how much these samples vary; it's not a
> real-time analysis of the relevant input data. That remains a project
> for another time. However, it makes the same (partly flawed) assumptions
> as the code that's there now, so it's probably not worse than the status
> quo, and it handles the issue Eric mentioned in [1]. But, again, it's
> probably a far cry from whatever a really robust version of this would
> be.
>
> [1] https://lore.kernel.org/lkml/20220421233152.58522-1-ebiggers@xxxxxxxxxx/
>     https://lore.kernel.org/lkml/20220421192939.250680-1-ebiggers@xxxxxxxxxx/
>
> Cc: Eric Biggers <ebiggers@xxxxxxxxxx>
> Cc: Theodore Ts'o <tytso@xxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx>
> ---
> This is an argument very much centered around the somewhat low bar of
> being "not worse than before". If you can think of ways that it doesn't
> even manage to clear that, please do pipe up.
>
>  drivers/char/random.c | 36 ++++++++++++++++++++++++++----------
>  1 file changed, 26 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index bf89c6f27a19..94a2ddb53662 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1354,6 +1354,12 @@ void add_interrupt_randomness(int irq)
>  }
>  EXPORT_SYMBOL_GPL(add_interrupt_randomness);
>
> +struct entropy_timer_state {
> +        unsigned long entropy;
> +        struct timer_list timer;
> +        unsigned int samples, samples_per_bit;
> +};
> +
>  /*
>   * Each time the timer fires, we expect that we got an unpredictable
>   * jump in the cycle counter. Even if the timer is running on another
> @@ -1367,9 +1373,14 @@ EXPORT_SYMBOL_GPL(add_interrupt_randomness);
>   *
>   * So the re-arming always happens in the entropy loop itself.
>   */
> -static void entropy_timer(struct timer_list *t)
> +static void entropy_timer(struct timer_list *timer)
>  {
> -        credit_entropy_bits(1);
> +        struct entropy_timer_state *state = container_of(timer, struct entropy_timer_state, timer);
> +
> +        if (++state->samples == state->samples_per_bit) {
> +                credit_entropy_bits(1);
> +                state->samples = 0;
> +        }
>  }
>
>  /*
> @@ -1378,17 +1389,22 @@ static void entropy_timer(struct timer_list *t)
>   */
>  static void try_to_generate_entropy(void)
>  {
> -        struct {
> -                unsigned long entropy;
> -                struct timer_list timer;
> -        } stack;
> -
> -        stack.entropy = random_get_entropy();
> +        enum { NUM_TRIAL_SAMPLES = 8192, MAX_SAMPLES_PER_BIT = 256 };
> +        struct entropy_timer_state stack;
> +        unsigned int i, num_different = 0;
> +        unsigned long last = random_get_entropy();
>
> -        /* Slow counter - or none. Don't even bother */
> -        if (stack.entropy == random_get_entropy())
> +        for (i = 0; i < NUM_TRIAL_SAMPLES - 1; ++i) {
> +                stack.entropy = random_get_entropy();
> +                if (stack.entropy != last)
> +                        ++num_different;
> +                last = stack.entropy;
> +        }
> +        stack.samples_per_bit = DIV_ROUND_UP(NUM_TRIAL_SAMPLES, num_different + 1);
> +        if (stack.samples_per_bit > MAX_SAMPLES_PER_BIT)
>                  return;
>
> +        stack.samples = 0;
>          timer_setup_on_stack(&stack.timer, entropy_timer, 0);
>          while (!crng_ready() && !signal_pending(current)) {
>                  if (!timer_pending(&stack.timer))

I've just seen, on a platform with a slow(ish) timer, that it is now
considered a source of entropy, with samples_per_bit set to 27 (5.19-rc6
has MAX_SAMPLES_PER_BIT set to 32). Because of that I see significant
delays, and I'm trying to understand what could be wrong with my setup.

I observe one credit_init_bits(1) call (via entropy_timer()) per ~970
schedule() calls. Is that somewhat expected? Does it make sense at all?

Cheers,
Vladimir
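
P.S. To sanity-check my numbers, here is a rough userspace approximation of
the estimation loop in the patch above. read_cycle_counter() is only a
stand-in for random_get_entropy() (backed here by CLOCK_MONOTONIC_COARSE to
imitate a slow counter), and the last printf assumes the timer fires about
once per jiffy and that crng_ready() needs 256 credited bits, so treat the
output as a ballpark rather than a measurement:

/* Userspace sketch only; the constants mirror the patch above. */
#include <stdio.h>
#include <time.h>

#define NUM_TRIAL_SAMPLES       8192
#define MAX_SAMPLES_PER_BIT     256
#define DIV_ROUND_UP(n, d)      (((n) + (d) - 1) / (d))

/* Stand-in for random_get_entropy(); a coarse clock mimics a slow counter. */
static unsigned long read_cycle_counter(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC_COARSE, &ts);
        return (unsigned long)ts.tv_sec * 1000000000UL + ts.tv_nsec;
}

int main(void)
{
        unsigned long last = read_cycle_counter(), cur;
        unsigned int i, num_different = 0, samples_per_bit;

        /* Count how many consecutive samples actually differ. */
        for (i = 0; i < NUM_TRIAL_SAMPLES - 1; ++i) {
                cur = read_cycle_counter();
                if (cur != last)
                        ++num_different;
                last = cur;
        }

        /* The slower the counter, the more timer fires are required per bit. */
        samples_per_bit = DIV_ROUND_UP(NUM_TRIAL_SAMPLES, num_different + 1);
        if (samples_per_bit > MAX_SAMPLES_PER_BIT) {
                printf("counter too slow, jitter entropy would not be attempted\n");
                return 0;
        }

        /* Rough time to credit 256 bits if the timer fires once per jiffy. */
        printf("1 bit per %u timer fires; ~%u s to 256 bits at HZ=250\n",
               samples_per_bit, samples_per_bit * 256 / 250);
        return 0;
}

With samples_per_bit = 27 that arithmetic gives roughly 6900 timer fires,
i.e. somewhere between several seconds and about a minute depending on HZ,
which is at least the same order of magnitude as the delays I am seeing.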