On 2022-02-17 13:27:29 [+0100], Jason A. Donenfeld wrote:
> Sebastian - this v5 finally follows your suggestion about what
> operations to do at which phase. The only deviation from your exact
> suggestion is that I'm not checking for MIX_INFLIGHT in the online
> handler, and instead just unconditionally zero it out. I think that's an
> acceptable tradeoff to make for simplicity, and it just means we'll
> accumulate even more entropy, which is fine. Hopefully this is an easy
> ack and has no more pitfalls! -Jason

So I think this is the latest, right?
What do you think about this small comment update? :)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 373af789da7ab..3fb14ac1074c5 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -698,10 +698,6 @@ u32 get_random_u32(void)
 EXPORT_SYMBOL(get_random_u32);
 
 #ifdef CONFIG_SMP
-/*
- * This function is called by the cpuhp system, wired up via the large
- * static array in kernel/cpu.c, with the entry CPUHP_RANDOM_PREPARE.
- */
 int random_prepare_cpu(unsigned int cpu)
 {
 	/*
@@ -1238,17 +1234,18 @@ static void fast_mix(u32 pool[4])
 static DEFINE_PER_CPU(struct fast_pool, irq_randomness);
 
 #ifdef CONFIG_SMP
-/*
- * This function is called by the cpuhp system, wired up via the large
- * static array in kernel/cpu.c, with the entry CPUHP_AP_RANDOM_ONLINE.
- */
 int random_online_cpu(unsigned int cpu)
 {
 	/*
-	 * Set irq randomness count to zero so that new accumulated
-	 * irqs are fresh, and more importantly, so that its worker
-	 * is permitted to schedule again when it comes back online,
-	 * since the MIX_INFLIGHT flag will be cleared.
+	 * This function is invoked while the CPU is coming up, after the
+	 * workqueue subsystem has been fully initialized for this CPU.
+	 *
+	 * During CPU bring-up and shutdown we may schedule a worker (and
+	 * set MIX_INFLIGHT) which will run on another CPU, because the
+	 * workqueues were not yet ready for this CPU. In this case the
+	 * worker will recognize that it runs on the wrong CPU and do
+	 * nothing. Therefore the count is unconditionally set to zero,
+	 * which permits scheduling a new worker soon.
 	 */
 	per_cpu_ptr(&irq_randomness, cpu)->count = 0;
 	return 0;

> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -697,6 +697,25 @@ u32 get_random_u32(void)
>  }
>  EXPORT_SYMBOL(get_random_u32);
>  
> +#ifdef CONFIG_SMP
> +/*
> + * This function is called by the cpuhp system, wired up via the large
> + * static array in kernel/cpu.c, with the entry CPUHP_RANDOM_PREPARE.
> + */
> +int random_prepare_cpu(unsigned int cpu)
> +{
> +	/*
> +	 * When the cpu comes back online, immediately invalidate both
> +	 * the per-cpu crng and all batches, so that we serve fresh
> +	 * randomness.
> +	 */
> +	per_cpu_ptr(&crngs, cpu)->generation = ULONG_MAX;
> +	per_cpu_ptr(&batched_entropy_u32, cpu)->position = UINT_MAX;
> +	per_cpu_ptr(&batched_entropy_u64, cpu)->position = UINT_MAX;

This runs before the CPU is up. Could you do the initialisation right
now? My problem here is that if this (get_random_u32()) is used between
CPUHP_AP_IDLE_DEAD and CPUHP_TEARDOWN_CPU, then the initialisation will
happen on the target CPU with interrupts disabled, and it will acquire a
sleeping lock (batched_entropy_u32.lock).
You could perform the initialization cross-CPU without the lock, because
the CPU itself isn't ready yet. Something like

	batch = per_cpu_ptr(&batched_entropy_u32, cpu);
	_get_random_bytes(batch->entropy_u32, sizeof(batch->entropy_u32));
	batch->position = 0;
	batch->generation = next_gen;

should do the job. Plus the same for u64. I haven't figured out the
generation thingy yet, but I suspect that a sleeping lock is involved.
I did not check whether this is a problem on PREEMPT_RT yet, but it may
become one later (if we get a user in the hotplug path).

Sebastian