Hi Jason, I was debugging my backport and looking for a bug but then I figured out that everything works as intended. add_interrupt_randomness() has this piece: | if (new_count < 64 && (!time_after(now, fast_pool->last + HZ) || | unlikely(crng_init == 0))) | return; I was reading this wrong the whole time as in (with context): If new_count is greater or equal to 64 and a second passed by schedule the worker for mix_interrupt_randomness(). Which means a worker once a second on a CPU or longer intervals if the CPU is idle. But in reality this is: If new_count is greater _or_ equal to 64 or a at least second passed by then schedule the worker for mix_interrupt_randomness(). This explains why there are some many worker pending/ running for mix_interrupt_randomness() in my testing. To give you an example why this got my attention: | [ 15.763823] random: mix_interrupt_randomness() CPU16, HAZ: 2 | [ 15.763823] random: mix_interrupt_randomness() CPU17, HAZ: 2 | [ 15.763826] random: mix_interrupt_randomness() CPU07, HAZ: 2 | [ 15.763827] random: mix_interrupt_randomness() CPU06, HAZ: 4 | [ 15.763827] random: mix_interrupt_randomness() CPU04, HAZ: 2 | [ 18.579328] random: mix_interrupt_randomness() CPU18, HAZ: 3 | [ 18.579357] random: mix_interrupt_randomness() CPU16, HAZ: 1 | [ 18.579358] random: mix_interrupt_randomness() CPU03, HAZ: 2 | [ 18.579358] random: mix_interrupt_randomness() CPU17, HAZ: 1 | [ 18.579359] random: mix_interrupt_randomness() CPU04, HAZ: 1 | [ 18.579360] random: mix_interrupt_randomness() CPU05, HAZ: 1 | [ 18.579361] random: mix_interrupt_randomness() CPU06, HAZ: 1 | [ 18.579362] random: mix_interrupt_randomness() CPU07, HAZ: 1 | [ 20.531244] random: mix_interrupt_randomness() CPU18, HAZ: 2 | [ 20.531266] random: mix_interrupt_randomness() CPU16, HAZ: 1 | [ 20.531267] random: mix_interrupt_randomness() CPU03, HAZ: 2 | [ 20.531267] random: mix_interrupt_randomness() CPU17, HAZ: 1 | [ 20.531269] random: mix_interrupt_randomness() CPU04, HAZ: 1 | [ 20.531270] random: mix_interrupt_randomness() CPU05, HAZ: 1 | [ 20.531270] random: mix_interrupt_randomness() CPU06, HAZ: 1 | [ 20.531271] random: mix_interrupt_randomness() CPU07, HAZ: 1 | [ 22.515212] random: mix_interrupt_randomness() CPU18, HAZ: 2 | [ 22.515240] random: mix_interrupt_randomness() CPU16, HAZ: 2 | [ 22.515240] random: mix_interrupt_randomness() CPU17, HAZ: 1 | [ 22.515241] random: mix_interrupt_randomness() CPU03, HAZ: 1 | [ 22.515242] random: mix_interrupt_randomness() CPU04, HAZ: 1 | [ 22.515242] random: mix_interrupt_randomness() CPU05, HAZ: 1 | [ 22.515244] random: mix_interrupt_randomness() CPU06, HAZ: 1 | [ 22.515244] random: mix_interrupt_randomness() CPU07, HAZ: 1 | [ 23.948447] random: mix_interrupt_randomness() CPU18, HAZ: 1 | [ 23.948744] random: mix_interrupt_randomness() CPU16, HAZ: 1 | [ 24.531151] random: mix_interrupt_randomness() CPU17, HAZ: 1 | [ 24.531152] random: mix_interrupt_randomness() CPU03, HAZ: 1 | [ 24.531153] random: mix_interrupt_randomness() CPU04, HAZ: 1 | [ 24.531153] random: mix_interrupt_randomness() CPU05, HAZ: 1 | [ 24.531154] random: mix_interrupt_randomness() CPU06, HAZ: 1 | [ 24.531155] random: mix_interrupt_randomness() CPU07, HAZ: 1 | [ 25.034401] random: mix_interrupt_randomness() CPU16, HAZ: 2 | [ 25.074542] random: mix_interrupt_randomness() CPU18, HAZ: 38 | [ 25.566450] random: mix_interrupt_randomness() CPU03, HAZ: 19 | [ 25.598532] random: mix_interrupt_randomness() CPU05, HAZ: 15 | [ 25.726046] random: mix_interrupt_randomness() CPU18, HAZ: 64 | [ 25.802257] random: mix_interrupt_randomness() CPU18, HAZ: 64 | [ 26.085382] random: mix_interrupt_randomness() CPU18, HAZ: 64 This output comes from the hack at the end of the email (not properly formatted). Since the box is idle and runs NO_HZ it is possible that a CPU is idle for a second or longer. If you look at the begin of the output, CPU16 scheduled the worker for mix_interrupt_randomness() with fast_pool::count = 2. So did CPU 17, 7 and 6. At the end log you see CPU18 got busy and scheduled the worker more frequently since it acquired 64 interrupts. This is output includes 10 seconds and CPUs 0 and 1 are not part of the log. Is this really what we want? With HAZ=1 the CPU was woken up from idle poll routine so the registers should have always the same content since it is always the same while() routine calling CPU's idle function. At least get_reg() rotates them but instruction_pointer() would return always the same value. (Side note: on 64bit the upper 32bit of the IP register should be all 0xff…ff for the kernel and 0x00…00 for userland. So this looks like one bit of entropy. I mention this now because I noticed that 32bit throws an additional register to the mix to make up for the small register)). Wouldn't it make sense entropy wise to gather more entropy (say the 64) before consuming it? And waiting at least a second if more entropy was created? With CONFIG_PERIODIC you have at least CONFIG_HZ interrupts on each CPU. "At least" because you have the timer tick interrupt and you may have additional interrupt for your device interrupts. With HZ=1000 you have 1000 timer tick interrupts. That is the one extreme. The other is NO_HZ and an idle box. [ Now that I look around it appears that risc-v has no get_irq_regs() and random_get_entropy() may return 0. Spooky. ] --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -1254,6 +1254,7 @@ static unsigned long get_reg(struct fast_pool *f, struct pt_regs *regs) return *ptr; } +enum { MIX_INFLIGHT = 1U << 31 }; static void mix_interrupt_randomness(struct work_struct *work) { struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix); @@ -1271,6 +1272,9 @@ static void mix_interrupt_randomness(struct work_struct *work) * consistent view, before we reenable irqs again. */ memcpy(pool, fast_pool->pool32, sizeof(pool)); + pr_err("%s() CPU%02d, HAZ: %d\n", __func__, smp_processor_id(), + READ_ONCE(fast_pool->count) & ~MIX_INFLIGHT); + WARN_ON((READ_ONCE(fast_pool->count) & MIX_INFLIGHT) == 0); fast_pool->count = 0; fast_pool->last = jiffies; local_irq_enable(); @@ -1288,7 +1292,6 @@ static void mix_interrupt_randomness(struct work_struct *work) void add_interrupt_randomness(int irq) { - enum { MIX_INFLIGHT = 1U << 31 }; cycles_t cycles = random_get_entropy(); unsigned long now = jiffies; struct fast_pool *fast_pool = this_cpu_ptr(&irq_randomness); Sebastian