On 2022-02-09 13:56:44 [+0100], Jason A. Donenfeld wrote: > +static void mix_interrupt_randomness(struct work_struct *work) > +{ > + struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix); > + u8 pool[sizeof(fast_pool->pool)]; So. - CPU1 schedules a worker - CPU1 goes offline before the gets on the CPU. - The worker runs CPU2 - CPU2 is back online - and now CPU1 CPU2 new_count = ++fast_pool->count; reg = fast_pool->count (FAST_POOL_MIX_INFLIGHT | 64) incl reg (FAST_POOL_MIX_INFLIGHT | 65) WRITE_ONCE(fast_pool->count, 0); fast_pool->count = reg ((FAST_POOL_MIX_INFLIGHT | 65) So we lost the WRITE_ONCE(, 0), FAST_POOL_MIX_INFLIGHT is still set and worker is not scheduled. Not easy to trigger, not by an ordinary user. Just wanted to mention… … > @@ -999,9 +1016,10 @@ void add_interrupt_randomness(int irq) > > fast_mix(fast_pool); > add_interrupt_bench(cycles); > + new_count = ++fast_pool->count; > > if (unlikely(crng_init == 0)) { > - if ((fast_pool->count >= 64) && > + if (new_count >= 64 && > crng_fast_load((u8 *)fast_pool->pool, sizeof(fast_pool->pool)) > 0) { crng_fast_load() does spin_trylock_irqsave() in hardirq context. It does not produce any warning on RT but is still wrong IMHO: - lockdep will see a random task and I remember in the past it produced strange lock chains based on this. - Should another task attempt to acquire this lock then it will PI-boost the wrong task. If we just could move this, too. I don't know how timing critical this is but the first backtrace from crng_fast_load() came (to my surprise) from hwrng_fillfn() (a kthread) and added 64bytes in one go. I did move that crng_fast_load() into the worker and did made some numbers: <idle>-0 [000] d..h1.. 2.069924: add_interrupt_randomness: Tick first interrupt … swapper/0-1 [000] d..h.11 2.341938: add_interrupt_randomness: Tick swapper/0-1 [000] d..h.11 2.341938: add_interrupt_randomness: work the 64th interrupt, scheduling the worker. swapper/0-1 [000] d..h.11 2.345937: add_interrupt_randomness: Tick swapper/0-1 [000] d..h111 2.349938: add_interrupt_randomness: Tick swapper/0-1 [000] d..h.11 2.353939: add_interrupt_randomness: Tick swapper/0-1 [000] d..h.11 2.357940: add_interrupt_randomness: Tick swapper/0-1 [000] d..h111 2.361939: add_interrupt_randomness: Tick swapper/0-1 [000] d..h111 2.365939: add_interrupt_randomness: Tick swapper/0-1 [000] d..h.11 2.369941: add_interrupt_randomness: Tick kworker/0:0H-6 [000] ....... 2.384714: mix_interrupt_randomness: load kworker/0:0H-6 [000] ....... 2.384715: crng_fast_load: 16 <idle>-0 [001] dn.h1.. 3.205766: add_interrupt_randomness: Tick <idle>-0 [019] dn.h1.. 6.771047: add_interrupt_randomness: Tick 7 interrupts got lost before the worker could run & load first 16 bytes. The workqueue core gets initialized at that point and spawns first worker. After that the interrupts took a break. And then the work-to-load delay was quite low: <idle>-0 [019] dn.h1.. 7.586234: add_interrupt_randomness: Tick <idle>-0 [019] dn.h1.. 7.586234: add_interrupt_randomness: work kworker/19:0H-175 [019] ....... 7.586504: mix_interrupt_randomness: load kworker/19:0H-175 [019] ....... 7.586507: crng_fast_load: 16 <idle>-0 [020] dn.h1.. 7.614649: add_interrupt_randomness: Tick <idle>-0 [020] dn.h1.. 7.614651: add_interrupt_randomness: work <idle>-0 [020] dn.h1.. 7.614736: add_interrupt_randomness: Tick kworker/20:0H-183 [020] dn.h... 7.614859: add_interrupt_randomness: Tick kworker/20:0H-183 [020] ....... 7.614871: mix_interrupt_randomness: load kworker/20:0H-183 [020] ....... 7.614872: crng_fast_load: 16 <idle>-0 [018] dn.h1.. 8.352423: add_interrupt_randomness: Tick <idle>-0 [018] dn.h1.. 8.352423: add_interrupt_randomness: work kworker/18:0H-167 [018] dn.h1.. 8.352438: add_interrupt_randomness: Tick kworker/18:0H-167 [018] dn.h1.. 8.352448: add_interrupt_randomness: Tick kworker/18:0H-167 [018] dn.h1.. 8.352459: add_interrupt_randomness: Tick kworker/18:0H-167 [018] dn.h1.. 8.352491: add_interrupt_randomness: Tick kworker/18:0H-167 [018] ....... 8.352505: mix_interrupt_randomness: load kworker/18:0H-167 [018] ....... 8.352506: crng_fast_load: 16 In total we lost 13 ticks. I did the same test on PREEMPT_VOLUNTARY and lost 2 ticks only. Sebastian