Re: [PATCH] random: ensure mix_interrupt_randomness() is consistent

"Jason A. Donenfeld" <Jason@xxxxxxxxx> · Fri, 11 Feb 2022 11:48:15 +0100

Hi Sebastian,

On Fri, Feb 11, 2022 at 9:16 AM Sebastian Andrzej Siewior
<bigeasy@xxxxxxxxxxxxx> wrote:
> But I'm trying to avoid the migrate_disable(), so:
> To close the race with losing the workqueue bit, wouldn't it be
> sufficient to set it to zero via atomic_cmpxchg()? Also if the counter
> before the memcpy() and after (at cmpxchg time) didn't change then the
> pool wasn't modified. So basically
>
>     do {
>         counter = atomic_read(&fast_pool->count); // no need to cast
>         memcpy(pool, fast_pool->pool_long, ARRAY_SIZE(pool));
>     } while (atomic_cmpxchg(&fast_pool->count, counter, 0) != counter);
>
>
> then it also shouldn't matter if we are _accidentally_ on the wrong CPU.

This won't work. If we're executing on a different CPU, the CPU
mutating the pool won't necessarily update the count at the right
time. This isn't actually a seqlock or something like that. Rather, it
depends on running on the same CPU, where the interrupting irq handler
runs in full before giving control back, so that count and pool are
either both updated or not at all. Making this work across CPUs makes
things a lot more complicated and I'd rather not do that.
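
To make the same-CPU assumption concrete, here is a minimal userspace
sketch of a counter double read loop. It is a single-threaded model, not
the kernel code: struct fast_pool_model, model_irq() and snapshot_retry()
are hypothetical names, and the "interrupt" is just a function call
placed between the two word copies. The point it shows is that the retry
is only sound because the handler runs to completion, so pool and count
change together.

```c
#include <string.h>

/* Hypothetical userspace stand-in for the per-CPU fast pool. */
struct fast_pool_model {
	unsigned long pool[2];
	unsigned int count;
};

/*
 * Models an irq handler on the same CPU: it runs in full before
 * returning, so the reader never sees pool updated without count.
 */
static void model_irq(struct fast_pool_model *fp)
{
	fp->pool[0] ^= 0x9e3779b9UL;
	fp->pool[1] += fp->pool[0];
	fp->count++;
}

/*
 * Copy the pool, retrying if count changed during the copy.
 * pending_irqs is the number of simulated interrupts that land
 * between the two word copies. Returns the number of attempts.
 */
static unsigned int snapshot_retry(struct fast_pool_model *fp,
				   unsigned long out[2], int pending_irqs)
{
	unsigned int count, tries = 0;

	do {
		count = fp->count;
		out[0] = fp->pool[0];
		if (pending_irqs > 0) {
			pending_irqs--;
			model_irq(fp);	/* "interrupt" arrives mid-copy */
		}
		out[1] = fp->pool[1];
		tries++;
	} while (count != fp->count);

	return tries;
}
```

If the writer ran on another CPU, it could be halfway through its pool
update while count still held the old value, and the comparison at the
end of the loop would pass on a torn copy.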

Actually, though, a nicer fix would be to just disable local
interrupts for that *2 word copy*. That's a tiny period of time. If
you permit me, that seems nicer. But if you don't like that, I'll keep
that loop.
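
As a rough illustration of the irq-disabling variant, the following is a
userspace model rather than kernel code. The irqs_enabled flag and the
model_* helpers are hypothetical stand-ins for local_irq_save() and
local_irq_restore(); an interrupt that arrives while the flag is clear
is deferred until the restore. Resetting count inside the protected
region is an assumption of this sketch, not something taken from the
patch.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-CPU fast pool. */
struct irq_pool_model {
	unsigned long pool[2];
	unsigned int count;
};

static bool irqs_enabled = true;	/* models the local CPU's irq state */
static int deferred_irqs;		/* irqs that arrived while disabled */

/* Models the irq handler: updates pool and count together. */
static void model_irq_handler(struct irq_pool_model *fp)
{
	fp->pool[0] ^= 0x9e3779b9UL;
	fp->pool[1] += fp->pool[0];
	fp->count++;
}

/* An interrupt fires: run it now, or defer it if irqs are off. */
static void model_raise_irq(struct irq_pool_model *fp)
{
	if (irqs_enabled)
		model_irq_handler(fp);
	else
		deferred_irqs++;
}

/* Re-enable irqs and deliver anything that was deferred. */
static void model_irq_restore(struct irq_pool_model *fp)
{
	irqs_enabled = true;
	while (deferred_irqs > 0) {
		deferred_irqs--;
		model_irq_handler(fp);
	}
}

/* Copy the two words with "interrupts" off, so no retry is needed. */
static void snapshot_irqs_off(struct irq_pool_model *fp, unsigned long out[2])
{
	irqs_enabled = false;		/* stands in for local_irq_save() */
	out[0] = fp->pool[0];
	model_raise_irq(fp);		/* arrives mid-copy, gets deferred */
	out[1] = fp->pool[1];
	fp->count = 0;			/* assumption: reset under protection */
	model_irq_restore(fp);		/* stands in for local_irq_restore() */
}
```

The copy sees both words from before the interrupt, and the deferred
handler then updates pool and count together once interrupts are back on.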

Unfortunately, though, I think disabling migration is required. Sultan
(CC'd) found that these workqueues can migrate even midway through
running. And generally the whole idea is to keep this on the *same*
CPU so that we don't have to introduce locks and synchronization.

I'll add comments around the acquire/release. The remaining question,
I believe, is: would you prefer disabling irqs during the 2 word memcpy,
or this counter double read loop?

Jason