"Jason A. Donenfeld" <Jason@xxxxxxxxx> writes: > Hi Toke, > > On Tue, Dec 6, 2022 at 2:26 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: >> So for instance, if there's a large fixed component of the overhead of >> get_random_u32(), we could have bpf_user_rnd_u32() populate a larger >> per-CPU buffer and then just emit u32 chunks of that as long as we're >> still in the same NAPI loop as the first call. Or something to that >> effect. Not sure if this makes sense for this use case, but figured I'd >> throw the idea out there :) > > Actually, this already is how get_random_u32() works! It buffers a > bunch of u32s in percpu batches, and doles them out as requested. Ah, right. Not terribly surprised you already did this! > However, this API currently works in all contexts, including in > interrupts. So each call results in disabling irqs and reenabling > them. If I bifurcated batches into irq batches and non-irq batches, so > that we only needed to disable preemption for the non-irq batches, > that'd probably improve things quite a bit, since then the overhead > really would reduce to just a memcpy for the majority of calls. But I > don't know if adding that duplication of all code paths is really > worth the huge hassle. Right, makes sense; happy to leave that decision entirely up to you :) -Toke