Hey Florian, Thanks for the feedback. On Fri, Jul 29, 2022 at 10:19:05PM +0200, Florian Weimer wrote: > > + if (getcpu(&start, NULL, NULL) == 0) > > + start %= NUM_BUCKETS; > > getcpu is not available everywhere. Userspace/libc should probably > provide a CPU number hint as an additional argument during the vDSO > call. We can load that easily enough from rseq. That's going to be > faster on x86, too (the LSL instruction is quite slow). The only > advantage of using getcpu like this is that it's compatible with a libc > that isn't rseq-enabled. Actually, the only requirement is that it's somewhat stable and somehow separates threads most of the time. So a per-thread ID or even a per-thread address would work fine too. Adhemerval suggested on IRC this afternoon that there's a thread pointer register value that would be usable for this purpose. I think what I'll do for v2 is abstract this out to a __arch_get_bucket_hint() function, or similar, which the different archs can fill in. > > + for (i = start;;) { > > + struct getrandom_state *state = &buckets[i]; > > + > > + if (cmpxchg(&state->in_use, false, true) == false) > > + return state; > > + > > + i = i == NUM_BUCKETS - 1 ? 0 : i + 1; > > + if (i == start) > > + break; > > + } > > Surely this scales very badly once the number of buckets is smaller than > the system processor count? Right, and there are a few ways that observation can go: 1) It doesn't matter, because who has > 28 threads all churning at once here? Is that something real? 2) The state variable is controllable by userspace, so in theory different ones could be passed. I don't like this idea though - hard to manage and not enough information to do it well. 3) Since we know when this kind of contention is hit, it should be possible to just expand the map size. Seems a bit complicated. 4) Simply allocate a number of pages relative to the number of CPUs, so that this isn't actually a problem. This seems like the simplest approach and seems fine. Jason