On Tue, 31 Dec 2024 at 03:14, Manfred Spraul <manfred@xxxxxxxxxxxxxxxx> wrote:
>
> Should we add the missing memory barriers and switch to
> wait_queue_active() in front of all wakeup calls?

If we *really* want to optimize this, we could even get rid of the
memory barrier at least on x86, because

 (a) mutex_unlock() is a full memory barrier on x86 (it involves a
     locked cmpxchg)

 (b) the condition is always set inside the locked region

 (c) the wakeup is after releasing the lock

but this is architecture-specific (ie "mutex_unlock()" is not
*guaranteed* to be a memory barrier, ie on other architectures it
might be only a release barrier).

We have "smp_mb__after_atomic()" and "smp_mb__after_spinlock()", but
we don't have a "smp_mb__after_mutex_unlock()".

So we'd have to add a new helper or config option.

Anyway, I'm perfectly happy to get these optimizations, but because of
historical trouble in this area, I want any patches to be very clearly
documented.

Oleg's patch to only wake up writers when readers have actually opened
up a slot may not make any actual difference (because readers in
*practice* always do big reads), but I like it because it feels
obviously correct and doesn't have any locking or memory ordering
subtleties (and actually makes the logic more logical and
straightforward).

             Linus