Hi André and Sebastian,

Thank you so much for your quick replies and for providing the context
that I was missing!

On 08/10/24 12:59, André Almeida wrote:
> On 08/10/2024 12:51, Sebastian Andrzej Siewior wrote:
> > On 2024-10-08 12:38:11 [-0300], André Almeida wrote:
> > > On 08/10/2024 12:22, Juri Lelli wrote:
> > >
> > > [...]
> > >
> > > > Now, of course by making the latency sensitive application tasks
> > > > use a higher priority than anything on housekeeping CPUs we could
> > > > avoid the issue, but the fact that an implicit in-kernel link
> > > > between otherwise unrelated tasks might cause priority inversion
> > > > is probably not ideal? Thus this email.
> > > >
> > > > Does this report make any sense? If it does, has this issue ever
> > > > been reported and possibly discussed? I guess it's kind of a
> > > > corner case, but I wonder if anybody has suggestions already on
> > > > how to possibly try to tackle it from a kernel perspective.
> > >
> > > That's right, unrelated apps can share the same futex bucket,
> > > causing those side effects. The bucket is determined by
> > > futex_hash() and then tasks take the hash bucket lock in
> > > futex_q_lock(), and neither of those functions has any awareness
> > > of priorities.
> >
> > Almost. Since Juri mentioned PREEMPT_RT, the hb locks are aware of
> > priorities. So in his case there was a PI boost: the task with the
> > higher priority can grab the hb lock before others do. However,
> > since the lock owner is blocked by the NIC thread, it can't make
> > progress. Lifting its priority over the NIC thread would bring the
> > owner onto the CPU so that it can drop the hb lock.
>
> Oh, that's right, thanks for pointing it out!
>
> > > There's this work from Thomas that aims to solve corner cases like
> > > this, by giving apps the option to have their own allocated wait
> > > queue instead of using the global hash table:
> > > https://lore.kernel.org/lkml/20160402095108.894519835@xxxxxxxxxxxxx/
> > >
> > > "Collisions on that hash can lead to performance degradation
> > > and on real-time enabled kernels to unbound priority inversions."
> >
> > This is correct. The problem is also that the hb lock is hashed on
> > several things, so if you restart/reboot you may no longer share the
> > hb lock with the "bad" application.
> >
> > Now that I think about it, of all things we never tried a per-process
> > (shared by threads) hb lock, which could also be hashed. This would
> > avoid blocking on other applications; you would only have to blame
> > your own threads.

Would this be somewhat similar to what Linus (and Ingo, IIUC) seemed
inclined to suggest in the thread above (edited)?

---
So automatically using a local hashtable according to some heuristic is
definitely the way to go. And yes, the heuristic may well be - at least
to start - "this is a preempt-RT system" (for people who clearly care
about having predictable latencies) or "this is actually a multi-node
NUMA system, and I have heaps of memory"
---

So, make it per-process local by default on PREEMPT_RT and NUMA?

Thanks,
Juri
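---

To make the question a bit more concrete, below is a completely
untested sketch of what I have in mind, starting from the current
futex_hash(). The per-mm table (mm->futex_hash), MM_FUTEX_HASH_SIZE
and the private/shared split are made up for illustration only, and
allocation, sizing and lifetime of the per-mm table are ignored
entirely:

/* Today (simplified): one global table, shared by every process. */
static struct futex_hash_bucket *futex_hash_global(union futex_key *key)
{
        u32 hash = jhash2((u32 *)key,
                          offsetof(typeof(*key), both.offset) / 4,
                          key->both.offset);

        /* Unrelated tasks can collide here and share one hb lock. */
        return &futex_queues[hash & (futex_hashsize - 1)];
}

/* Sketch: hash private futexes into a table hanging off the mm. */
static struct futex_hash_bucket *futex_hash_local(union futex_key *key)
{
        u32 hash = jhash2((u32 *)key,
                          offsetof(typeof(*key), both.offset) / 4,
                          key->both.offset);

        /*
         * Shared (inode/mmshared) keys still need the global table.
         * Private keys never leave the process, so a collision can
         * only involve threads of the same process - no PI chain
         * through an hb lock to an unrelated application.
         */
        if (key->both.offset & (FUT_OFF_INODE | FUT_OFF_MMSHARED))
                return futex_hash_global(key);

        return &current->mm->futex_hash[hash & (MM_FUTEX_HASH_SIZE - 1)];
}

With something along these lines, on !PREEMPT_RT (and small/non-NUMA)
systems the global path could stay the default, so the common case
wouldn't need to change.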