On 2024-10-08 12:38:11 [-0300], André Almeida wrote:
> Em 08/10/2024 12:22, Juri Lelli escreveu:
>
> [...]
>
> > Now, of course by making the latency sensitive application tasks use a
> > higher priority than anything on housekeeping CPUs we could avoid the
> > issue, but the fact that an implicit in-kernel link between otherwise
> > unrelated tasks might cause priority inversion is probably not ideal?
> > Thus this email.
> >
> > Does this report make any sense? If it does, has this issue ever been
> > reported and possibly discussed? I guess it’s kind of a corner case, but
> > I wonder if anybody has suggestions already on how to possibly try to
> > tackle it from a kernel perspective.
>
> That's right, unrelated apps can share the same futex bucket, causing those
> side effects. The bucket is determined by futex_hash() and then tasks get
> the hash bucket lock at futex_q_lock(), and none of those functions have
> awareness of priorities.

Almost. Since Juri mentioned PREEMPT_RT, the hb locks are aware of
priorities. So in his case there was a PI boost: the task with the higher
priority can grab the hb lock before the others do. However, since the lock
owner is blocked by the NIC thread, it can't make progress. Lifting the
owner's priority above the NIC thread would bring it onto the CPU so that
it can drop the hb lock.

> There's this work from Thomas that aims to solve corner cases like this, by
> giving apps the option, instead of using the global hash table, to have
> their own allocated wait queue:
> https://lore.kernel.org/lkml/20160402095108.894519835@xxxxxxxxxxxxx/
>
> "Collisions on that hash can lead to performance degradation
> and on real-time enabled kernels to unbound priority inversions."

This is correct. The problem is also that the hb lock is hashed on several
things, so if you restart/reboot you may no longer share the hb lock with
the "bad" application.

Now that I think about it, of all things we never tried a per-process
(shared by threads) hb lock, which could also be hashed. This would avoid
blocking on other applications; you would only have to blame your own
threads.

> > Thanks!
> > Juri

Sebastian
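
To make the collision scenario above concrete, here is a toy userspace
model of the shared futex hash. It is an illustration only: toy_futex_hash()
and its constants are made up, while the real futex_hash() hashes the
futex_key (user address plus mm/inode) with jhash2() into a single global
bucket array shared by every process on the system.

    /*
     * Toy model of the global futex hash: one bucket array for all
     * processes, indexed by a hash of (uaddr, mm). Unrelated apps can
     * land in the same bucket and then contend on its lock.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define NR_BUCKETS 256u	/* the kernel sizes its table from the CPU count */

    /* Stand-in for futex_hash(): mixes the user address with an "mm" cookie. */
    static unsigned int toy_futex_hash(uint64_t uaddr, uint64_t mm)
    {
    	uint64_t h = (uaddr ^ mm) * 0x9e3779b97f4a7c15ULL;

    	h ^= h >> 32;
    	return (unsigned int)(h % NR_BUCKETS);
    }

    int main(void)
    {
    	/* Two unrelated "processes", modelled by distinct mm cookies. */
    	const uint64_t mm_rt = 0x1111, mm_other = 0x2222;
    	const uint64_t uaddr_rt = 0x7f0000601000ULL;
    	unsigned int bucket = toy_futex_hash(uaddr_rt, mm_rt);

    	/* Scan the other process's address space for a colliding futex. */
    	for (uint64_t uaddr = 0x7f0000a00000ULL; ; uaddr += 4) {
    		if (toy_futex_hash(uaddr, mm_other) == bucket) {
    			printf("futexes %#llx (RT app) and %#llx (other app) "
    			       "share bucket %u\n",
    			       (unsigned long long)uaddr_rt,
    			       (unsigned long long)uaddr, bucket);
    			break;	/* both apps now serialize on one hb lock */
    		}
    	}
    	return 0;
    }

Running it prints a pair of addresses from the two "processes" that map to
the same bucket, which is exactly the implicit in-kernel link between
otherwise unrelated tasks that Juri described.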
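
And a rough sketch of the per-process idea mentioned above (hypothetical
throughout: these struct and function names are invented and are not
existing kernel API):

    /*
     * Hypothetical per-process futex hash (sketch only, invented names):
     * the bucket array hangs off the process instead of being global, so
     * the worst-case collision is between two threads of the same
     * application.
     */
    #include <pthread.h>
    #include <stdint.h>

    #define PROC_HASH_SIZE 64u

    struct toy_hash_bucket {
    	pthread_mutex_t lock;	/* stands in for the hb lock; a PI rtmutex on RT */
    	/* ... list of waiters would live here ... */
    };

    struct toy_process {
    	struct toy_hash_bucket buckets[PROC_HASH_SIZE];
    };

    /* Per-process lookup: only uaddr is hashed, the table itself is private. */
    static struct toy_hash_bucket *
    toy_bucket_for(struct toy_process *proc, uint64_t uaddr)
    {
    	return &proc->buckets[(uaddr >> 2) % PROC_HASH_SIZE];
    }

A collision can still happen here, but only between threads of the same
process, so you would only ever block on, and boost, your own threads.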