Hi Thomas, thanks for the in-depth review.

On Thu, 2021-09-23 at 00:09 +0200, Thomas Gleixner wrote:
> On Wed, Sep 22 2021 at 13:28, Peter Zijlstra wrote:
> > On Tue, Sep 21, 2021 at 07:59:51PM +0200, Vlastimil Babka wrote:
> > >
> > > These days the pcplist protection is done by local_lock, which solved
> > > the RT concerns. Probably a stupid/infeasible idea, but maybe what you
> > > want to achieve could be more generally solved at the local_lock level?
> > > That on NOHZ_FULL CPUs, local_locks could have this mode where they
> > > could synchronize with remote cpus?
> >
> > local_lock and spinlock have different rules, local_lock for example can
> > never cause an irq inversion, unlike a spinlock.
>
> TBH, I really regret an attempt I made at some point in the RT
> development to abuse local locks for this kind of cross CPU protection
> because that led to yet another semantically ill defined construct.
>
> local locks are, as the name says, strictly local. IOW, they do not exist
> for remote access. Just look at the !RT mapping:
>
>     local_lock()     -> preempt_disable()
>     local_lock_irq() -> local_irq_disable()
>     ...
>
> The only thing local_lock is addressing is the opaque nature of
> preempt_disable() and local*_disable/save() protected critical sections,
> which have per-CPU BKL semantics, i.e. an undefined protection scope.
>
> If you really want cross CPU protection then using a regular spinlock in
> a per CPU data structure is the right way to go.
>
> That makes it a bit awkward vs. the code at hand, which already introduced
> local locks to replace the opaque preempt/local_irq disabled critical
> sections with scoped local locks, which in turn allows RT to substitute
> them with strictly CPU-local spinlocks.
>
> But for clarity's sake we really have to look at two different cases now:
>
>   1) Strict per CPU local protection
>
>      That's what the code does today via local lock, and this includes
>      RT where the locality is preserved via the local lock semantics.
>
>      I.e. for the !RT case the spinlock overhead is avoided.
>
>   2) Global scoped per CPU protection
>
>      That's what Nicolas is trying to achieve to be able to update
>      data structures cross CPU for the sake of avoiding smp function
>      calls or queuing work on remote CPUs for the NOHZ_FULL isolation
>      use case.
>
> That said, I completely understand Andrew's concerns versus these
> distinctions and their test coverage.
>
> In consequence, the really interesting question here is whether any of
> these use cases are sensitive to the extra overhead of #2.
>
> IOW, does it really matter whether the !RT and !NOHZ_FULL case take an
> uncontended per CPU spinlock unconditionally instead of just disabling
> preemption / interrupts?
>
> The same question arises vs. the remote work queuing. Does it really
> matter? I think a proper investigation is due to figure out whether
> delegated work is really superior vs. doing the same work locked from a
> remote CPU occasionally.
>
> If you really think about it, the task which is initiating the work
> is already in a slow path. So what's the tradeoff of making this path a
> little bit slower due to remote access vs. scheduling work and thereby
> disturbing the remote CPU, which has a performance impact as well and in
> the NOHZ_FULL case even a correctness impact? That's especially
> questionable for situations where the initiator has to wait for
> completion of the remote work.
>
> The changelogs and the cover letter have a distinct void vs. that, which
> means this is just another example of 'scratch my itch' changes w/o
> proper justification.

Point taken, I'll get to it.

-- 
Nicolás Sáenz
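For readers following along, here is a minimal sketch of the two protection schemes Thomas contrasts above. It is not taken from the patch set; the structure and function names (struct pcp_local, struct pcp_remote, pcp_remote_inc_on(), etc.) are invented for illustration. Case 1 keeps the data strictly CPU-local behind a local_lock; case 2 embeds a regular spinlock in the per-CPU data so a housekeeping CPU can also take it remotely instead of sending an IPI or queuing remote work. IRQ-safety (the _irqsave variants) is elided to keep the example short.

    #include <linux/local_lock.h>
    #include <linux/percpu.h>
    #include <linux/spinlock.h>

    /* Case 1: local_lock, only ever usable on the owning CPU. */
    struct pcp_local {
            local_lock_t    lock;
            unsigned long   count;
    };

    static DEFINE_PER_CPU(struct pcp_local, pcp_local) = {
            .lock = INIT_LOCAL_LOCK(lock),
    };

    static void pcp_local_inc(void)
    {
            local_lock(&pcp_local.lock);    /* !RT: preempt_disable() */
            this_cpu_inc(pcp_local.count);
            local_unlock(&pcp_local.lock);
    }

    /* Case 2: per-CPU data protected by a real spinlock. */
    struct pcp_remote {
            spinlock_t      lock;
            unsigned long   count;
    };

    static DEFINE_PER_CPU(struct pcp_remote, pcp_remote) = {
            .lock = __SPIN_LOCK_UNLOCKED(pcp_remote.lock),
    };

    /* Local fast path: the lock is uncontended in the common case. */
    static void pcp_remote_inc_local(void)
    {
            struct pcp_remote *p = this_cpu_ptr(&pcp_remote);

            spin_lock(&p->lock);
            p->count++;
            spin_unlock(&p->lock);
    }

    /*
     * Slow path from a housekeeping CPU: update another CPU's data
     * directly, at the price of taking its spinlock across CPUs,
     * instead of disturbing that CPU with an IPI or a work item.
     */
    static void pcp_remote_inc_on(int cpu)
    {
            struct pcp_remote *p = per_cpu_ptr(&pcp_remote, cpu);

            spin_lock(&p->lock);
            p->count++;
            spin_unlock(&p->lock);
    }

The open question Thomas raises is whether the uncontended spin_lock()/spin_unlock() pair in the local fast path of case 2 is measurably more expensive than the plain preempt_disable()/preempt_enable() that the local_lock variant compiles to on !RT kernels.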