On Tue 25-01-22 17:43:36, Sebastian Andrzej Siewior wrote:
> The members of the per-CPU structure memcg_stock_pcp are protected
> either by disabling interrupts or by disabling preemption if the
> invocation occurred in process context.
> Disabling interrupts protects most of the structure excluding task_obj
> while disabling preemption protects only task_obj.
> This schema is incompatible with PREEMPT_RT because it creates atomic
> context in which actions are performed which require preemptible
> context. One example is obj_cgroup_release().
>
> The IRQ-disable and preempt-disable sections can be replaced with
> local_lock_t which preserves the explicit disabling of interrupts while
> keeping the code preemptible on PREEMPT_RT.
>
> The task_obj has been added for performance reasons on non-preemptible
> kernels where preempt_disable() is a NOP. On the PREEMPT_RT preemption
> model preempt_disable() is always implemented. Also there are no memory
> allocations in_irq() context and softirqs are processed in (preemptible)
> process context. Therefore it makes sense to avoid using task_obj.
>
> Don't use task_obj on PREEMPT_RT and replace manual disabling of
> interrupts with a local_lock_t. This change requires some refactoring:
>
> - drain_obj_stock() drops a reference on obj_cgroup which leads to an
>   invocation of obj_cgroup_release() if it is the last object. This in
>   turn leads to recursive locking of the local_lock_t. To avoid this,
>   obj_cgroup_release() is invoked outside of the locked section.
>
> - drain_obj_stock() gets a memcg_stock_pcp passed if the stock_lock has
>   been acquired (instead of the task_obj_lock) to avoid recursive locking
>   later in refill_stock().
>
> - drain_all_stock() disables preemption via get_cpu() and then invokes
>   drain_local_stock() if it is the local CPU to avoid scheduling a worker
>   (which invokes the same function). Disabling preemption here is
>   problematic due to the sleeping locks in drain_local_stock().
>   This can be avoided by always scheduling a worker, even for the local
>   CPU. Using cpus_read_lock() stabilizes cpu_online_mask which ensures
>   that no worker is scheduled for an offline CPU. Since there is no
>   flush_work(), it is still possible that a worker is invoked on the
>   wrong CPU, but that is okay since it always operates on the local-CPU
>   data.
>
> - drain_local_stock() is always invoked as a worker so it can be
>   optimized by removing in_task() (it is always true) and avoiding the
>   "irq_save" variant because interrupts are always enabled here.
>   Operating on task_obj first allows acquiring the local_lock_t without
>   lockdep complaints.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>

I do not see any obvious problem with this patch. The code is ugly as
hell, but a large part of that is because of the weird locking scheme we
already have.

I've had a look at 559271146efc ("mm/memcg: optimize user context object
stock access") and while I agree that it makes sense to optimize for the
user context, I do not really see any numbers justifying the awkward
locking scheme. Is this complexity really worth it?
--
Michal Hocko
SUSE Labs
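
[Editor's sketch] For readers unfamiliar with the local_lock_t pattern the
quoted changelog describes, the shape of the conversion looks roughly like
the following. This is not the actual mm/memcontrol.c code: the struct and
function names are invented, and reference counting of the incoming objcg
is omitted. It only illustrates the two key points: local_lock_irqsave()
replacing the open-coded interrupt disabling (it stays an IRQ-off section
on !PREEMPT_RT but becomes a preemptible per-CPU lock on PREEMPT_RT), and
the old obj_cgroup reference being dropped outside the locked section so a
possible obj_cgroup_release() never runs under stock_lock.

	#include <linux/local_lock.h>
	#include <linux/memcontrol.h>
	#include <linux/percpu.h>

	/* Illustrative per-CPU stock, not the real memcg_stock_pcp. */
	struct obj_stock_sketch {
		local_lock_t stock_lock;
		struct obj_cgroup *cached_objcg;
		unsigned int nr_bytes;
	};

	static DEFINE_PER_CPU(struct obj_stock_sketch, obj_stock_sketch) = {
		.stock_lock = INIT_LOCAL_LOCK(stock_lock),
	};

	static void refill_obj_stock_sketch(struct obj_cgroup *objcg,
					    unsigned int nr_bytes)
	{
		struct obj_cgroup *old = NULL;
		struct obj_stock_sketch *stock;
		unsigned long flags;

		local_lock_irqsave(&obj_stock_sketch.stock_lock, flags);

		stock = this_cpu_ptr(&obj_stock_sketch);
		if (stock->cached_objcg != objcg) {
			/* Hand the old reference out of the locked section. */
			old = stock->cached_objcg;
			stock->cached_objcg = objcg;
			stock->nr_bytes = 0;
		}
		stock->nr_bytes += nr_bytes;

		local_unlock_irqrestore(&obj_stock_sketch.stock_lock, flags);

		/*
		 * Drop the reference only after unlocking: obj_cgroup_put()
		 * may end up in obj_cgroup_release(), which must not run
		 * under stock_lock.
		 */
		if (old)
			obj_cgroup_put(old);
	}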
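
[Editor's sketch] The drain_all_stock() change (always scheduling the
per-CPU worker and pinning cpu_online_mask with cpus_read_lock() instead
of get_cpu()/put_cpu() plus a direct local call) could look like the
sketch below. Again the names are illustrative, the worker is assumed to
be initialized with INIT_WORK() elsewhere, and the "is there anything to
drain" test is reduced to a stub.

	#include <linux/cpu.h>
	#include <linux/percpu.h>
	#include <linux/workqueue.h>

	/*
	 * Hypothetical per-CPU drain state; the real code embeds the work
	 * item in struct memcg_stock_pcp.
	 */
	struct stock_drain_sketch {
		unsigned int nr_bytes;
		struct work_struct work;	/* assumed INIT_WORK()ed at boot */
	};
	static DEFINE_PER_CPU(struct stock_drain_sketch, stock_drain_sketch);

	static void drain_all_stock_sketch(void)
	{
		int cpu;

		/*
		 * No get_cpu()/put_cpu(): scheduling a worker even for the
		 * local CPU avoids running the drain path (which may take
		 * sleeping locks on PREEMPT_RT) with preemption disabled.
		 * cpus_read_lock() keeps cpu_online_mask stable so no work
		 * is queued on an offline CPU.
		 */
		cpus_read_lock();
		for_each_online_cpu(cpu) {
			struct stock_drain_sketch *stock =
				per_cpu_ptr(&stock_drain_sketch, cpu);

			/* The real code checks the cached objcg/charge here. */
			if (READ_ONCE(stock->nr_bytes))
				schedule_work_on(cpu, &stock->work);
		}
		cpus_read_unlock();
	}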