Hi,

this series aims to address the memcg related problems on PREEMPT_RT.

I tested it on CONFIG_PREEMPT and CONFIG_PREEMPT_RT with the
tools/testing/selftests/cgroup/* tests and I haven't observed any
regressions (other than the lockdep report that is already there).

Changes since v4:
- Added additional counter indices to __mod_memcg_lruvec_state() which
  are updated with enabled interrupts but with disabled preemption. Also
  disable these checks on PREEMPT_RT.
  Reported by Shakeel Butt.
- Added an additional comment regarding `obj' in drain_obj_stock().
- Disable migration in drain_all_stock() and drain the local stock
  instead of scheduling a worker.

Changes since v3:
- Added __memcg_stats_lock() to __mod_memcg_lruvec_state(). This one
  does not check for disabled interrupts on !RT. The only user
  (__mod_memcg_lruvec_state()) checks for task context (neither soft nor
  hard irq) if one of the two idx used by rmap.c is passed, and
  otherwise it checks for disabled interrupts.
  Reported by Shakeel Butt.
- In drain_all_stock() migration is disabled and drain_local_stock() is
  invoked directly if the requested CPU is the local CPU.
v3: https://lore.kernel.org/all/20220217094802.3644569-1-bigeasy@xxxxxxxxxxxxx/

Changes since v2:
- Rebased on top of v5.17-rc4-mmots-2022-02-15-20-39.
- Added memcg_stats_lock() in 3/5 so it is a little more obvious and
  hopefully easier to maintain.
- Opencoded obj_cgroup_uncharge_pages() in drain_obj_stock(). The
  __locked suffix was confusing.
v2: https://lore.kernel.org/all/20220211223537.2175879-1-bigeasy@xxxxxxxxxxxxx/

Changes since v1:
- Made a full patch from Michal Hocko's diff to disable the from-IRQ vs
  from-task optimisation.
- Disabling the threshold event handlers now uses IS_ENABLED(PREEMPT_RT)
  instead of #ifdef. The outcome is the same but there is no need to
  shuffle the code around.
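
To illustrate the IS_ENABLED() vs #ifdef point, here is a minimal
userspace sketch. It is not code from the series: the macro is a
simplified stand-in for the kernel's IS_ENABLED(), the helper macro
names are made up for this example, and write_event_control_sketch() is
a hypothetical function that only models "refuse the request on RT".

```c
#include <errno.h>

/*
 * Simplified userspace stand-in for the kernel's IS_ENABLED(): evaluates
 * to 1 if `option` is defined as 1 and to 0 if it is not defined at all.
 * The helper macro names are illustrative, not the kernel's.
 */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_2(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_1(val) IS_DEFINED_2(ARG_PLACEHOLDER_##val)
#define IS_ENABLED(option) IS_DEFINED_1(option)

/*
 * Hypothetical handler. The early return sits in ordinary C code, so the
 * rest of the function is parsed and type-checked in both configurations
 * and nothing has to be moved into or out of an #ifdef block.
 */
static int write_event_control_sketch(void)
{
	if (IS_ENABLED(CONFIG_PREEMPT_RT))
		return -EOPNOTSUPP;

	/* Non-RT path: the handler would be registered here. */
	return 0;
}
```

Built as is, the function returns 0. Built with -DCONFIG_PREEMPT_RT=1
it returns -EOPNOTSUPP, and the compiler is free to drop the rest of
the function as dead code.
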
v1: https://lore.kernel.org/all/20220125164337.2071854-1-bigeasy@xxxxxxxxxxxxx/

Changes since the RFC:
- cgroup.event_control / memory.soft_limit_in_bytes is disabled on
  PREEMPT_RT. It is a deprecated v1 feature. Fixing the signal path is
  not worth it.
- The updates to per-CPU counters are usually synchronised by disabling
  interrupts. There are a few spots where the assumption about disabled
  interrupts is not true on PREEMPT_RT and therefore preemption is
  disabled. This is okay since the counters are never written from
  in_irq() context.
RFC: https://lore.kernel.org/all/20211222114111.2206248-1-bigeasy@xxxxxxxxxxxxx/

Sebastian