On Thu, Feb 17, 2022 at 1:48 AM Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote:
>
> The per-CPU counter are modified with the non-atomic modifier. The
> consistency is ensured by disabling interrupts for the update.
> On non PREEMPT_RT configuration this works because acquiring a
> spinlock_t typed lock with the _irq() suffix disables interrupts. On
> PREEMPT_RT configurations the RMW operation can be interrupted.
>
> Another problem is that mem_cgroup_swapout() expects to be invoked with
> disabled interrupts because the caller has to acquire a spinlock_t which
> is acquired with disabled interrupts. Since spinlock_t never disables
> interrupts on PREEMPT_RT the interrupts are never disabled at this
> point.
>
> The code is never called from in_irq() context on PREEMPT_RT therefore
> disabling preemption during the update is sufficient on PREEMPT_RT.
> The sections which explicitly disable interrupts can remain on
> PREEMPT_RT because the sections remain short and they don't involve
> sleeping locks (memcg_check_events() is doing nothing on PREEMPT_RT).
>
> Disable preemption during update of the per-CPU variables which do not
> explicitly disable interrupts.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> Acked-by: Roman Gushchin <guro@xxxxxx>
> ---
>  mm/memcontrol.c | 29 ++++++++++++++++++++++++++++-
>  1 file changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 0b5117ed2ae08..36ab3660f2c6d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -630,6 +630,28 @@ static DEFINE_SPINLOCK(stats_flush_lock);
>  static DEFINE_PER_CPU(unsigned int, stats_updates);
>  static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
>
> +/*
> + * Accessors to ensure that preemption is disabled on PREEMPT_RT because it can
> + * not rely on this as part of an acquired spinlock_t lock. These functions are
> + * never used in hardirq context on PREEMPT_RT and therefore disabling preemtion
> + * is sufficient.
> + */
> +static void memcg_stats_lock(void)
> +{
> +#ifdef CONFIG_PREEMPT_RT
> +	preempt_disable();
> +#else
> +	VM_BUG_ON(!irqs_disabled());
> +#endif
> +}
> +
> +static void memcg_stats_unlock(void)
> +{
> +#ifdef CONFIG_PREEMPT_RT
> +	preempt_enable();
> +#endif
> +}
> +
>  static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
>  {
>  	unsigned int x;
> @@ -706,6 +728,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
>  	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
>  	memcg = pn->memcg;
>
> +	memcg_stats_lock();

The call chains from rmap.c do not actually run with irqs disabled.
Indeed, there is a comment in do_page_add_anon_rmap(): "We use the
irq-unsafe __{inc|mod}_zone_page_stat because these counters are not
modified in interrupt context, and pte lock(a spinlock) is held, which
implies preemption disabled". The VM_BUG_ON(!irqs_disabled()) within
memcg_stats_lock() would give false error reports for
CONFIG_PREEMPT_NONE kernels.
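
As a rough sketch of one way around this (hypothetical, not part of the
posted patch): the callers that are only serialized by a spinlock such
as the pte lock could use a separate helper, say __memcg_stats_lock(),
which disables preemption on PREEMPT_RT but skips the irqs_disabled()
assertion, while memcg_stats_lock() keeps the check for the paths that
really do run with interrupts off:

/*
 * Hypothetical variant for callers that are serialized only by a
 * spinlock (e.g. the pte lock in the rmap call chains) and therefore
 * run with interrupts enabled on !PREEMPT_RT. No irqs_disabled()
 * check, so CONFIG_PREEMPT_NONE kernels would not see false positives
 * from do_page_add_anon_rmap() and friends.
 */
static void __memcg_stats_lock(void)
{
#ifdef CONFIG_PREEMPT_RT
	preempt_disable();
#endif
}

__mod_memcg_lruvec_state() would then call __memcg_stats_lock() instead
of memcg_stats_lock(); the existing memcg_stats_unlock() could pair
with it unchanged, since it only does preempt_enable() on PREEMPT_RT
anyway.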