On Wed, Sep 14, 2016 at 03:48:44PM -0400, Johannes Weiner wrote:
> From: Johannes Weiner <jweiner@xxxxxx>
>
> During cgroup2 rollout into production, we started encountering css
> refcount underflows and css access crashes in the memory controller.
> Splitting the heavily shared css reference counter into logical users
> narrowed the imbalance down to the cgroup2 socket memory accounting.
>
> The problem turns out to be the per-cpu charge cache. Cgroup1 had a
> separate socket counter, but the new cgroup2 socket accounting goes
> through the common charge path that uses a shared per-cpu cache for
> all memory that is being tracked. Those caches are safe against
> scheduling preemption, but not against interrupts - such as the newly
> added packet receive path. When cache draining is interrupted by
> network RX taking pages out of the cache, the resuming drain operation
> will put references of in-use pages, thus causing the imbalance.
>
> Disable IRQs during all per-cpu charge cache operations.
>
> Fixes: f7e1cb6ec51b ("mm: memcontrol: account socket memory in unified hierarchy memory controller")
> Cc: <stable@xxxxxxxxxxxxxxx> # 4.5+
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>

Acked-by: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>
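
For anyone following the thread, the race described in the changelog can be illustrated with a toy userspace model. This is only a sketch under stated assumptions, not the patch and not kernel code: struct cpu_model and every function below are hypothetical names made up for illustration, and irq_save()/irq_restore() merely stand in for disabling and re-enabling local interrupts (something like local_irq_save()/local_irq_restore() in the kernel). The "interrupt" is simulated by a plain function call placed in the middle of the drain.

```c
#include <stdbool.h>

/* Hypothetical userspace model of one CPU's charge cache.
 * None of these names are kernel APIs. */
struct cpu_model {
    int cached_pages;   /* precharged pages sitting in the cache */
    int in_use_pages;   /* pages handed out, e.g. to socket buffers */
    int refs;           /* references held; should equal cached + in_use */
    bool irqs_disabled; /* models local interrupts being off */
    bool irq_pending;   /* an RX interrupt raised while IRQs were off */
};

/* RX path: take one page from the cache, or charge a fresh one. */
static void rx_charge_one(struct cpu_model *c)
{
    if (c->cached_pages > 0)
        c->cached_pages--;  /* the page keeps the reference it already had */
    else
        c->refs++;          /* cache empty: new charge takes a new reference */
    c->in_use_pages++;
}

/* Deliver an RX interrupt immediately, or defer it while IRQs are off. */
static void raise_rx_irq(struct cpu_model *c)
{
    if (c->irqs_disabled)
        c->irq_pending = true;
    else
        rx_charge_one(c);
}

static void irq_save(struct cpu_model *c)
{
    c->irqs_disabled = true;
}

static void irq_restore(struct cpu_model *c)
{
    c->irqs_disabled = false;
    if (c->irq_pending) {   /* run the deferred interrupt now */
        c->irq_pending = false;
        rx_charge_one(c);
    }
}

/* Drain the cache while an RX interrupt arrives midway through. */
static void drain(struct cpu_model *c, bool protect)
{
    if (protect)
        irq_save(c);

    int snapshot = c->cached_pages; /* drain reads the page count */
    raise_rx_irq(c);                /* network RX fires right here */
    c->refs -= snapshot;            /* put one reference per counted page */
    c->cached_pages = 0;

    if (protect)
        irq_restore(c);
}

/* Returns refs minus tracked pages: 0 is balanced, negative means
 * references were dropped for pages that are still in use. */
static int imbalance_after_drain(int cached, bool protect)
{
    struct cpu_model c = { .cached_pages = cached, .refs = cached };

    drain(&c, protect);
    return c.refs - (c.cached_pages + c.in_use_pages);
}
```

In this model, imbalance_after_drain(4, false) comes out negative because the drain puts a reference for a page that RX took out of the cache in the meantime, while imbalance_after_drain(4, true) stays balanced because the simulated interrupt only runs once the drain has finished.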