Hello,

I noticed that recent upstream kernels don't account the xarray nodes of the page cache to the allocating cgroup, like we used to do for the radix tree nodes.

This results in broken isolation for cgrouped apps, allowing them to escape their containment and harm other cgroups and the system with an excessive build-up of nonresident information.

It also breaks thrashing/refault detection: because the page cache lives in a different domain than the xarray nodes, the shadow shrinker can reclaim nonresident information way too early when there isn't much cache in the root cgroup.

This appears to be the culprit:

commit a28334862993b5c6a8766f6963ee69048403817c
Author: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Date:   Tue Dec 5 19:04:20 2017 -0500

    page cache: Finish XArray conversion

    With no more radix tree API users left, we can drop the GFP flags
    and use xa_init() instead of INIT_RADIX_TREE().

    Signed-off-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>

diff --git a/fs/inode.c b/fs/inode.c
index 42f6d25f32a5..9b808986d440 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -349,7 +349,7 @@ EXPORT_SYMBOL(inc_nlink);
 static void __address_space_init_once(struct address_space *mapping)
 {
-	INIT_RADIX_TREE(&mapping->i_pages, GFP_ATOMIC | __GFP_ACCOUNT);
+	xa_init_flags(&mapping->i_pages, XA_FLAGS_LOCK_IRQ);
 	init_rwsem(&mapping->i_mmap_rwsem);
 	INIT_LIST_HEAD(&mapping->private_list);
 	spin_lock_init(&mapping->private_lock);

It fairly blatantly drops __GFP_ACCOUNT.

I'm not quite sure how to fix this, since the xarray code doesn't seem to have per-tree gfp flags anymore like the radix tree did. We cannot add SLAB_ACCOUNT to the radix_tree_node_cachep slab cache, since that cache backs every xarray and radix tree in the kernel, not just the page cache. And the xarray API doesn't really seem to support gfp flags either (xas_nomem does, but the optimistic internal allocations use fixed gfp flags).
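To make the accounting loss concrete, here is a small userspace model (not kernel code) of a tree that, like the old radix tree, remembers one gfp mask per tree and charges each node allocation to the cgroup only when the account bit is set. Every name in it (MODEL_GFP_ATOMIC, MODEL_GFP_ACCOUNT, model_tree, model_node_alloc, model_fill) is hypothetical, and the simple counters stand in for what memcg does inside the slab allocator in the real kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's GFP bits; the values are arbitrary. */
#define MODEL_GFP_ATOMIC  0x1u
#define MODEL_GFP_ACCOUNT 0x2u

/* Hypothetical counters: nodes charged to the cgroup vs. left in the root. */
struct model_charges {
	size_t cgroup_nodes;
	size_t root_nodes;
};

/* Like the old radix tree, the tree carries a single per-tree gfp mask. */
struct model_tree {
	unsigned int gfp_mask;
	struct model_charges *charges;
};

/* Allocate one node; charge it to the cgroup only if the tree's mask asks. */
static void *model_node_alloc(struct model_tree *tree)
{
	void *node;

	assert(tree != NULL && tree->charges != NULL);
	node = malloc(64);
	if (!node)
		return NULL;
	if (tree->gfp_mask & MODEL_GFP_ACCOUNT)
		tree->charges->cgroup_nodes++;
	else
		tree->charges->root_nodes++;
	return node;
}

/* Allocate n nodes for a tree initialized with gfp_mask; report the charges. */
static struct model_charges model_fill(unsigned int gfp_mask, size_t n)
{
	struct model_charges charges = { 0, 0 };
	struct model_tree tree = { gfp_mask, &charges };

	for (size_t i = 0; i < n; i++)
		free(model_node_alloc(&tree));
	return charges;
}
```

With the pre-conversion mask (MODEL_GFP_ATOMIC | MODEL_GFP_ACCOUNT) every node lands in cgroup_nodes; with a mask lacking the account bit, which is roughly what happens after the commit above, every node lands in root_nodes instead, so the cache and its tree nodes end up in different domains.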