On Tue, Dec 17, 2019 at 06:29:19AM -0500, Yafang Shao wrote: > On my server there're some running MEMCGs protected by memory.{min, low}, > but I found the usage of these MEMCGs abruptly became very small, which > were far less than the protect limit. It confused me and finally I > found that was because of inode stealing. > Once an inode is freed, all its belonging page caches will be dropped as > well, no matter how may page caches it has. So if we intend to protect the > page caches in a memcg, we must protect their host (the inode) first. > Otherwise the memcg protection can be easily bypassed with freeing inode, > especially if there're big files in this memcg. > The inherent mismatch between memcg and inode is a trouble. One inode can > be shared by different MEMCGs, but it is a very rare case. If an inode is > shared, its belonging page caches may be charged to different MEMCGs. > Currently there's no perfect solution to fix this kind of issue, but the > inode majority-writer ownership switching can help it more or less. > > Cc: Roman Gushchin <guro@xxxxxx> > Cc: Chris Down <chris@xxxxxxxxxxxxxx> > Cc: Dave Chinner <dchinner@xxxxxxxxxx> > Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx> > --- > fs/inode.c | 9 +++++++++ > include/linux/memcontrol.h | 15 +++++++++++++++ > mm/memcontrol.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ > mm/vmscan.c | 4 ++++ > 4 files changed, 74 insertions(+) > > diff --git a/fs/inode.c b/fs/inode.c > index fef457a..b022447 100644 > --- a/fs/inode.c > +++ b/fs/inode.c > @@ -734,6 +734,15 @@ static enum lru_status inode_lru_isolate(struct list_head *item, > if (!spin_trylock(&inode->i_lock)) > return LRU_SKIP; > > + > + /* Page protection only works in reclaimer */ > + if (inode->i_data.nrpages && current->reclaim_state) { > + if (mem_cgroup_inode_protected(inode)) { > + spin_unlock(&inode->i_lock); > + return LRU_ROTATE; Urk, so after having plumbed the memcg all the way down to the list_lru walk code so that we only walk inodes in that memcg, we now have to do a lookup from the inode back to the owner memcg to determine if we should reclaim it? IOWs, I think the layering here is all wrong - if memcg info is needed in the shrinker, it should come from the shrink_control->memcg pointer, not be looked up from the object being isolated... i.e. this code should read something like this: if (memcg && inode->i_data.nrpages && (!memcg_can_reclaim_inode(memcg, inode)) { spin_unlock(&inode->i_lock); return LRU_ROTATE; } This code does not need comments because it is obvious what it does, and it provides a generic hook into inode reclaim for the memcg code to decide whether the shrinker should reclaim the inode or not. This is how the memcg code should interact with other shrinkers, too (e.g. the dentry cache isolation function), so you need to look at how to make the memcg visible to the lru walker isolation functions.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx