CCing Dave

On Tue, Dec 17, 2019 at 08:19:08PM +0800, Yafang Shao wrote:
> On Tue, Dec 17, 2019 at 7:56 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > What do you mean by this exactly? Are those inodes reclaimed by the
> > regular memory reclaim or by other means? Because shrink_node does
> > exclude shrinking slab for protected memcgs.
>
> By the regular memory reclaim: kswapd, direct reclaim, or memcg
> reclaim. IOW, current->reclaim_state is set.
>
> An example:
>
> kswapd
>   balance_pgdat
>     shrink_node_memcgs
>       switch (mem_cgroup_protected)  <<<< memory.current = 1024M,
>                                           memory.min = 512M,
>                                           a file has 800M of page cache
>       case MEMCG_PROT_NONE:          <<<< the hard limit is not reached
>           break;
>       shrink_lruvec
>       shrink_slab                    <<<< it may free the inode and then
>                                           free all of its page cache (800M)

This problem exists independently of cgroup protection: the inode
shrinker may take down an inode that is still holding a ton of
(potentially active) page cache pages, simply because the inode object
hasn't been referenced recently.

IMO we shouldn't be dropping data that the VM still considers hot
relative to other data just because the inode object hasn't been used
as recently as other inode objects (e.g. because it was drowned in a
stream of one-off inode accesses).

I've been carrying the patch below in my private tree, because the
shrinker was interfering with the cache aging decisions I was testing.
(It would be nicer if page cache pages could pin the inode, of course,
but reclaim cannot easily participate in the inode refcounting scheme.)

Thoughts?

diff --git a/fs/inode.c b/fs/inode.c
index fef457a42882..bfcaaaf6314f 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -753,7 +753,13 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
 		return LRU_ROTATE;
 	}
 
-	if (inode_has_buffers(inode) || inode->i_data.nrpages) {
+	/* Leave the pages to page reclaim */
+	if (inode->i_data.nrpages) {
+		spin_unlock(&inode->i_lock);
+		return LRU_ROTATE;
+	}
+
+	if (inode_has_buffers(inode)) {
 		__iget(inode);
 		spin_unlock(&inode->i_lock);
 		spin_unlock(lru_lock);
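
To make the quoted trace a bit more concrete: in shrink_node_memcgs(),
slab shrinking is only skipped while memcg protection actually applies.
A trimmed, from-memory sketch of the loop (not a verbatim quote of
mm/vmscan.c, details elided):

	memcg = mem_cgroup_iter(target_memcg, NULL, NULL);
	do {
		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);

		switch (mem_cgroup_protected(target_memcg, memcg)) {
		case MEMCG_PROT_MIN:
			/* Hard protection: skip LRUs and slab alike */
			continue;
		case MEMCG_PROT_LOW:
			/* Soft protection: skip unless reclaim is desperate */
			if (!sc->memcg_low_reclaim) {
				sc->memcg_low_skipped = 1;
				continue;
			}
			memcg_memory_event(memcg, MEMCG_LOW);
			break;
		case MEMCG_PROT_NONE:
			/* Usage is above memory.min: no protection applies */
			break;
		}

		shrink_lruvec(lruvec, sc);
		/* ... and the slab caches, inode shrinker included */
		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
	} while ((memcg = mem_cgroup_iter(target_memcg, memcg, NULL)));

So with memory.current above memory.min the memcg is MEMCG_PROT_NONE,
and shrink_slab() runs for it, which is how the 800M of cache can go
away in a single pass.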
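
For reference, the branch being split up above continues past the
context lines shown in the hunk; the part that actually drops the
cache today looks roughly like this (again from memory, not a verbatim
quote):

	if (inode_has_buffers(inode) || inode->i_data.nrpages) {
		__iget(inode);
		spin_unlock(&inode->i_lock);
		spin_unlock(lru_lock);
		if (remove_inode_buffers(inode)) {
			unsigned long reap;

			/* Drops clean, unmapped pages regardless of age */
			reap = invalidate_mapping_pages(&inode->i_data, 0, -1);
			if (current_is_kswapd())
				__count_vm_events(KSWAPD_INODESTEAL, reap);
			else
				__count_vm_events(PGINODESTEAL, reap);
			if (current->reclaim_state)
				current->reclaim_state->reclaimed_slab += reap;
		}
		iput(inode);
		spin_lock(lru_lock);
		return LRU_RETRY;
	}

With the patch, inodes with nrpages != 0 simply rotate on the LRU and
only become eligible for eviction once page reclaim has emptied the
mapping on its own terms.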