On Wed, Dec 18, 2019 at 12:54 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> CCing Dave
>
> On Tue, Dec 17, 2019 at 08:19:08PM +0800, Yafang Shao wrote:
> > On Tue, Dec 17, 2019 at 7:56 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > What do you mean by this exactly. Are those inodes reclaimed by the
> > > regular memory reclaim or by other means? Because shrink_node does
> > > exclude shrinking slab for protected memcgs.
> >
> > By the regular memory reclaim: kswapd, the direct reclaimer, or the
> > memcg reclaimer. IOW, current->reclaim_state is set.
> >
> > Take an example:
> >
> > kswapd
> >   balance_pgdat
> >     shrink_node_memcgs
> >       switch (mem_cgroup_protected)  <<<< memory.current = 1024M,
> >                                           memory.min = 512M,
> >                                           a file has 800M of page cache
> >         case MEMCG_PROT_NONE:        <<<< the hard limit is not reached
> >             break;
> >       shrink_lruvec
> >       shrink_slab  <<<< may free the inode and then free all of its
> >                         page cache (800M)
>
> This problem exists independent of cgroup protection.
>
> The inode shrinker may take down an inode that's still holding a ton
> of (potentially active) page cache pages when the inode hasn't been
> referenced recently.
>
> IMO we shouldn't be dropping data that the VM still considers hot
> compared to other data, just because the inode object hasn't been used
> as recently as other inode objects (e.g. drowned in a stream of
> one-off inode accesses).
>
> I've carried the below patch in my private tree for testing cache
> aging decisions that the shrinker interfered with. (It would be nicer
> if page cache pages could pin the inode of course, but reclaim cannot
> easily participate in the inode refcounting scheme.)
>
> Thoughts?
>

I have already thought about this solution. But I found there is a
similar revert by Dave - see commit 69056ee6a8a3 ("Revert "mm: don't
reclaim inodes with many attached pages""). That's why I CCed Dave on
patch 4. So I only fix it for memcg protection, because that has a
much smaller impact.

> diff --git a/fs/inode.c b/fs/inode.c
> index fef457a42882..bfcaaaf6314f 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -753,7 +753,13 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
>  		return LRU_ROTATE;
>  	}
>
> -	if (inode_has_buffers(inode) || inode->i_data.nrpages) {
> +	/* Leave the pages to page reclaim */
> +	if (inode->i_data.nrpages) {
> +		spin_unlock(&inode->i_lock);
> +		return LRU_ROTATE;
> +	}
> +
> +	if (inode_has_buffers(inode)) {
>  		__iget(inode);
>  		spin_unlock(&inode->i_lock);
>  		spin_unlock(lru_lock);
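
To make the protection-bypass scenario above concrete, here is a minimal
userspace sketch in C. It is illustrative only: the variable names and the
one-line protection check are assumptions standing in for
mem_cgroup_protected() and shrink_slab(), not the kernel implementation.

#include <stdio.h>

int main(void)
{
	long memory_min     = 512;   /* memory.min, in MB */
	long memory_current = 1024;  /* memory.current, in MB */
	long inode_cache    = 800;   /* page cache attached to one inode, MB */

	/*
	 * Roughly what the mem_cgroup_protected() check decides: usage is
	 * above the protected minimum, so the verdict is MEMCG_PROT_NONE
	 * and reclaim of this memcg proceeds.
	 */
	if (memory_current > memory_min)
		printf("MEMCG_PROT_NONE: memcg gets reclaimed\n");

	/*
	 * shrink_slab() frees the inode, and tearing down the inode
	 * truncates all of its page cache in one step, with no regard
	 * for memory.min.
	 */
	memory_current -= inode_cache;
	printf("after inode reclaim: current=%ldM min=%ldM (%s)\n",
	       memory_current, memory_min,
	       memory_current < memory_min ? "protection violated" : "ok");
	return 0;
}

Run, this prints that the memcg is reclaimed and that usage then drops to
224M, far below the 512M memory.min - the violation Yafang describes.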
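
Likewise, a hedged toy model of the decision Johannes' patch makes in
inode_lru_isolate(). The struct, field names, and isolate() helper here are
made up for illustration and collapse several kernel checks into one; only
the enum values mirror the kernel's enum lru_status.

#include <stdio.h>

/* Same value names as the kernel's enum lru_status. */
enum lru_status { LRU_REMOVED, LRU_ROTATE, LRU_RETRY };

/* Toy stand-in for struct inode; only the fields the decision reads. */
struct inode_model {
	int  i_count;      /* reference count */
	long nrpages;      /* i_data.nrpages: attached page cache pages */
	int  has_buffers;  /* inode_has_buffers() */
};

static enum lru_status isolate(const struct inode_model *inode)
{
	if (inode->i_count)
		return LRU_ROTATE;   /* still referenced: keep the inode */
	if (inode->nrpages)
		return LRU_ROTATE;   /* new: leave the pages to page reclaim */
	if (inode->has_buffers)
		return LRU_RETRY;    /* real code invalidates buffers first */
	return LRU_REMOVED;          /* no users, no cache: reclaim it */
}

int main(void)
{
	/* 204800 4K pages ~= 800M of attached page cache */
	struct inode_model hot_file = { .i_count = 0, .nrpages = 204800 };
	struct inode_model cold_one = { .i_count = 0, .nrpages = 0 };

	printf("inode with 800M of cache: %s\n",
	       isolate(&hot_file) == LRU_ROTATE ? "rotated, cache survives"
	                                        : "reclaimed");
	printf("inode with no cache:      %s\n",
	       isolate(&cold_one) == LRU_REMOVED ? "reclaimed" : "kept");
	return 0;
}

Rotating instead of reclaiming leaves the aging of those 800M of cache to
page reclaim; whether unreclaimable inodes then pile up on the inode LRU is
exactly the concern behind the earlier revert by Dave mentioned above.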