On Thu, Feb 13, 2020 at 12:42 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Wed, Feb 12, 2020 at 08:25:45PM +0800, Yafang Shao wrote:
> > On Wed, Feb 12, 2020 at 1:55 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > > Another variant of this problem was recently observed, where the
> > > kernel violates cgroups' memory.low protection settings and reclaims
> > > page cache way beyond the configured thresholds. It was followed by a
> > > proposal of a modified form of the reverted commit above, that
> > > implements memory.low-sensitive shrinker skipping over populated
> > > inodes on the LRU [1]. However, this proposal continues to run the
> > > risk of attracting disproportionate reclaim pressure to a pool of
> > > still-used inodes,
> >
> > Hi Johannes,
> >
> > If you really think that is a risk, what about the below additional
> > patch to fix this risk?
> >
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 80dddbc..61862d9 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -760,7 +760,7 @@ static bool memcg_can_reclaim_inode(struct inode *inode,
> > 		goto out;
> >
> > 	cgroup_size = mem_cgroup_size(memcg);
> > -	if (inode->i_data.nrpages + protection >= cgroup_size)
> > +	if (inode->i_data.nrpages)
> > 		reclaimable = false;
> >
> > out:
> >
> > With this additional patch, we skip all inodes in this memcg until all
> > its page cache pages are reclaimed.
>
> Well that's something we've tried and had to revert because it caused
> issues in slab reclaim. See the History part of my changelog.
>

You misunderstood it. The reverted patch skips all inodes in the system,
while this patch only takes effect when you turn on memcg {min, low}
protection. IOW, it is not the default behavior; it only works when you
want it, and it only affects the targeted memcg rather than the whole
system.

> > > while not addressing the more generic reclaim
> > > inversion problem outside of a very specific cgroup application.
> > >
> >
> > But I have a different understanding.
> > This method works like a
> > knob. If you really care about your workingset (data), you should
> > turn it on (i.e. by using memcg protection to protect them), while
> > if you don't care about your workingset (data) then you'd better
> > turn it off. That would be more flexible. Regarding your case in the
> > commit log, why not protect your linux git tree with memcg
> > protection?
>
> I can't imagine a scenario where I *wouldn't* care about my
> workingset, though. Why should it be opt-in, not the default?

Because the default behavior has caused the XFS performance hit.
(I haven't checked your patch carefully, so I don't know whether
your patch fixes it yet.)

Thanks
Yafang