Re: [PATCH 4/4] memcg, inode: protect page cache from freeing inode

On Tue, Dec 17, 2019 at 06:29:19AM -0500, Yafang Shao wrote:
> On my server there're some running MEMCGs protected by memory.{min, low},
> but I found the usage of these MEMCGs abruptly became very small, which
> were far less than the protect limit. It confused me and finally I
> found that was because of inode stealing.
> Once an inode is freed, all its belonging page caches will be dropped as
> well, no matter how many page caches it has. So if we intend to protect the
> page caches in a memcg, we must protect their host (the inode) first.
> Otherwise the memcg protection can be easily bypassed with freeing inode,
> especially if there're big files in this memcg.
> The inherent mismatch between memcg and inode is a trouble. One inode can
> be shared by different MEMCGs, but it is a very rare case. If an inode is
> shared, its belonging page caches may be charged to different MEMCGs.
> Currently there's no perfect solution to fix this kind of issue, but the
> inode majority-writer ownership switching can help it more or less.
> 
> Cc: Roman Gushchin <guro@xxxxxx>
> Cc: Chris Down <chris@xxxxxxxxxxxxxx>
> Cc: Dave Chinner <dchinner@xxxxxxxxxx>
> Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> ---
>  fs/inode.c                 |  9 +++++++++
>  include/linux/memcontrol.h | 15 +++++++++++++++
>  mm/memcontrol.c            | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>  mm/vmscan.c                |  4 ++++
>  4 files changed, 74 insertions(+)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index fef457a..b022447 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -734,6 +734,15 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
>  	if (!spin_trylock(&inode->i_lock))
>  		return LRU_SKIP;
>  
> +
> +	/* Page protection only works in reclaimer */
> +	if (inode->i_data.nrpages && current->reclaim_state) {
> +		if (mem_cgroup_inode_protected(inode)) {
> +			spin_unlock(&inode->i_lock);
> +			return LRU_ROTATE;

Urk, so after having plumbed the memcg all the way down to the
list_lru walk code so that we only walk inodes in that memcg, we now
have to do a lookup from the inode back to the owner memcg to
determine if we should reclaim it? IOWs, I think the layering here
is all wrong - if memcg info is needed in the shrinker, it should
come from the shrink_control->memcg pointer, not be looked up from
the object being isolated...

i.e. this code should read something like this:

	if (memcg && inode->i_data.nrpages &&
	    !memcg_can_reclaim_inode(memcg, inode)) {
		spin_unlock(&inode->i_lock);
		return LRU_ROTATE;
	}

This code does not need comments because it is obvious what it does,
and it provides a generic hook into inode reclaim for the memcg code
to decide whether the shrinker should reclaim the inode or not.

This is how the memcg code should interact with other shrinkers, too
(e.g. the dentry cache isolation function), so you need to look at
how to make the memcg visible to the lru walker isolation
functions....
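
To make the layering concrete, here is a rough user-space model of what
I mean. It is a sketch only, not kernel code: the struct layouts, the
usage/protection fields and walk_one() are stand-ins invented for
illustration, memcg_can_reclaim_inode() is just the hook name suggested
above (it does not exist today), and the policy inside it is a
placeholder. The point is only that the memcg flows down from
shrink_control through the walker's callback argument, and the isolate
function never looks it up from the inode.

```c
/* User-space model only: none of these types match the real kernel ones. */
#include <stdbool.h>
#include <stddef.h>

enum lru_status { LRU_REMOVED, LRU_ROTATE, LRU_SKIP };

/* Hypothetical stand-in for struct mem_cgroup. */
struct mem_cgroup {
	unsigned long usage;      /* pages currently charged (assumed field) */
	unsigned long protection; /* pages covered by memory.{min,low} (assumed field) */
};

/* Hypothetical stand-in for struct inode; only the page count is modelled. */
struct inode {
	unsigned long nrpages;
};

/* Stand-in for shrink_control: the memcg being reclaimed, NULL for global reclaim. */
struct shrink_control {
	struct mem_cgroup *memcg;
};

/*
 * Hypothetical hook owned by the memcg code. Placeholder policy: refuse
 * reclaim while the memcg is at or below its protection.
 */
static bool memcg_can_reclaim_inode(const struct mem_cgroup *memcg,
				    const struct inode *inode)
{
	(void)inode; /* a real policy might also weigh the inode's page count */
	return memcg->usage > memcg->protection;
}

typedef enum lru_status (*isolate_cb)(struct inode *inode, void *cb_arg);

/* Isolate callback: the memcg arrives via cb_arg. Locking is omitted here. */
static enum lru_status inode_lru_isolate(struct inode *inode, void *cb_arg)
{
	struct mem_cgroup *memcg = cb_arg;

	if (memcg && inode->nrpages &&
	    !memcg_can_reclaim_inode(memcg, inode))
		return LRU_ROTATE;

	return LRU_REMOVED;
}

/* Hypothetical one-item walker: hands sc->memcg down to the callback. */
static enum lru_status walk_one(const struct shrink_control *sc,
				struct inode *inode, isolate_cb isolate)
{
	return isolate(inode, sc->memcg);
}
```

In this model an inode holding pages in a memcg that is under its
protection gets rotated, while global reclaim (memcg == NULL) and empty
inodes take the normal path. A dentry isolate function could call an
equivalent hook through the same callback argument.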

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


