Re: [PATCH] radix-tree: account nodes to memcg only if explicitly requested

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon 01-08-16 16:13:08, Vladimir Davydov wrote:
> Radix trees may be used not only for storing page cache pages, so
> unconditionally accounting radix tree nodes to the current memory cgroup
> is bad: if a radix tree node is used for storing data shared among
> different cgroups we risk pinning dead memory cgroups forever. So let's
> only account radix tree nodes if it was explicitly requested by passing
> __GFP_ACCOUNT to INIT_RADIX_TREE. Currently, we only want to account
> page cache entries, so mark mapping->page_tree so.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>

OK, the patch makes sense to me. Such a false sharing would be really
tedious to debug

Do we want to mark it for stable 4.6 to prevent from some pathological
issues. The patch is simple enough.

Acked-by: Michal Hocko <mhocko@xxxxxxxx>

> ---
>  fs/inode.c       |  2 +-
>  lib/radix-tree.c | 14 ++++++++++----
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 559a9da25237..1d04dab5211c 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -345,7 +345,7 @@ EXPORT_SYMBOL(inc_nlink);
>  void address_space_init_once(struct address_space *mapping)
>  {
>  	memset(mapping, 0, sizeof(*mapping));
> -	INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC);
> +	INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC | __GFP_ACCOUNT);
>  	spin_lock_init(&mapping->tree_lock);
>  	init_rwsem(&mapping->i_mmap_rwsem);
>  	INIT_LIST_HEAD(&mapping->private_list);
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index 61b8fb529cef..1b7bf7314141 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -277,10 +277,11 @@ radix_tree_node_alloc(struct radix_tree_root *root)
>  
>  		/*
>  		 * Even if the caller has preloaded, try to allocate from the
> -		 * cache first for the new node to get accounted.
> +		 * cache first for the new node to get accounted to the memory
> +		 * cgroup.
>  		 */
>  		ret = kmem_cache_alloc(radix_tree_node_cachep,
> -				       gfp_mask | __GFP_ACCOUNT | __GFP_NOWARN);
> +				       gfp_mask | __GFP_NOWARN);
>  		if (ret)
>  			goto out;
>  
> @@ -303,8 +304,7 @@ radix_tree_node_alloc(struct radix_tree_root *root)
>  		kmemleak_update_trace(ret);
>  		goto out;
>  	}
> -	ret = kmem_cache_alloc(radix_tree_node_cachep,
> -			       gfp_mask | __GFP_ACCOUNT);
> +	ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
>  out:
>  	BUG_ON(radix_tree_is_internal_node(ret));
>  	return ret;
> @@ -351,6 +351,12 @@ static int __radix_tree_preload(gfp_t gfp_mask, int nr)
>  	struct radix_tree_node *node;
>  	int ret = -ENOMEM;
>  
> +	/*
> +	 * Nodes preloaded by one cgroup can be be used by another cgroup, so
> +	 * they should never be accounted to any particular memory cgroup.
> +	 */
> +	gfp_mask &= ~__GFP_ACCOUNT;
> +
>  	preempt_disable();
>  	rtp = this_cpu_ptr(&radix_tree_preloads);
>  	while (rtp->nr < nr) {
> -- 
> 2.1.4
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]