Re: [RFC] Storing cgroup id in page->private (Was: Re: [RFC] [PATCH 0/6] Provide cgroup isolation for buffered writes.)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 11, 2011 at 11:52:35AM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 10 Mar 2011 21:15:31 -0500
> Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>  
> > > IMO, if you really need some per-page information, then just put it
> > > in the struct page - you can't hide the memory overhead just by
> > > having the filesystem to store it for you. That just adds
> > > unnecessary complexity...
> > 
> > Ok. I guess adding anything to struct page is going to be hard and 
> > we might have to fall back to looking into using page_cgroup for
> > tracking page state. I was trying to explore the ways so that we don't
> > have to instantiate whole page_cgroup structure just for trying
> > to figure out who dirtied the page.
> > 
> 
> Is this bad ?
> ==

Sounds like an interesting idea. I am primarily concered about the radix
tree node size increase. Not sure how big a concern this is.

Also tracking is useful for two things.

- Proportinal IO
- IO throttling

For proportional IO, anyway we have to use it with memory controller to
control per cgroup dirty ratio so storing info in page_cgroup should
not hurt. 

The only other case where dependence on page_cgroup hurts is IO throttling
where IO controller does not really need memory cgroup controller (I hope
so). But we are still not sure if throttling IO at device level is a
good idea and how to resolve issues related to priority inversion.

But this definitely sounds better than adding a new field in page
struct as I am assuming that it overall is going to consume less memory.

Thanks
Vivek 

> 
> At handling ASYNC I/O in blkio layer, it's unknown that who dirtied the page.
> This lack of information makes impossible to throttole Async I/O per
> cgroup in blkio queue layer.
> 
> This patch records the information into radix-tree and preserve the
> information. There is no 'clear' operation because all I/O starts when
> the page is marked as DIRTY.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> ---
>  include/linux/radix-tree.h |    3 +++
>  lib/radix-tree.c           |   42 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 45 insertions(+)
> 
> Index: mmotm-Mar10/include/linux/radix-tree.h
> ===================================================================
> --- mmotm-Mar10.orig/include/linux/radix-tree.h
> +++ mmotm-Mar10/include/linux/radix-tree.h
> @@ -58,12 +58,14 @@ struct radix_tree_root {
>  	unsigned int		height;
>  	gfp_t			gfp_mask;
>  	struct radix_tree_node	__rcu *rnode;
> +	int			iohint;
>  };
>  
>  #define RADIX_TREE_INIT(mask)	{					\
>  	.height = 0,							\
>  	.gfp_mask = (mask),						\
>  	.rnode = NULL,							\
> +	.iohint = 0,							\
>  }
>  
>  #define RADIX_TREE(name, mask) \
> @@ -74,6 +76,7 @@ do {									\
>  	(root)->height = 0;						\
>  	(root)->gfp_mask = (mask);					\
>  	(root)->rnode = NULL;						\
> +	(root)->iohint = 0;						\
>  } while (0)
>  
>  /**
> Index: mmotm-Mar10/lib/radix-tree.c
> ===================================================================
> --- mmotm-Mar10.orig/lib/radix-tree.c
> +++ mmotm-Mar10/lib/radix-tree.c
> @@ -31,6 +31,7 @@
>  #include <linux/string.h>
>  #include <linux/bitops.h>
>  #include <linux/rcupdate.h>
> +#include <linux/blkdev.h>
>  
>  
>  #ifdef __KERNEL__
> @@ -51,6 +52,9 @@ struct radix_tree_node {
>  	struct rcu_head	rcu_head;
>  	void __rcu	*slots[RADIX_TREE_MAP_SIZE];
>  	unsigned long	tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
> +#ifdef CONFIG_BLK_CGROUP
> +	unsigned short  iohint[RADIX_TREE_MAP_SIZE];
> +#endif
>  };
>  
>  struct radix_tree_path {
> @@ -473,6 +477,8 @@ void *radix_tree_tag_set(struct radix_tr
>  		offset = (index >> shift) & RADIX_TREE_MAP_MASK;
>  		if (!tag_get(slot, tag, offset))
>  			tag_set(slot, tag, offset);
> +		if (height == 1 && slot && tag == PAGECACHE_TAG_DIRTY)
> +			blkio_record_hint(&slot->iohint[offset]);
>  		slot = slot->slots[offset];
>  		BUG_ON(slot == NULL);
>  		shift -= RADIX_TREE_MAP_SHIFT;
> @@ -1418,3 +1425,38 @@ void __init radix_tree_init(void)
>  	radix_tree_init_maxindex();
>  	hotcpu_notifier(radix_tree_callback, 0);
>  }
> +
> +#ifdef CONFIG_BLK_CGROUP
> +
> +unsigned short radix_tree_lookup_iohint(struct radix_tree_root *root,
> +				int index)
> +{
> +	unsigned int height, shift;
> +	struct radix_tree_node *node;
> +
> +	node = rcu_redereference(root->rnode);
> +	if (node == NULL)
> +		return 0;
> +	if (!radix_tree_is_indirect_ptr(node))
> +		return root->iohint;
> +	node = radxi_tree_indirect_to_ptr(node);
> +
> +	height = node->height;
> +	if (index > radix_tree_maxindex(height))
> +		return 0;
> +	shift = (height - 1) * RADIX_TREE_MAP_SHIFT;
> +	for ( ; ; ) {
> +		int offset;
> +
> +		if (node == NULL)
> +			return 0;
> +		offset = (index >> shift) & RADIX_TREE_MAP_MASK;
> +		if (height == 1)
> +			return node->iohint[offset];
> +		node = rcu_rereference(node->slots[offset]);
> +		shift -= RADIX_TREE_MAP_SHIFT;
> +		height--;
> +	}
> +}
> +EXPORT_SYMBOL(radix_tree_lookup_iohint);
> +#endif
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux