Re: [PATCH v2 1/2] bcache: introduce per-process ioprio-based bypass/writeback hints

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 12 Oct 2016, Kent Overstreet wrote:
> On Tue, Oct 11, 2016 at 12:04:52PM -0700, Eric Wheeler wrote:
> > Add support to bcache hinting functions and sysfs to hint by the ioprio of
> > 'current' for cache writeback or bypass (configured per-process with `ionice`).
> > 
> > Having idle IOs bypass the cache can increase performance elsewhere
> > since you probably don't care about their performance.  In addition,
> > this prevents idle IOs from promoting into (polluting) your cache and
> > evicting blocks that are more important elsewhere.
> > 
> > If you really nead the performance at the expense of SSD wearout,
> > then configure ioprio_writeback and set your `ionice` appropriately.
> > 
> > For example:
> > 	echo 2,7 > /sys/block/bcache0/bcache/ioprio_bypass
> > 	echo 2,0 > /sys/block/bcache0/bcache/ioprio_writeback
> > 
> > See the documentation commit for details.
> > 
> > Signed-off-by: Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx>
> > Tested-by: Kai Krakow <kai@xxxxxxxxxxx>
> > ---
> >  drivers/md/bcache/bcache.h    |  3 ++
> >  drivers/md/bcache/request.c   | 14 +++++++++
> >  drivers/md/bcache/sysfs.c     | 71 
+++++++++++++++++++++++++++++++++++++++++++
> >  drivers/md/bcache/writeback.c |  8 +++++
> >  drivers/md/bcache/writeback.h | 14 +++++++++
> >  5 files changed, 110 insertions(+)
> > 
> > diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> > index 6b420a5..bf945fa 100644
> > --- a/drivers/md/bcache/bcache.h
> > +++ b/drivers/md/bcache/bcache.h
> > @@ -367,6 +367,9 @@ struct cached_dev {
> >  	unsigned		writeback_rate_update_seconds;
> >  	unsigned		writeback_rate_d_term;
> >  	unsigned		writeback_rate_p_term_inverse;
> > +
> > +	unsigned short		ioprio_writeback;
> > +	unsigned short		ioprio_bypass;
> >  };
> >  
> >  enum alloc_reserve {
> > diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> > index 4b177fe..b6911df 100644
> > --- a/drivers/md/bcache/request.c
> > +++ b/drivers/md/bcache/request.c
> > @@ -375,6 +375,7 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
> >  	unsigned sectors, congested = bch_get_congested(c);
> >  	struct task_struct *task = current;
> >  	struct io *i;
> > +	struct io_context *ioc;
> >  
> >  	if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) ||
> >  	    c->gc_stats.in_use > CUTOFF_CACHE_ADD ||
> > @@ -386,6 +387,19 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
> >  	     op_is_write(bio_op(bio))))
> >  		goto skip;
> >  
> > +	/* If process ioprio is lower-or-equal to dc->ioprio_bypass, then
> > +	 * hint for bypass. Note that a lower-priority IO class+value
> > +	 * has a greater numeric value. */
> > +	ioc = get_task_io_context(current, GFP_ATOMIC, NUMA_NO_NODE);
> 
> get_task_io_context(current) won't work correctly for buffered writes since
> those are issued from writeback work items, not the originating process. There
> is machinery for tracking, per page, the process that the write originated from,
> but I haven't looked into how that works.

Interesting.  Reads work which will help the most common 
backup-bypass-cache usecase; fewer people will use the writeback hint 
unless they really need it.  Walking the bio per page to glean ioprio 
might be a future commit, but for now direct-io works (KVM on LVM, etc).

In v3 (sending shortly) I've set it to use bio_prio(bio) first, falling 
back to current's ioprio if unassigned in the bio in case upper layers are 
smart enough to pass the process's ioprio to the bio correctly.  

> Also, GFP_ATOMIC is wrong here - we're in process context, you want GFP_NOIO.

Also fixed in v3
 
> Other than that, looks reasonable to me.

Yay!  I'm targeting 4.9 since gregkh annouced it as longterm kernel.

I'll send up a pull request unless you see something to fix in v3.

Commit 3/3 adds bi_opf hints too, hopefully they appear sane. 
Let me know what you think.

--
Eric Wheeler



	echo 3,0 > /sys/block/bcache0/bcache/ioprio_bypass 
	echo 1,0 > /sys/block/bcache0/bcache/ioprio_writeback 

	============ Low prio bypass ============
	[root@bcache-dev ~]# ionice -c3 dd bs=1M of=/dev/bcache0 if=/dev/zero oflag=direct count=200
	200+0 records in
	200+0 records out
	209715200 bytes (210 MB) copied, 1.93203 s, 109 MB/s

	[root@bcache-dev ~]# cat /sys/block/bcache0/bcache/dirty_data
	0  <<< nothing dirty
	
	# io rates ber disk:
	dsk/bcache0---dsk/ssd0----dsk/bdev-
	 read  writ: read  writ: read  writ
	   0    95M:   0  4096B:   0    95M
	   0   102M:   0  4608B:   0   102M



	============ High prio writeback ============
	[root@bcache-dev ~]# ionice -c1 dd bs=1M of=/dev/bcache0 if=/dev/zero oflag=direct count=200
	200+0 records in
	200+0 records out
	209715200 bytes (210 MB) copied, 3.36397 s, 62.3 MB/s

	[root@bcache-dev ~]# cat /sys/block/bcache0/bcache/dirty_data
	200M <<< all dirty

	# io rates ber disk - ignore my slow ssd performance, its not an ssd ;)
	dsk/bcache0---dsk/ssd0----dsk/bdev-
	 read  writ: read  writ: read  writ
	   0    37M:   0    50M:   0     0 
	   0    78M:   0   104M:   0     0 
	   0    79M:   0   106M:   0     0 
	 456k 6144k: 456k 8195k:   0     0 


--
To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM Kernel]     [Linux Filesystem Development]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux