Re: [PATCH] bcache: set cool backing device to maximum writeback rate

> On Sep 15, 2022, at 20:05, mingzhe.zou@xxxxxxxxxxxx wrote:
> 
> From: mingzhe <mingzhe.zou@xxxxxxxxxxxx>
> 
> If the data in the cache is dirty, the gc thread cannot reclaim the space.
> We need to write back dirty data to the backing device, and then gc can
> reclaim this area. So bcache should write back dirty data more aggressively.
> 

The writeback operation should try to avoid negatively influencing front-end I/O performance, especially I/O latency.


> Currently, if there is no I/O request on the cache_set within 30 seconds,
> all backing devices in it will be set to the maximum writeback rate.

The idle time depends on how many backing devices there are. With 1 backing device, the idle time before the maximum writeback rate is set is 30 seconds; with 2 backing devices, it is 60 seconds. If there are 6 backing devices attached to a cache set, the maximum writeback rate will only be set after 180 seconds without any incoming I/O request. That is to say, the maximum writeback rate setting is not an aggressive writeback policy; it just tries to write back more dirty data without interfering with regular I/O requests, once the cache set is really idle.
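The arithmetic above can be sketched as follows. Note this is an illustrative helper, not a bcache function: each backing device's update worker runs every writeback_rate_update_seconds (5 by default), and the shared counter must reach attached_dev_nr * 6 before the maximum rate is applied.

```c
/*
 * Hypothetical helper illustrating the idle-time arithmetic described
 * above: the counter threshold scales with the number of attached
 * backing devices (attached_dev_nr * 6 rounds), and each round is one
 * writeback_rate_update_seconds interval (5 seconds by default).
 */
static int idle_seconds_before_max_rate(int attached_dev_nr,
					int writeback_rate_update_seconds)
{
	return attached_dev_nr * 6 * writeback_rate_update_seconds;
}
```

With the default 5-second update interval this gives 30 seconds for one device, 60 for two, and 180 for six, matching the figures above.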



> 
> However, among multiple backing devices in the same cache_set, there may be
> both cold and hot devices. Since a cold device has no read or write requests,
> its dirty data should be written back as soon as possible.
> 

NACK. The writeback thread takes mutex lock(s) on btree nodes, so increasing the writeback rate may hurt front-end I/O performance. Your change will also keep the P.I. controller for the writeback rate from working properly.


> This patch reduces the control granularity of set_at_max_writeback_rate()
> from cache_set to cached_dev. Even if the cache_set still has I/O requests,
> writing back cold data can make more space for hot data.

I won't take this patch, because it will obviously interfere with front-end I/O performance. Maybe it really works fine for your workload, but other people with other workloads will not be happy.

Thanks.

Coly Li


> 
> Signed-off-by: mingzhe <mingzhe.zou@xxxxxxxxxxxx>
> ---
> drivers/md/bcache/bcache.h    |  5 +++--
> drivers/md/bcache/request.c   | 10 +++++-----
> drivers/md/bcache/writeback.c | 16 +++++++---------
> 3 files changed, 15 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index f4436229cd83..768bb217e156 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -330,6 +330,9 @@ struct cached_dev {
> 	 */
> 	atomic_t		has_dirty;
> 
> +	atomic_t		idle_counter;
> +	atomic_t		at_max_writeback_rate;
> +
> #define BCH_CACHE_READA_ALL		0
> #define BCH_CACHE_READA_META_ONLY	1
> 	unsigned int		cache_readahead_policy;
> @@ -520,8 +523,6 @@ struct cache_set {
> 	struct cache_accounting accounting;
> 
> 	unsigned long		flags;
> -	atomic_t		idle_counter;
> -	atomic_t		at_max_writeback_rate;
> 
> 	struct cache		*cache;
> 
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index f2c5a7e06fa9..f53b5831f500 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -1141,7 +1141,7 @@ static void quit_max_writeback_rate(struct cache_set *c,
> 	 * To avoid such situation, if mutext_trylock() failed, only writeback
> 	 * rate of current cached device is set to 1, and __update_write_back()
> 	 * will decide writeback rate of other cached devices (remember now
> -	 * c->idle_counter is 0 already).
> +	 * dc->idle_counter is 0 already).
> 	 */
> 	if (mutex_trylock(&bch_register_lock)) {
> 		for (i = 0; i < c->devices_max_used; i++) {
> @@ -1184,16 +1184,16 @@ void cached_dev_submit_bio(struct bio *bio)
> 	}
> 
> 	if (likely(d->c)) {
> -		if (atomic_read(&d->c->idle_counter))
> -			atomic_set(&d->c->idle_counter, 0);
> +		if (atomic_read(&dc->idle_counter))
> +			atomic_set(&dc->idle_counter, 0);
> 		/*
> 		 * If at_max_writeback_rate of cache set is true and new I/O
> 		 * comes, quit max writeback rate of all cached devices
> 		 * attached to this cache set, and set at_max_writeback_rate
> 		 * to false.
> 		 */
> -		if (unlikely(atomic_read(&d->c->at_max_writeback_rate) == 1)) {
> -			atomic_set(&d->c->at_max_writeback_rate, 0);
> +		if (unlikely(atomic_read(&dc->at_max_writeback_rate) == 1)) {
> +			atomic_set(&dc->at_max_writeback_rate, 0);
> 			quit_max_writeback_rate(d->c, dc);
> 		}
> 	}
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 3f0ff3aab6f2..40e10fd3552e 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -172,7 +172,7 @@ static bool set_at_max_writeback_rate(struct cache_set *c,
> 	 * called. If all backing devices attached to the same cache set have
> 	 * identical dc->writeback_rate_update_seconds values, it is about 6
> 	 * rounds of update_writeback_rate() on each backing device before
> -	 * c->at_max_writeback_rate is set to 1, and then max wrteback rate set
> +	 * dc->at_max_writeback_rate is set to 1, and then max wrteback rate set
> 	 * to each dc->writeback_rate.rate.
> 	 * In order to avoid extra locking cost for counting exact dirty cached
> 	 * devices number, c->attached_dev_nr is used to calculate the idle
> @@ -180,12 +180,11 @@ static bool set_at_max_writeback_rate(struct cache_set *c,
> 	 * back mode, but it still works well with limited extra rounds of
> 	 * update_writeback_rate().
> 	 */
> -	if (atomic_inc_return(&c->idle_counter) <
> -	    atomic_read(&c->attached_dev_nr) * 6)
> +	if (atomic_inc_return(&dc->idle_counter) < 6)
> 		return false;
> 
> -	if (atomic_read(&c->at_max_writeback_rate) != 1)
> -		atomic_set(&c->at_max_writeback_rate, 1);
> +	if (atomic_read(&dc->at_max_writeback_rate) != 1)
> +		atomic_set(&dc->at_max_writeback_rate, 1);
> 
> 	atomic_long_set(&dc->writeback_rate.rate, INT_MAX);
> 
> @@ -195,14 +194,13 @@ static bool set_at_max_writeback_rate(struct cache_set *c,
> 	dc->writeback_rate_change = 0;
> 
> 	/*
> -	 * Check c->idle_counter and c->at_max_writeback_rate agagain in case
> +	 * Check dc->idle_counter and dc->at_max_writeback_rate agagain in case
> 	 * new I/O arrives during before set_at_max_writeback_rate() returns.
> 	 * Then the writeback rate is set to 1, and its new value should be
> 	 * decided via __update_writeback_rate().
> 	 */
> -	if ((atomic_read(&c->idle_counter) <
> -	     atomic_read(&c->attached_dev_nr) * 6) ||
> -	    !atomic_read(&c->at_max_writeback_rate))
> +	if ((atomic_read(&dc->idle_counter) < 6) ||
> +	    !atomic_read(&dc->at_max_writeback_rate))
> 		return false;
> 
> 	return true;
> -- 
> 2.17.1
> 




