[+cc Michael Lyle] On Fri, 27 Oct 2017, Eric Wheeler wrote: > On Sun, 16 Jul 2017, Coly Li wrote: > > > On 2017/7/1 上午4:43, bcache@xxxxxxxxxxxxxxxxxx wrote: > > > From: Tang Junhui <tang.junhui@xxxxxxxxxx> > > > > > > When there is not enough dirty data in writeback cache, > > > writeback rate is at minimum 1 key per second > > > util all dirty data to be cleaned, it is inefficiency, > > > and also causes waste of energy; > > > > Hi Junhui and Eric, > > > > What: /sys/block/<disk>/bcache/writeback_percent > > Description: > > For backing devices: If nonzero, writeback from cache to > > backing device only takes place when more than this percentage > > of the cache is used, allowing more write coalescing to take > > place and reducing total number of writes sent to the backing > > device. Integer between 0 and 40. > > > > I see above text from Documentation/ABI/testing/sysfs-block-bcache (I > > know this document is quite old), it seems if "not enough" means dirty > > data percentage is less then writback_percent, bcache should not > > performance writeback I/O. But in __update_writeback_rate(), > > writeback_rate.rate is clamped in [1, NSEC_PER_MSEC]. It seems in PD > > controller code of __update_writeback_rate(), writeback_percent is only > > used to calculate dirty target number, its another functionality as > > writeback threshold is not handled here. > > > > > > > > in this patch, When there is not enough dirty data, > > > let the writeback rate to be 0, and writeback re-schedule > > > in bch_writeback_thread() periodically with schedule_timeout(), > > > the behaviors are as follows : > > > > > > 1) If no dirty data have been read into dc->writeback_keys, > > > goto step 2), otherwise keep writing these dirty data to > > > back-end device at 1 key per second, until all these dirty data > > > write over, then goto step 2). > > > > > > 2) Loop in bch_writeback_thread() to check if there is enough > > > dirty data for writeback. if there is not enough diry data for > > > writing, then sleep 10 seconds, otherwise, write dirty data to > > > back-end device. > > > > Bcache uses a Proportion-Differentiation Controller to control writeback > > rate. When dirty data is far from target, writeback rate is higher; when > > dirty data is close to target, writeback rate is slower. The advantage > > of PD controller here is, when regular I/O and writeback I/O happens in > > same time, > > - When there are a lot of dirty data, writeback I/O can have more chance > > to write them back to cached device, which in turns has positive impact > > to regular I/O. > > - When dirty data is decreased and close to target dirty number, less > > writeback I/O can help regular I/O has better throughput and latency. > > > > The root cause of 1 key per second is, the PD controller is designed for > > better I/O performance, not less energy consumption. When the existing > > dirty data gets closed to target dirty number, the PD controller chooses > > to use longer writeback time to make a better regular I/O performance. > > If it is designed for less energy consumption, it should keep the > > writeback rate in a high level and finish writing back all dirty data as > > soon as possible. > > > > This patch may introduce an unexpected behavior of dirty data writeback > > throughput, when regular write I/O and writeback I/O happen in same > > time. In this case, dirty data number may shake up and down around > > target dirty number, it is possible that change (the variable in > > __update_writeback_rate()) is a minus value, and the result of > > dc->writeback_rate.rate + change happens to be 0. This patch changes the > > clamp range of writeback_rate.rate to [0, NSEC_PER_MSEC], so > > writeback_rate.rate can be possible to be 0. And in bch_next_delay() if > > d->rate is zero, the write back I/O will be delayed to now + > > NSEC_PER_SEC. When there is no regular I/O it works well, but when there > > is regular I/O, this longer delay may cause more dirty data piled in > > cache device, and PD controller cannot generage a stable writeback rate. > > This is not an expected behavior for the writeback rate PD controller. > > > > Another method to fix might be, > > 1) define a sysfs to define writeback_rate with max/dynamic option. > > 2) dynamic writeback_rate as default > > 3) when max is set, in __update_writeback_rate() assign NSEC_PER_MSEC to > > writeback_rate.rate > > 4) in bch_writeback_thread(), if no writeback I/O on fly, and dirty data > > does not reach dc->writeback_percent, schedule out. > > 5) if writeback is necessary then do it in max rate and finish it as > > soon as possible, to save laptop energy. > > > > The above method might be helpful to energy save as well (perform dirty > > dat write back in batch), and does not change default PD controller > > behavior. > > > > Just for your reference. Or if you are too busy to look at it, I can try > > to compose a patch for review. > > Hi Coli, > > Did this go anywere? I think the 1-key/sec fix is a good idea and your > suggestion will help out mobile users. > > > > -- > Eric Wheeler > > > > > > Coly > > > > > > > > Signed-off-by: Tang Junhui <tang.junhui@xxxxxxxxxx> > > > --- > > > drivers/md/bcache/util.c | 9 ++++++++- > > > drivers/md/bcache/writeback.c | 11 +++++++---- > > > 2 files changed, 15 insertions(+), 5 deletions(-) > > > > > > diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c > > > index 8c3a938..49dcf09 100644 > > > --- a/drivers/md/bcache/util.c > > > +++ b/drivers/md/bcache/util.c > > > @@ -210,7 +210,14 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t done) > > > { > > > uint64_t now = local_clock(); > > > > > > - d->next += div_u64(done * NSEC_PER_SEC, d->rate); > > > + /* > > > + if d->rate is zero, write the left dirty data > > > + at the speed of one key per second > > > + */ > > > + if(!d->rate) > > > + d->next = now + NSEC_PER_SEC; > > > + else > > > + d->next += div_u64(done * NSEC_PER_SEC, d->rate); > > > > > > if (time_before64(now + NSEC_PER_SEC, d->next)) > > > d->next = now + NSEC_PER_SEC; > > > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c > > > index 25289e4..4104eaa 100644 > > > --- a/drivers/md/bcache/writeback.c > > > +++ b/drivers/md/bcache/writeback.c > > > @@ -16,6 +16,8 @@ > > > #include <linux/sched/clock.h> > > > #include <trace/events/bcache.h> > > > > > > +#define WRITE_BACK_WAIT_CYCLE 10 * HZ > > > + > > > /* Rate limiting */ > > > > > > static void __update_writeback_rate(struct cached_dev *dc) > > > @@ -55,13 +57,14 @@ static void __update_writeback_rate(struct cached_dev *dc) > > > > > > /* Don't increase writeback rate if the device isn't keeping up */ > > > if (change > 0 && > > > + dc->writeback_rate.rate >0 && > > > time_after64(local_clock(), > > > dc->writeback_rate.next + NSEC_PER_MSEC)) > > > change = 0; > > > > > > dc->writeback_rate.rate = > > > clamp_t(int64_t, (int64_t) dc->writeback_rate.rate + change, > > > - 1, NSEC_PER_MSEC); > > > + 0, NSEC_PER_MSEC); > > > > > > dc->writeback_rate_proportional = proportional; > > > dc->writeback_rate_derivative = derivative; > > > @@ -420,15 +423,15 @@ static int bch_writeback_thread(void *arg) > > > while (!kthread_should_stop()) { > > > down_write(&dc->writeback_lock); > > > if (!atomic_read(&dc->has_dirty) || > > > - (!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) && > > > - !dc->writeback_running)) { > > > + ((!dc->writeback_rate.rate || !dc->writeback_running) && > > > + !test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags))) { > > > up_write(&dc->writeback_lock); > > > set_current_state(TASK_INTERRUPTIBLE); > > > > > > if (kthread_should_stop()) > > > return 0; > > > > > > - schedule(); > > > + schedule_timeout(WRITE_BACK_WAIT_CYCLE); > > > continue; > > > } > > > > > > > > > > > > -- > > Coly Li > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > >