On 2019/11/12 2:39 PM, Christian Balzer wrote:
> On Tue, 12 Nov 2019 13:00:14 +0800 Coly Li wrote:
>
>> On 2019/11/12 9:17 AM, Christian Balzer wrote:
>>> On Mon, 11 Nov 2019 23:56:04 +0800 Coly Li wrote:
>>>
>>>> On 2019/11/11 10:10 AM, Christian Balzer wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> When researching the issues below and finding out about the PDC changes
>>>>> since 4.9, this also provided a good explanation for the load spikes we see
>>>>> with 4.9: the default writeback is way too slow to empty the dirty
>>>>> pages and thus there is never much of a buffer for sudden write spikes,
>>>>> causing the PDC to overshoot when trying to flush things out to the
>>>>> backing device.
>>>>>
>>>>> With Debian Buster things obviously changed, and with the current kernel
>>>>> ---
>>>>> Linux version 4.19.0-6-amd64 (debian-kernel@xxxxxxxxxxxxxxxx) (gcc version
>>>>> 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.67-2+deb10u1 (2019-09-20)
>>>>> ---
>>>>> we get writeback_rate_minimum (undocumented, value in 512-byte blocks).
>>>>> That looked promising and indeed it helps, but there are major gotchas.
>>>>> For the tests below I did set this to 8192, aka 4MB/s, which is something
>>>>> the backing Areca RAID (4GB cache, 16 HDDs) handles at 0% utilization.
>>>>>
>>>>> 1. Quiescent insanity
>>>>>
>>>>> When running fio (see full command line and results below) all looks/seems
>>>>> fine, aside from issue #2 of course.
>>>>> However, if one stops fio and the system is fully quiescent (no writes),
>>>>> then the new PDC goes berserk, most likely a division-by-zero type bug.
>>>>>
>>>>> writeback_rate_debug goes from (just after stopping fio):
>>>>> ---
>>>>> rate:          4.0M/sec
>>>>> dirty:         954.7M
>>>>> target:        36.7G
>>>>> proportional:  -920.7M
>>>>> integral:      -17.1M
>>>>> change:        0.0k/sec
>>>>> next io:       -7969ms
>>>>> ---
>>>>>
>>>>> to:
>>>>> ---
>>>>> rate:          0.9T/sec
>>>>> dirty:         496.4M
>>>>> target:        36.7G
>>>>> proportional:  0.0k
>>>>> integral:      0.0k
>>>>> change:        0.0k/sec
>>>>> next io:       -2000ms
>>>>> ---
>>>>> completely overwhelming the backing device and causing (again) massive
>>>>> load spikes. Very unintuitive and unexpected.
>>>>>
>>>>> Any IO (like a fio run with a 1 IOPS target) will prevent this, and the
>>>>> preset writeback_rate_minimum will be honored until the cache is clean.
>>>>>
>>>>
>>>> This is a feature indeed. When there is no I/O for a reasonably long
>>>> time, the writeback rate limit will be set to 1TB/s, to permit the
>>>> writeback I/O to proceed as fast as possible.
>>>>
>>>> And as you observed, once new I/O comes in, the maximized writeback
>>>> rate is canceled and writeback goes back to being controlled by the
>>>> PDC code.
>>>>
>>>> Is there any inconvenience with this behavior in your environment?
>>>>
>>> Yes, unwanted load spikes, as I wrote.
>>> Utilization of the backing RAID AND caching SSDs yo-yo'ing up to 100%.
>>>
>>
>> Copied.
>>
>> The maximum writeback rate feature is indeed required by users, for
>> workloads like desktop I/O acceleration, databases and distributed
>> storage. People want the writeback thread to flush dirty data as fast
>> as possible during I/O idle time, because:
>> - For desktops, the hard drive may go to sleep and save energy.
>> - For other online workloads, less dirty data in the cache means more
>> writes can be served during busy hours.
>> - The previous code would wake up the hard drive every second to write
>> back at 4KB/s, so the hard drives were not able to rest even in I/O
>> idle hours.
>>
> No arguments here, and in my use case the _minimum_ rate set to 4MB/s aka
> 8192 achieves all that.
>
>> Therefore the maximum writeback rate was added, to raise the writeback
>> rate limit (1TB/s for now) so that in I/O idle hours the dirty data can
>> be flushed as fast as possible by the writeback thread. From internal
>> customers and external users, the feedback on the maximum writeback
>> rate has been quite positive. This is the first time I realize not
>> everyone wants it.
>>
>
> The full-speed (1TB/s) rate will result in initially high speeds (up to
> 280MB/s) in most tests, but degrade (and cause load spikes -> alarms) later
> on, often resulting in it taking LONGER than if it had stuck with the
> 4MB/s minimum rate set.
> So yes, in my case something like a 32MB/s maximum rate would probably be
> perfect.
>

Hi Christian,

Could you please try the attached patch in your environment? Let's see
whether it makes things better on your side.

Thanks.

--

Coly Li
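
(For reference, the minimum-rate tuning discussed above comes down to a
single sysfs write plus the debug file to watch the result. This is only a
sketch; it assumes the cached device is registered as bcache0, so adjust
the device name to your own setup.)

  # writeback_rate_minimum is in 512-byte sectors, so 8192 = 4MB/s
  echo 8192 > /sys/block/bcache0/bcache/writeback_rate_minimum

  # watch what the rate controller is actually doing (the output quoted above)
  cat /sys/block/bcache0/bcache/writeback_rate_debug
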
From 7d77b1cf41d28153e018185168ae54804f26e6f5 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli@xxxxxxx>
Date: Tue, 12 Nov 2019 18:24:36 +0800
Subject: [PATCH] bcache: add idle_max_writeback_rate sysfs interface

For writeback mode, if there is no regular I/O request for a while, the
writeback rate will be set to the maximum value (1TB/s for now). This is
good for most storage workloads, but there are still people who do not
want the maximum writeback rate during I/O idle time.

This patch adds a sysfs interface file idle_max_writeback_rate to permit
people to disable the maximum writeback rate. Then the minimum writeback
rate can be advised via writeback_rate_minimum in the bcache device's
sysfs interface.

Reported-by: Christian Balzer <chibi@xxxxxxx>
Signed-off-by: Coly Li <colyli@xxxxxxx>
---
 drivers/md/bcache/bcache.h    | 1 +
 drivers/md/bcache/super.c     | 1 +
 drivers/md/bcache/sysfs.c     | 7 +++++++
 drivers/md/bcache/writeback.c | 4 ++++
 4 files changed, 13 insertions(+)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 50241e045c70..9198c1b480d9 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -724,6 +724,7 @@ struct cache_set {
 	unsigned int		gc_always_rewrite:1;
 	unsigned int		shrinker_disabled:1;
 	unsigned int		copy_gc_enabled:1;
+	unsigned int		idle_max_writeback_rate_enabled:1;

 #define BUCKET_HASH_BITS	12
 	struct hlist_head	bucket_hash[1 << BUCKET_HASH_BITS];
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index d1352fcc6ff2..77e9869345e7 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1834,6 +1834,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 	c->congested_read_threshold_us	= 2000;
 	c->congested_write_threshold_us	= 20000;
 	c->error_limit	= DEFAULT_IO_ERROR_LIMIT;
+	c->idle_max_writeback_rate_enabled = 1;
 	WARN_ON(test_and_clear_bit(CACHE_SET_IO_DISABLE, &c->flags));

 	return c;
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 627dcea0f5b6..733e2ddf3c78 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -134,6 +134,7 @@ rw_attribute(expensive_debug_checks);
 rw_attribute(cache_replacement_policy);
 rw_attribute(btree_shrinker_disabled);
 rw_attribute(copy_gc_enabled);
+rw_attribute(idle_max_writeback_rate);
 rw_attribute(gc_after_writeback);
 rw_attribute(size);

@@ -747,6 +748,8 @@ SHOW(__bch_cache_set)
 	sysfs_printf(gc_always_rewrite,	"%i", c->gc_always_rewrite);
 	sysfs_printf(btree_shrinker_disabled, "%i", c->shrinker_disabled);
 	sysfs_printf(copy_gc_enabled,	"%i", c->copy_gc_enabled);
+	sysfs_printf(idle_max_writeback_rate,	"%i",
+		     c->idle_max_writeback_rate_enabled);
 	sysfs_printf(gc_after_writeback, "%i", c->gc_after_writeback);
 	sysfs_printf(io_disable,	"%i",
 		     test_bit(CACHE_SET_IO_DISABLE, &c->flags));
@@ -864,6 +867,9 @@ STORE(__bch_cache_set)
 	sysfs_strtoul_bool(gc_always_rewrite,	c->gc_always_rewrite);
 	sysfs_strtoul_bool(btree_shrinker_disabled, c->shrinker_disabled);
 	sysfs_strtoul_bool(copy_gc_enabled,	c->copy_gc_enabled);
+	sysfs_strtoul_bool(idle_max_writeback_rate,
+			   c->idle_max_writeback_rate_enabled);
+
 	/*
 	 * write gc_after_writeback here may overwrite an already set
 	 * BCH_DO_AUTO_GC, it doesn't matter because this flag will be
@@ -954,6 +960,7 @@ static struct attribute *bch_cache_set_internal_files[] = {
 	&sysfs_gc_always_rewrite,
 	&sysfs_btree_shrinker_disabled,
 	&sysfs_copy_gc_enabled,
+	&sysfs_idle_max_writeback_rate,
 	&sysfs_gc_after_writeback,
 	&sysfs_io_disable,
 	&sysfs_cutoff_writeback,
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index d60268fe49e1..4a40f9eadeaf 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -122,6 +122,10 @@ static void __update_writeback_rate(struct cached_dev *dc)
 static bool set_at_max_writeback_rate(struct cache_set *c,
 				       struct cached_dev *dc)
 {
+	/* Don't set max writeback rate if it is disabled */
+	if (!c->idle_max_writeback_rate_enabled)
+		return false;
+
 	/* Don't set max writeback rate if gc is running */
 	if (!c->gc_mark_valid)
 		return false;
--
2.16.4
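
(With the patch applied, the new knob would be used roughly as follows.
The cache-set UUID below is a placeholder, and the file should appear under
the cache set's internal/ directory because it is added to
bch_cache_set_internal_files[]; treat the exact paths as an assumption
until verified on a patched kernel.)

  # stop bcache from jumping to the maximum writeback rate when idle
  echo 0 > /sys/fs/bcache/<cache-set-uuid>/internal/idle_max_writeback_rate

  # and let the configured floor (8192 sectors = 4MB/s) govern idle-time writeback
  echo 8192 > /sys/block/bcache0/bcache/writeback_rate_minimum
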