Re: Several bugs/flaws in the current(?) bcache implementation

On 2019/11/12 2:39 PM, Christian Balzer wrote:
> On Tue, 12 Nov 2019 13:00:14 +0800 Coly Li wrote:
> 
>> On 2019/11/12 9:17 AM, Christian Balzer wrote:
>>> On Mon, 11 Nov 2019 23:56:04 +0800 Coly Li wrote:
>>>   
>>>> On 2019/11/11 10:10 AM, Christian Balzer wrote:
>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> While researching the issues below I found out about the PDC changes
>>>>> since 4.9, which also provided a good explanation for the load spikes
>>>>> we see with 4.9: the default writeback rate is way too slow to empty
>>>>> the dirty pages, so there is never much of a buffer for sudden write
>>>>> spikes, causing the PDC to overshoot when trying to flush things out
>>>>> to the backing device.
>>>>>
>>>>> With Debian Buster things obviously changed and the current kernel
>>>>> ---
>>>>> Linux version 4.19.0-6-amd64 (debian-kernel@xxxxxxxxxxxxxxxx) (gcc version
>>>>> 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.67-2+deb10u1 (2019-09-20)
>>>>> ---
>>>>> we get writeback_rate_minimum (undocumented, value in 512-byte blocks).
>>>>> That looked promising and indeed it helps, but there are major gotchas.
>>>>> For the tests below I did set this to 8192 aka 4MB/s, which is something
>>>>> the backing Areca RAID (4GB cache, 16 drives) handles at 0% utilization.
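
(For reference: writeback_rate_minimum is in 512-byte sectors, so
8192 * 512 bytes = 4MB/s. Assuming the backing device shows up as bcache0,
setting it looks like:
---
echo 8192 > /sys/block/bcache0/bcache/writeback_rate_minimum
---
)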
>>>>>
>>>>> 1. Quiescent insanity
>>>>>
>>>>> When running fio (see full command line and results below) all looks/seems
>>>>> fine, aside from issue #2 of course.
>>>>> However if one stops fio and the system is fully quiescent (no writes)
>>>>> then the new PDC goes berserk, most likely a division by zero type bug.
>>>>>
>>>>> writeback_rate_debug goes from (just after stopping fio):
>>>>> ---
>>>>> rate:           4.0M/sec
>>>>> dirty:          954.7M
>>>>> target:         36.7G
>>>>> proportional:   -920.7M
>>>>> integral:       -17.1M
>>>>> change:         0.0k/sec
>>>>> next io:        -7969ms
>>>>> ---
>>>>>
>>>>> to:
>>>>> ---
>>>>> rate:           0.9T/sec
>>>>> dirty:          496.4M
>>>>> target:         36.7G
>>>>> proportional:   0.0k
>>>>> integral:       0.0k
>>>>> change:         0.0k/sec
>>>>> next io:        -2000ms
>>>>> ---
>>>>> completely overwhelming the backing device and causing (again) massive
>>>>> load spikes. Very unintuitive and unexpected.
>>>>>
>>>>> Any IO (like a fio with 1 IOPS target) will prevent this and the preset
>>>>> writeback_rate_minimum will be honored until the cache is clean.
>>>>>
>>>>>     
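(For reference, such a 1 IOPS trickle can be kept running with something
along these lines -- only a sketch, the mount point of the bcache-backed
filesystem is assumed:
---
fio --name=keepalive --directory=/mnt/bcache --size=4m --rw=randwrite \
    --bs=4k --rate_iops=1 --time_based --runtime=86400
---
)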
>>>>
>>>> This is a feature indeed. When there is no I/O for a reasonably long
>>>> time, the writeback rate limit will be set to 1TB/s, to permit the
>>>> writeback I/O to perform as fast as possible.
>>>>
>>>> And as you observed, once new I/O comes in, the maximized writeback
>>>> rate will be canceled and control goes back to the PDC code.
>>>>
>>>> Does this behavior cause any inconvenience in your environment?
>>>>  
>>> Yes, unwanted load spikes, as I wrote.
>>> Utilization of the backing RAID AND caching SSDs yo-yo'ing up to 100%.
>>>   
>>
>> Copied.
>>
>> The maximum writeback rate feature is indeed required by users, for
>> workloads like desktop I/O acceleration, databases and distributed
>> storage. People want the writeback thread to finish flushing the dirty
>> data as fast as possible during I/O idle time, because:
>> - For desktops, the hard drive may go to sleep and save energy.
>> - For other online workloads, less dirty data in the cache means more
>> writes can be served in busy hours.
>> - The previous code would wake up the hard drive every second to write
>> back at 4KB/s, so the hard drives were never able to rest, even in I/O
>> idle hours.
>>
> No arguments here, and in my use case the _minimum_ rate set to 4MB/s aka
> 8192 achieves all that.
> 
>> Therefore the maximum writeback rate was added, raising the writeback
>> rate limit to its maximum (1TB/s for now) so that in I/O idle hours the
>> dirty data can be flushed as fast as possible by the writeback thread.
>> From internal customers and external users, the feedback on the maximum
>> writeback rate has been quite positive. This is the first time I have
>> realized that not everyone wants it.
>>
> 
> The full speed (1TB/s) rate will result in initially high speeds (up to
> 280MB/s) in most tests, but degrade (and cause load spikes -> alarms) later
> on, often resulting in it taking LONGER than if it had stuck with the
> 4MB/s minimum rate set.
> So yes, in my case something like a 32MB/s maximum rate would probably be
> perfect.
>

Hi Christian,

Could you please try the attached patch in your environment? Let's see
whether it makes things better on your side.
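
With the patch applied, disabling the idle-time maximum rate should look
something like this (the cache set UUID and backing device name below are
just examples):
---
# stop bcache from jumping to the 1TB/s rate limit when idle
echo 0 > /sys/fs/bcache/<cset-uuid>/internal/idle_max_writeback_rate
# flush at the configured minimum instead (8192 * 512B = 4MB/s)
echo 8192 > /sys/block/bcache0/bcache/writeback_rate_minimum
---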

Thanks.
-- 
Coly Li
From 7d77b1cf41d28153e018185168ae54804f26e6f5 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli@xxxxxxx>
Date: Tue, 12 Nov 2019 18:24:36 +0800
Subject: [PATCH] bcache: add idle_max_writeback_rate sysfs interface

For writeback mode, if there is no regular I/O request for a while,
the writeback rate will be set to the maximum value (1TB/s for now).
This is good for most storage workloads, but there are still people
who do not want the maximum writeback rate during I/O idle time.

This patch adds a sysfs interface file idle_max_writeback_rate to
permit people to disable the maximum writeback rate. Then the minimum
writeback rate can be set via writeback_rate_minimum in the bcache
device's sysfs interface.

Reported-by: Christian Balzer <chibi@xxxxxxx>
Signed-off-by: Coly Li <colyli@xxxxxxx>
---
 drivers/md/bcache/bcache.h    | 1 +
 drivers/md/bcache/super.c     | 1 +
 drivers/md/bcache/sysfs.c     | 7 +++++++
 drivers/md/bcache/writeback.c | 4 ++++
 4 files changed, 13 insertions(+)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 50241e045c70..9198c1b480d9 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -724,6 +724,7 @@ struct cache_set {
 	unsigned int		gc_always_rewrite:1;
 	unsigned int		shrinker_disabled:1;
 	unsigned int		copy_gc_enabled:1;
+	unsigned int		idle_max_writeback_rate_enabled:1;
 
 #define BUCKET_HASH_BITS	12
 	struct hlist_head	bucket_hash[1 << BUCKET_HASH_BITS];
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index d1352fcc6ff2..77e9869345e7 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1834,6 +1834,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 	c->congested_read_threshold_us	= 2000;
 	c->congested_write_threshold_us	= 20000;
 	c->error_limit	= DEFAULT_IO_ERROR_LIMIT;
+	c->idle_max_writeback_rate_enabled = 1;
 	WARN_ON(test_and_clear_bit(CACHE_SET_IO_DISABLE, &c->flags));
 
 	return c;
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 627dcea0f5b6..733e2ddf3c78 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -134,6 +134,7 @@ rw_attribute(expensive_debug_checks);
 rw_attribute(cache_replacement_policy);
 rw_attribute(btree_shrinker_disabled);
 rw_attribute(copy_gc_enabled);
+rw_attribute(idle_max_writeback_rate);
 rw_attribute(gc_after_writeback);
 rw_attribute(size);
 
@@ -747,6 +748,8 @@ SHOW(__bch_cache_set)
 	sysfs_printf(gc_always_rewrite,		"%i", c->gc_always_rewrite);
 	sysfs_printf(btree_shrinker_disabled,	"%i", c->shrinker_disabled);
 	sysfs_printf(copy_gc_enabled,		"%i", c->copy_gc_enabled);
+	sysfs_printf(idle_max_writeback_rate,	"%i",
+		     c->idle_max_writeback_rate_enabled);
 	sysfs_printf(gc_after_writeback,	"%i", c->gc_after_writeback);
 	sysfs_printf(io_disable,		"%i",
 		     test_bit(CACHE_SET_IO_DISABLE, &c->flags));
@@ -864,6 +867,9 @@ STORE(__bch_cache_set)
 	sysfs_strtoul_bool(gc_always_rewrite,	c->gc_always_rewrite);
 	sysfs_strtoul_bool(btree_shrinker_disabled, c->shrinker_disabled);
 	sysfs_strtoul_bool(copy_gc_enabled,	c->copy_gc_enabled);
+	sysfs_strtoul_bool(idle_max_writeback_rate,
+			   c->idle_max_writeback_rate_enabled);
+
 	/*
 	 * write gc_after_writeback here may overwrite an already set
 	 * BCH_DO_AUTO_GC, it doesn't matter because this flag will be
@@ -954,6 +960,7 @@ static struct attribute *bch_cache_set_internal_files[] = {
 	&sysfs_gc_always_rewrite,
 	&sysfs_btree_shrinker_disabled,
 	&sysfs_copy_gc_enabled,
+	&sysfs_idle_max_writeback_rate,
 	&sysfs_gc_after_writeback,
 	&sysfs_io_disable,
 	&sysfs_cutoff_writeback,
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index d60268fe49e1..4a40f9eadeaf 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -122,6 +122,10 @@ static void __update_writeback_rate(struct cached_dev *dc)
 static bool set_at_max_writeback_rate(struct cache_set *c,
 				       struct cached_dev *dc)
 {
+	/* Don't set max writeback rate if it is disabled */
+	if (!c->idle_max_writeback_rate_enabled)
+		return false;
+
 	/* Don't set max writeback rate if gc is running */
 	if (!c->gc_mark_valid)
 		return false;
-- 
2.16.4

