> Going through various forum messages and bug reports (for Debian:
> https://lists.debian.org/debian-kernel/2015/03/msg00060.html for Arch:
> https://bugs.archlinux.org/task/38843 for some other distro:
> https://groups.google.com/forum/#!msg/esos-users/NXp8tG7sVE8/QXZyPdZ2saIJ
> ) it looks like the bcache_writeback kernel thread, being in
> uninterruptible sleep, keeps the load average at 1.0 (or maybe more)
> always. Could you please confirm this?

Try the attached patches that I've been collecting over the past year or
two. I do not believe they have been merged into mainline (BUT SOMEONE
NEEDS TO). I am not sure that they address the load bug, but if the load
is being increased by a large amount of dmesg output caused by RCU
traces, then the patches will help.

-Eric

>
> - this behaviour should be described in the bcache documentation
> because it feels to me (and many others) like a true gotcha. It's
> apparently completely undocumented. A "CAVEATS" section at the
> bottom of bcache.txt in the kernel Documentation explaining this
> would be nice, what do you think?
>
> - Is there any way around this? Some people seem to grow uneasy (maybe
> irrationally) having a constant load on an otherwise unused system
> (I know that a sleeping thread actually does nothing, but many
> system administrators can't wrap their head around this idea).
>
> I've tried some advice I've found on the web, like switching to
> writethrough and "echo 0 > /sys/block/bcache0/bcache/writeback_running"
> but to absolutely no effect.
>
> Any advice and ideas are welcome :)
>
> --
> ------------------------------------------------------------------------
> Emmanuel Florac     |   Direction technique
>                     |   Intellique
>                     |   <eflorac@xxxxxxxxxxxxxx>
>                     |   +33 1 78 94 84 02
> ------------------------------------------------------------------------
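Linux counts tasks in uninterruptible sleep (state `D`) toward the load average. Listing those tasks therefore shows what keeps an otherwise idle machine at 1.0.

Below is a minimal userspace sketch of that check. It makes these assumptions:
- The standard `/proc/<pid>/stat` layout applies: `pid (comm) state ...`.
- `parse_stat()` and `list_d_state_tasks()` are hypothetical helpers, not part of any bcache tooling.
- Only top-level `/proc/<pid>` entries are scanned. Per-thread `task/` entries are not.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Extracts the command name and state letter from one /proc/<pid>/stat
 * line such as "1234 (name) D ...". The name is taken between the first
 * '(' and the LAST ')' because it may itself contain spaces or ')'.
 * comm_len must be > 0. Returns 0 on success, -1 if malformed. */
static int parse_stat(const char *line, char *comm, size_t comm_len, char *state)
{
    const char *open = strchr(line, '(');
    const char *close = strrchr(line, ')');

    if (!open || !close || close < open || close[1] != ' ' || close[2] == '\0')
        return -1;

    size_t n = (size_t)(close - open - 1);
    if (n >= comm_len)
        n = comm_len - 1; /* truncate rather than overflow */
    memcpy(comm, open + 1, n);
    comm[n] = '\0';
    *state = close[2];
    return 0;
}

/* Prints "pid comm" for every task currently in state 'D' and returns
 * how many were found, or -1 if /proc cannot be opened. */
int list_d_state_tasks(void)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;
    int found = 0;

    if (!proc)
        return -1;
    while ((de = readdir(proc)) != NULL) {
        char path[300], line[512], comm[64], state;
        FILE *f;

        if (!isdigit((unsigned char)de->d_name[0]))
            continue; /* not a PID directory */
        snprintf(path, sizeof(path), "/proc/%s/stat", de->d_name);
        f = fopen(path, "r");
        if (!f)
            continue; /* the task exited while we were scanning */
        if (fgets(line, sizeof(line), f) &&
            parse_stat(line, comm, sizeof(comm), &state) == 0 &&
            state == 'D') {
            printf("%s %s\n", de->d_name, comm);
            found++;
        }
        fclose(f);
    }
    closedir(proc);
    return found;
}
```

On an affected system, one would expect the writeback thread to show up in this list. Kernel task names are truncated to 15 characters, so it may be printed as `bcache_writebac`.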
From: Zheng Liu <wenqing.lz@xxxxxxxxxx>

The bcache_init() function forgets to unregister the reboot notifier if
bcache fails to register a block device. This commit fixes that.

Signed-off-by: Zheng Liu <wenqing.lz@xxxxxxxxxx>
Tested-by: Joshua Schmid <jschmid@xxxxxxxx>
---
 drivers/md/bcache/super.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 4dd2bb7..fdbb211 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2100,8 +2100,10 @@ static int __init bcache_init(void)
 	closure_debug_init();
 
 	bcache_major = register_blkdev(0, "bcache");
-	if (bcache_major < 0)
+	if (bcache_major < 0) {
+		unregister_reboot_notifier(&reboot);
 		return bcache_major;
+	}
 
 	if (!(bcache_wq = create_workqueue("bcache")) ||
 	    !(bcache_kobj = kobject_create_and_add("bcache", fs_kobj)) ||
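The fix applies the usual init-path rule: when a later step fails, undo every earlier step that succeeded before returning the error.

Below is a userspace sketch of that rule, with these assumptions:
- Boolean flags stand in for the kernel registrations.
- Every name is hypothetical.
- `fake_register_blkdev()` imitates the documented contract of `register_blkdev(0, ...)`: a positive major on success, a negative errno on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical global state standing in for kernel-side registrations. */
static bool notifier_registered;
static bool blkdev_registered;

static void fake_register_notifier(void)   { notifier_registered = true; }
static void fake_unregister_notifier(void) { notifier_registered = false; }

/* Positive "major" on success, negative errno on failure.
 * `fail` lets a test force the error path. */
static int fake_register_blkdev(bool fail)
{
    if (fail)
        return -EBUSY;
    blkdev_registered = true;
    return 250; /* arbitrary dynamic major number */
}

/* Same shape as the patched bcache_init(): a failing step must undo
 * the steps that already succeeded before returning its error. */
static int fake_init(bool blkdev_fails)
{
    int major;

    fake_register_notifier();

    major = fake_register_blkdev(blkdev_fails);
    if (major < 0) {
        fake_unregister_notifier(); /* the fix: don't leave it registered */
        return major;
    }

    return 0;
}
```

With several steps, kernel code usually chains `goto` labels that unwind in reverse order. bcache_init() already does this for its later failures via `goto err`. A single early step, as here, does not need the chain.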
From: Zheng Liu <gnehzuil.liu@xxxxxxxxx>
To: linux-bcache@xxxxxxxxxxxxxxx
Cc: Zheng Liu <wenqing.lz@xxxxxxxxxx>, Joshua Schmid <jschmid@xxxxxxxx>, Zhu Yanhai <zhu.yanhai@xxxxxxxxx>, Kent Overstreet <kmo@xxxxxxxxxxxxx>
Subject: [PATCH v2] bcache: fix a livelock in btree lock
Date: Wed, 25 Feb 2015 20:32:09 +0800 (02/25/2015 04:32:09 AM)

From: Zheng Liu <wenqing.lz@xxxxxxxxxx>

This commit tries to fix a livelock in bcache. The livelock can happen
when a huge number of cache misses occur simultaneously.

On a cache miss, bcache executes the following path:

->cached_dev_make_request()
  ->cached_dev_read()
    ->cache_lookup()
      ->bch_btree_map_keys()
        ->btree_root() <-------------------------
          ->bch_btree_map_keys_recurse()        |
            ->cache_lookup_fn()                 |
              ->cached_dev_cache_miss()         |
                ->bch_btree_insert_check_key() -|
                  [If btree->seq is not equal to seq + 1, we return
                   -EINTR and traverse the btree again.]

In bch_btree_insert_check_key() we first check the upgrade flag
(op->lock == -1). When it is true, we release the read btree->lock and
try to take the write btree->lock. Taking and releasing the write lock
each increment btree->seq monotonically, which prevents other threads
from modifying the node during the cache miss (see btree.h:74).

But if many concurrent requests cause cache misses, we can hit a
livelock: btree->seq is constantly changed by other threads, so no one
can make progress.

With this commit, a thread that encounters such a race while traversing
the btree takes the write btree->lock on the retry. This sacrifices some
scalability, but it ensures that only one thread can modify the btree at
a time.

Signed-off-by: Zheng Liu <wenqing.lz@xxxxxxxxxx>
Tested-by: Joshua Schmid <jschmid@xxxxxxxx>
Cc: Joshua Schmid <jschmid@xxxxxxxx>
Cc: Zhu Yanhai <zhu.yanhai@xxxxxxxxx>
Cc: Kent Overstreet <kmo@xxxxxxxxxxxxx>
---
changelog:
v2: fix a bug that stops all concurrent writes unconditionally.
 drivers/md/bcache/btree.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 218f21a..43829d9 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -2163,8 +2163,10 @@ int bch_btree_insert_check_key(struct btree *b, struct btree_op *op,
 		rw_lock(true, b, b->level);
 
 		if (b->key.ptr[0] != btree_ptr ||
-		    b->seq != seq + 1)
+		    b->seq != seq + 1) {
+			op->lock = b->level;
 			goto out;
+		}
 	}
 
 	SET_KEY_PTRS(check_key, 1);
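The livelock and its fix can be reproduced in a deterministic, single-threaded model. The model treats `seq` as the commit message describes: it is bumped on every write-lock and write-unlock, and an upgrade is valid only if `seq` advanced by exactly one.

The model is simplified in these ways:
- `racers` stands for other writers that take and release the lock while we hold none.
- The `b->key.ptr[0] != btree_ptr` check is omitted.
- Lock release is simplified.
- All names are hypothetical.

Without the `op->lock = b->level` line, `lookup()` would loop forever whenever `racers > 0`. That endless loop is the livelock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a btree node: seq is bumped when the write
 * lock is taken and again when it is released. */
struct fake_node {
    int level;
    unsigned long seq;
};

/* lock == -1: traverse with read locks and upgrade on a miss;
 * otherwise write locks are taken at levels <= lock during traversal. */
struct fake_op {
    int lock;
};

static void write_lock(struct fake_node *b)   { b->seq++; }
static void write_unlock(struct fake_node *b) { b->seq++; }

/* Models bch_btree_insert_check_key(). */
static int insert_check_key(struct fake_node *b, struct fake_op *op, int racers)
{
    bool upgrade = op->lock == -1;

    if (upgrade) {
        unsigned long seq = b->seq;

        /* Read lock dropped: other writers get in before we do. */
        for (int i = 0; i < racers; i++) {
            write_lock(b);
            write_unlock(b);
        }
        write_lock(b);

        if (b->seq != seq + 1) {
            op->lock = b->level; /* the fix: retry holding the write lock */
            write_unlock(b);
            return -EINTR;
        }
    } else {
        write_lock(b); /* taken during traversal, so no race window */
    }

    /* ...insert the check key here... */
    write_unlock(b);
    return 0;
}

/* Re-traverses on -EINTR, like the btree_root() retry in the call path.
 * Returns the number of attempts needed. */
static int lookup(struct fake_node *b, struct fake_op *op, int racers)
{
    int attempts = 0;
    int ret;

    do {
        attempts++;
        ret = insert_check_key(b, op, racers);
    } while (ret == -EINTR);

    return attempts;
}
```

The model also shows the trade-off the commit message mentions. After one lost race, the retry holds the write lock for the whole traversal window. Progress is guaranteed, but the lock stays exclusive for longer.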
From: Joshua Schmid <jschmid@xxxxxxxx>
Subject: [PATCH] fix a leak in bch_cached_dev_run()
Newsgroups: gmane.linux.kernel.bcache.devel
Date: 2015-02-03 11:24:06 GMT

From: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Tested-by: Joshua Schmid <jschmid@xxxxxxxx>
---
 drivers/md/bcache/super.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 8c2d657..53f1512 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -880,8 +880,11 @@ void bch_cached_dev_run(struct cached_dev *dc)
 	buf[SB_LABEL_SIZE] = '\0';
 	env[2] = kasprintf(GFP_KERNEL, "CACHED_LABEL=%s", buf);
 
-	if (atomic_xchg(&dc->running, 1))
+	if (atomic_xchg(&dc->running, 1)) {
+		kfree(env[1]);
+		kfree(env[2]);
 		return;
+	}
 
 	if (!d->c &&
 	    BDEV_STATE(&dc->sb) != BDEV_STATE_NONE) {
--
2.1.2
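The patch has no description, so what follows is a hedged reading of the hunk. `env[2]` (and, judging by the fix, `env[1]`) is heap-allocated before the `atomic_xchg()` check. The old early `return` then skipped freeing both strings.

Below is a userspace sketch of the corrected pattern, with these assumptions:
- C11 `atomic_exchange()` stands in for `atomic_xchg()`.
- `dup_concat()` stands in for `kasprintf()`.
- An allocation counter makes the leak observable.
- All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter of outstanding allocations, so a leak is visible. */
static int live_allocs;

/* Hypothetical stand-in for kasprintf(): prefix+value in a new buffer. */
static char *dup_concat(const char *prefix, const char *value)
{
    size_t n = strlen(prefix) + strlen(value) + 1;
    char *s = malloc(n);

    if (s) {
        snprintf(s, n, "%s%s", prefix, value);
        live_allocs++;
    }
    return s;
}

static void counted_free(char *s)
{
    if (s) {
        free(s);
        live_allocs--;
    }
}

struct fake_dev {
    atomic_int running;
    const char *uuid;
    const char *label;
};

/* Same shape as bch_cached_dev_run(): build the uevent environment,
 * then bail out if another caller already started the device.
 * Returns 1 if this call started it, 0 otherwise. */
static int fake_dev_run(struct fake_dev *dc)
{
    char *env[] = {
        "DRIVER=bcache",
        dup_concat("CACHED_UUID=", dc->uuid),
        dup_concat("CACHED_LABEL=", dc->label),
        NULL,
    };

    if (atomic_exchange(&dc->running, 1)) {
        /* The fix: free the strings on the early-return path too. */
        counted_free(env[1]);
        counted_free(env[2]);
        return 0;
    }

    /* ...a real driver would send a uevent with env here... */
    counted_free(env[1]);
    counted_free(env[2]);
    return 1;
}
```

Moving the `atomic_exchange()` check before the allocations would also avoid the leak. The patch instead keeps the existing order and frees the strings on the early exit.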
From f0e6320a7874af434575f37a11ec6e4992cef790 Mon Sep 17 00:00:00 2001
From: Kent Overstreet <kmo@xxxxxxxxxxxxx>
Date: Sat, 1 Nov 2014 13:44:47 -0700
Subject: [PATCH 1/5] bcache: Add a cond_resched() call to gc
Git-commit: f0e6320a7874af434575f37a11ec6e4992cef790
Patch-mainline: Submitted
References: bnc#910440

Change-id: Id4f18c533b80ddb40df94ed0bb5e2a236a4bc325
Signed-off-by: Takashi Iwai <tiwai@xxxxxxx>
---
 drivers/md/bcache/btree.c | 1 +
 1 file changed, 1 insertion(+)

--- a/drivers/md/bcache/btree.c	2014-11-03 16:51:01.720000000 -0800
+++ b/drivers/md/bcache/btree.c	2014-11-03 16:51:26.456000000 -0800
@@ -1741,6 +1741,7 @@
 	do {
 		ret = btree_root(gc_root, c, &op, &writes, &stats);
 		closure_sync(&writes);
+		cond_resched();
 
 		if (ret && ret != -EAGAIN)
 			pr_warn("gc failed!");
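cond_resched() lets the scheduler run other tasks if the current one has been marked for rescheduling. Calling it inside the GC retry loop keeps a long run of passes from monopolising a CPU on a kernel without full preemption.

Below is a userspace sketch of the same loop shape, with these assumptions:
- POSIX `sched_yield()` serves as a rough analogue. Unlike cond_resched(), it always yields.
- `fake_gc_pass()` and `run_gc()` are hypothetical.
- The `while (ret)` loop condition is assumed, because the hunk ends before the loop does.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>

/* Hypothetical stand-in for one GC pass: returns -EAGAIN until it has
 * been called `needed` times, then succeeds. */
static int fake_gc_pass(int *calls, int needed)
{
    return ++*calls < needed ? -EAGAIN : 0;
}

/* Mirrors the patched loop: run a pass, yield the CPU, repeat while
 * the pass asks for it. Returns the number of passes performed. */
static int run_gc(int needed)
{
    int calls = 0;
    int ret;

    do {
        ret = fake_gc_pass(&calls, needed);
        sched_yield(); /* userspace analogue of cond_resched() */

        if (ret && ret != -EAGAIN)
            fprintf(stderr, "gc failed!\n");
    } while (ret);

    return calls;
}
```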
From: Joshua Schmid <jschmid@xxxxxxxx>
Subject: [PATCH] bcache: [BUG] clear BCACHE_DEV_UNLINK_DONE flag when attaching a backing device
Newsgroups: gmane.linux.kernel.bcache.devel
Date: 2015-02-03 11:18:01 GMT

From: Zheng Liu <wenqing.lz@xxxxxxxxxx>

This bug can be reproduced by the following script:

#!/bin/bash

bcache_sysfs="/sys/fs/bcache"

function clear_cache()
{
	if [ ! -e $bcache_sysfs ]; then
		echo "no bcache sysfs"
		exit
	fi

	cset_uuid=$(ls -l $bcache_sysfs|head -n 2|tail -n 1|awk '{print $9}')
	sudo sh -c "echo $cset_uuid > /sys/block/sdb/sdb1/bcache/detach"
	sleep 5
	sudo sh -c "echo $cset_uuid > /sys/block/sdb/sdb1/bcache/attach"
}

for ((i=0;i<10;i++)); do
	clear_cache
done

The warning messages look like this:

[ 275.948611] ------------[ cut here ]------------
[ 275.963840] WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xb8/0xd0() (Tainted: P W --------------- )
[ 275.979253] Hardware name: Tecal RH2285
[ 275.994106] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:09.0/0000:08:00.0/host4/target4:2:1/4:2:1:0/block/sdb/sdb1/bcache/cache'
[ 276.024105] Modules linked in: bcache tcp_diag inet_diag ipmi_devintf ipmi_si ipmi_msghandler bonding 8021q garp stp llc ipv6 ext3 jbd loop sg iomemory_vsl(P) bnx2 microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 jbd2 mbcache megaraid_sas pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
[ 276.072643] Pid: 2765, comm: sh Tainted: P W --------------- 2.6.32 #1
[ 276.089315] Call Trace:
[ 276.105801] [<ffffffff81070fe7>] ? warn_slowpath_common+0x87/0xc0
[ 276.122650] [<ffffffff810710d6>] ? warn_slowpath_fmt+0x46/0x50
[ 276.139361] [<ffffffff81205c08>] ? sysfs_add_one+0xb8/0xd0
[ 276.156012] [<ffffffff8120609b>] ? sysfs_do_create_link+0x12b/0x170
[ 276.172682] [<ffffffff81206113>] ? sysfs_create_link+0x13/0x20
[ 276.189282] [<ffffffffa03bda21>] ? bcache_device_link+0xc1/0x110 [bcache]
[ 276.205993] [<ffffffffa03bfa08>] ? bch_cached_dev_attach+0x478/0x4f0 [bcache]
[ 276.222794] [<ffffffffa03c4a17>] ? bch_cached_dev_store+0x627/0x780 [bcache]
[ 276.239680] [<ffffffff8116783a>] ? alloc_pages_current+0xaa/0x110
[ 276.256594] [<ffffffff81203b15>] ? sysfs_write_file+0xe5/0x170
[ 276.273364] [<ffffffff811887b8>] ? vfs_write+0xb8/0x1a0
[ 276.290133] [<ffffffff811890b1>] ? sys_write+0x51/0x90
[ 276.306368] [<ffffffff8100c072>] ? system_call_fastpath+0x16/0x1b
[ 276.322301] ---[ end trace 9f5d4fcdd0c3edfb ]---
[ 276.338241] ------------[ cut here ]------------
[ 276.354109] WARNING: at /home/wenqing.lz/bcache/bcache/super.c:720 bcache_device_link+0xdf/0x110 [bcache]() (Tainted: P W --------------- )
[ 276.386017] Hardware name: Tecal RH2285
[ 276.401430] Couldn't create device <-> cache set symlinks
[ 276.401759] Modules linked in: bcache tcp_diag inet_diag ipmi_devintf ipmi_si ipmi_msghandler bonding 8021q garp stp llc ipv6 ext3 jbd loop sg iomemory_vsl(P) bnx2 microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 jbd2 mbcache megaraid_sas pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
[ 276.465477] Pid: 2765, comm: sh Tainted: P W --------------- 2.6.32 #1
[ 276.482169] Call Trace:
[ 276.498610] [<ffffffff81070fe7>] ? warn_slowpath_common+0x87/0xc0
[ 276.515405] [<ffffffff810710d6>] ? warn_slowpath_fmt+0x46/0x50
[ 276.532059] [<ffffffffa03bda3f>] ? bcache_device_link+0xdf/0x110 [bcache]
[ 276.548808] [<ffffffffa03bfa08>] ? bch_cached_dev_attach+0x478/0x4f0 [bcache]
[ 276.565569] [<ffffffffa03c4a17>] ? bch_cached_dev_store+0x627/0x780 [bcache]
[ 276.582418] [<ffffffff8116783a>] ? alloc_pages_current+0xaa/0x110
[ 276.599341] [<ffffffff81203b15>] ? sysfs_write_file+0xe5/0x170
[ 276.616142] [<ffffffff811887b8>] ? vfs_write+0xb8/0x1a0
[ 276.632607] [<ffffffff811890b1>] ? sys_write+0x51/0x90
[ 276.648671] [<ffffffff8100c072>] ? system_call_fastpath+0x16/0x1b
[ 276.664756] ---[ end trace 9f5d4fcdd0c3edfc ]---

We forgot to clear the BCACHE_DEV_UNLINK_DONE flag in
bcache_device_attach() when a backing device is attached for the first
time. After the backing device is detached, the flag stays set, so later
calls to bcache_device_unlink() skip sysfs_remove_link(). When we then
attach the backing device again, sysfs_create_link() in
bcache_device_link() returns -EEXIST.

The fix is trivial: clear the flag in bcache_device_link().

Signed-off-by: Zheng Liu <wenqing.lz@xxxxxxxxxx>
Tested-by: Joshua Schmid <jschmid@xxxxxxxx>
---
 drivers/md/bcache/super.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 4dd2bb7..f624ae8 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -708,6 +708,8 @@ static void bcache_device_link(struct bcache_device *d, struct cache_set *c,
 	WARN(sysfs_create_link(&d->kobj, &c->kobj, "cache") ||
 	     sysfs_create_link(&c->kobj, &d->kobj, d->name),
 	     "Couldn't create device <-> cache set symlinks");
+
+	clear_bit(BCACHE_DEV_UNLINK_DONE, &d->flags);
 }
 
 static void bcache_device_detach(struct bcache_device *d)
--
2.1.2
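The failure described above is a stale "already done" flag. Unlink removes the symlinks only if the flag is not yet set, and it then sets the flag. If nothing clears the flag on the next link, the following unlink silently skips the removal. The link after that then fails with -EEXIST.

Below is a simplified, single-threaded model of that sequence, with these assumptions:
- A plain mask test stands in for the kernel's atomic bit operations.
- A boolean stands in for the sysfs symlink.
- The struct, the flag value and the functions are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEV_UNLINK_DONE 0x1UL /* hypothetical flag mask */

/* Hypothetical model: device flags plus one sysfs symlink. */
struct fake_bdev {
    unsigned long flags;
    bool link_exists;
};

/* Like sysfs_create_link(): refuses to create a duplicate. */
static int fake_create_link(struct fake_bdev *d)
{
    if (d->link_exists)
        return -EEXIST;
    d->link_exists = true;
    return 0;
}

static int fake_device_link(struct fake_bdev *d)
{
    int ret = fake_create_link(d);

    /* The fix: a freshly created link has not been unlinked yet. */
    d->flags &= ~DEV_UNLINK_DONE;
    return ret;
}

static void fake_device_unlink(struct fake_bdev *d)
{
    /* Remove the link only once, guarded by the flag. */
    if (!(d->flags & DEV_UNLINK_DONE)) {
        d->flags |= DEV_UNLINK_DONE;
        d->link_exists = false;
    }
}
```

Deleting the flag-clearing line from `fake_device_link()` makes the third link in a repeated link/unlink loop return -EEXIST. That matches the repeated detach/attach in the reproduction script.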