Re: [BUG] bcache_writebac blocked, IO on bcache device hung

Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx> · Mon, 11 Apr 2016 01:55:05 +0000 (UTC)

On Sun, 10 Apr 2016, Sebastian Roesner wrote:

> Hello,
> 
> I had an issue with bcache and kernel 4.5.0. I'm not sure that it was purely
> bcache related, but IO on the bcache device didn't work anymore whereas other
> volumes still worked fine.
> 
> After bcache blocked, it showed up the same message for dmcrypt_write. On top
> of the bcache device I run LVM and encrypt its LVs.
> 
> 
> └─sda2                                              8:34    part
>   └─md1                                             9:1     raid1
>     └─bcache0                                     252:0     disk
>       ├─storage-XXXXXXXXXXXXX_crypt               253:0     lvm
>       │ └─XXXXXXXXXXXX                            253:137   crypt
>       ├─storage-XXXXXXXXXXXX_crypt                253:1     lvm
>       │ └─XXXXXXXXXXXX                            253:122   crypt
> [..]
> 
> Bcache was patched with the patches from
> 
> https://bitbucket.org/ewheelerinc/linux v4.5-rc6-bcache-fixes
> https://bitbucket.org/ewheelerinc/linux v4.5-rc7-bcache-fixes
> 
> Trace:
> 
> INFO: task bcache_writebac:10061 blocked for more than 120 seconds.
>       Not tainted 4.5.0-kvmhost #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> bcache_writebac D ffff88081fc94440     0 10061      2 0x00000000
>  ffff8807f83d2400 ffff8807fc2ae180 000000020000a07c ffffffff810036a4
>  ffff8800cb444000 ffffffff00000052 ffff8807f83d2400 ffff8807f7df0bc0
>  0000000000000000 ffff8807f7df0000 ffffffff81410370 ffff8807f7df0ad8
> Call Trace:
>  [<ffffffff810036a4>] ? __switch_to+0x1c8/0x36e
>  [<ffffffff81410370>] ? schedule+0x7a/0x87
>  [<ffffffff81411c6c>] ? rwsem_down_write_failed+0x241/0x2b0
>  [<ffffffff81218763>] ? call_rwsem_down_write_failed+0x13/0x20
>  [<ffffffff81411539>] ? down_write+0x24/0x33
>  [<ffffffffa044ccac>] ? bch_writeback_thread+0x48/0x6bc [bcache]
>  [<ffffffffa044cc64>] ? write_dirty_finish+0x1d4/0x1d4 [bcache]
>  [<ffffffff8105efa2>] ? kthread+0x99/0xa1
>  [<ffffffff8105ef09>] ? kthread_parkme+0x16/0x16
>  [<ffffffff814129df>] ? ret_from_fork+0x3f/0x70
>  [<ffffffff8105ef09>] ? kthread_parkme+0x16/0x16

Please recompile with the lockdep debugging options noted here:
  
http://stackoverflow.com/questions/20892822/how-to-use-lockdep-feature-in-linux-kernel-for-deadlock-detection


Also, try this patch from Ming Lei and let us know if that solves it:

Fixes: 54efd50(block: make generic_make_request handle arbitrarily sized bios)
Reported-by: Sebastian Roesner <sroesner-kernelorg@xxxxxxxxxxxxxxxxx>
Reported-by: Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx (4.3+)
Cc: Shaohua Li <shli@xxxxxx>
Cc: Kent Overstreet <kent.overstreet@xxxxxxxxx>
Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxxxxx>
---
V1:
	- Kent pointed out that using max io size can't cover
	the case of non-full bvecs/pages

The issue can be reproduced by the following approach:
        - create one raid1 over two virtio-blk
        - build bcache device over the above raid1 and another cache device
	and bucket size is set 2Mbytes
        - set cache mode as writeback
        - run random write over ext4 on the bcache device
        - then the crash can be triggered

 block/blk-merge.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2613531..7b96471 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -94,8 +94,10 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	bool do_split = true;
 	struct bio *new = NULL;
 	const unsigned max_sectors = get_max_io_size(q, bio);
+	unsigned bvecs = 0;
 
 	bio_for_each_segment(bv, bio, iter) {
+		bvecs++;
 		/*
 		 * If the queue doesn't support SG gaps and adding this
 		 * offset would create a gap, disallow it.
@@ -103,6 +105,23 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 		if (bvprvp && bvec_gap_to_prev(q, bvprvp, bv.bv_offset))
 			goto split;
 
+		/*
+		 * With arbitrary bio size, the incoming bio may be very
+		 * big. We have to split the bio into small bios so that
+		 * each holds at most BIO_MAX_PAGES bvecs because
+		 * bio_clone() can fail to allocate big bvecs.
+		 *
+		 * It should have been better to apply the limit per
+		 * request queue in which bio_clone() is involved,
+		 * instead of globally. The biggest blocker is
+		 * bio_clone() in bio bounce.
+		 *
+		 * TODO: deal with bio bounce's bio_clone() gracefully
+		 * and convert the global limit into per-queue limit.
+		 */
+		if (bvecs >= BIO_MAX_PAGES)
+			goto split;
+
 		if (sectors + (bv.bv_len >> 9) > max_sectors) {
 			/*
 			 * Consider this a new segment if we're splitting in
-- 
1.9.1




--
Eric Wheeler


> INFO: task dmcrypt_write:11119 blocked for more than 120 seconds.
>       Not tainted 4.5.0-kvmhost #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> dmcrypt_write   D ffff88081fc54440     0 11119      2 0x00000000
>  ffff8807ebe0ce00 ffff8807fc2aee80 ffff8807f929eac0 0000000002011200
>  ffff8800ce50c000 ffff8807ebe0ce00 0000000000000000 ffff8807bcd33020
>  0000000000000001 0000000000000001 ffffffff81410370 ffff8807f7df0ad8
> Call Trace:
>  [<ffffffff81410370>] ? schedule+0x7a/0x87
>  [<ffffffff81411a15>] ? rwsem_down_read_failed+0xc6/0xdc
>  [<ffffffff810e71c8>] ? mempool_alloc+0x61/0x12d
>  [<ffffffff81218734>] ? call_rwsem_down_read_failed+0x14/0x30
>  [<ffffffff81411513>] ? down_read+0x17/0x19
>  [<ffffffffa04441d9>] ? cached_dev_make_request+0x411/0x738 [bcache]
>  [<ffffffff811eb860>] ? generic_make_request+0xb5/0x155
>  [<ffffffffa0314917>] ? dmcrypt_write+0x131/0x160 [dm_crypt]
>  [<ffffffff8106674c>] ? try_to_wake_up+0x1b5/0x1b5
>  [<ffffffffa03147e6>] ? crypt_iv_benbi_gen+0x37/0x37 [dm_crypt]
>  [<ffffffff8105efa2>] ? kthread+0x99/0xa1
>  [<ffffffff8105ef09>] ? kthread_parkme+0x16/0x16
>  [<ffffffff814129df>] ? ret_from_fork+0x3f/0x70
>  [<ffffffff8105ef09>] ? kthread_parkme+0x16/0x16
> INFO: task dmcrypt_write:11609 blocked for more than 120 seconds.
>       Not tainted 4.5.0-kvmhost #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> dmcrypt_write   D ffff88081fc54440     0 11609      2 0x00000000
>  ffff8807f04e4e80 ffff8807fc2aee80 ffff8807f929eac0 0000000002011200
>  ffff8807eb2b4000 ffff8807f04e4e80 0000000000000000 ffff8807bab51020
>  0000000000000001 0000000000000001 ffffffff81410370 ffff8807f7df0ad8
> Call Trace:
>  [<ffffffff81410370>] ? schedule+0x7a/0x87
>  [<ffffffff81411a15>] ? rwsem_down_read_failed+0xc6/0xdc
>  [<ffffffff810e71c8>] ? mempool_alloc+0x61/0x12d
>  [<ffffffff81218734>] ? call_rwsem_down_read_failed+0x14/0x30
>  [<ffffffff81411513>] ? down_read+0x17/0x19
>  [<ffffffffa04441d9>] ? cached_dev_make_request+0x411/0x738 [bcache]
>  [<ffffffff811eb860>] ? generic_make_request+0xb5/0x155
>  [<ffffffffa0314917>] ? dmcrypt_write+0x131/0x160 [dm_crypt]
>  [<ffffffff8106674c>] ? try_to_wake_up+0x1b5/0x1b5
>  [<ffffffffa03147e6>] ? crypt_iv_benbi_gen+0x37/0x37 [dm_crypt]
>  [<ffffffff8105efa2>] ? kthread+0x99/0xa1
>  [<ffffffff8105ef09>] ? kthread_parkme+0x16/0x16
>  [<ffffffff814129df>] ? ret_from_fork+0x3f/0x70
>  [<ffffffff8105ef09>] ? kthread_parkme+0x16/0x16
> 
> Sebastian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>