Re: bcache detach lead to xfs force shutdown

On 2022/2/23 5:03 PM, Coly Li wrote:
On 2/21/22 5:33 PM, Zhang Zhen wrote:
Hi Coly,

We encountered a bcache detach problem: while I/O was in progress, the cache device went missing.

The I/O error status was returned to XFS, and in some cases XFS did a forced shutdown.

The dmesg output is as follows:
Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p56: IO error on writing btree.
Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p44: IO error on writing btree.
Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p44: IO error on writing btree.
Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p57: IO error on writing btree.
Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p56: IO error on writing btree.
Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p57: IO error on writing btree.
Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p56: IO error on writing btree.
Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
Feb  2 20:59:23  kernel: XFS (bcache43): metadata I/O error in "xfs_buf_iodone_callback_error" at daddr 0x80034658 len 32 error 12
Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
Feb  2 20:59:23  kernel: bcache: bch_cache_set_error() bcache: error on 004f8aa7-561a-4ba7-bf7b-292e461d3f18:
Feb  2 20:59:23  kernel: journal io error
Feb  2 20:59:23  kernel: bcache: bch_cache_set_error() , disabling caching
Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
Feb  2 20:59:23  kernel: bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache43 is "auto" and cache is clean, keep it alive.
Feb  2 20:59:23  kernel: XFS (bcache43): metadata I/O error in "xlog_iodone" at daddr 0x400123e60 len 64 error 12
Feb  2 20:59:23  kernel: XFS (bcache43): xfs_do_force_shutdown(0x2) called from line 1298 of file fs/xfs/xfs_log.c. Return address = 00000000c1c8077f
Feb  2 20:59:23  kernel: XFS (bcache43): Log I/O Error Detected. Shutting down filesystem
Feb  2 20:59:23  kernel: XFS (bcache43): Please unmount the filesystem and rectify the problem(s)


We checked the code: the error status is returned in the cached_dev_make_request() and closure_bio_submit() functions.

static blk_qc_t cached_dev_make_request(struct request_queue *q,
                    struct bio *bio)
{
    struct search *s;
    struct bcache_device *d = bio->bi_disk->private_data;
    struct cached_dev *dc = container_of(d, struct cached_dev, disk);
    int rw = bio_data_dir(bio);

    if (unlikely((d->c && test_bit(CACHE_SET_IO_DISABLE, &d->c->flags)) ||
             dc->io_disable)) {
        bio->bi_status = BLK_STS_IOERR;
        bio_endio(bio);
        return BLK_QC_T_NONE;
    }

static inline void closure_bio_submit(struct cache_set *c,
                      struct bio *bio,
                      struct closure *cl)
{
    closure_get(cl);
    if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &c->flags))) {
        bio->bi_status = BLK_STS_IOERR;
        bio_endio(bio);
        return;
    }
    generic_make_request(bio);
}

Can the cache set be detached without returning an error status to the filesystem?


Hi Zhang,


What is your kernel version, and where did you get the kernel?
My kernel version is 4.18 from CentOS.
This part of the code is the same as in the upstream kernel.
It seems to be as-designed behavior. Could you please describe the operation sequence in more detail?

Yes, I think so too.
The reproduction steps are as follows:
1. Mount a bcache disk formatted with XFS:

/dev/bcache1 on /media/disk1 type xfs

2. Run ls in the background:
#!/bin/bash

while true
do
  echo 2 > /proc/sys/vm/drop_caches
  ls -R /media/disk1 > /dev/null
done


3. Remove the cache disk sdc:
echo 1 >/sys/block/sdc/device/delete

4. dmesg should show the XFS errors.

I wrote a patch to improve this; please help review it. Thanks.

Thanks.


Coly Li
From cb4dff3092707a31017cb3736be39039ece0e646 Mon Sep 17 00:00:00 2001
From: Zhen Zhang <zhangzhen.email@xxxxxxxxx>
Date: Wed, 23 Feb 2022 03:40:29 -0800
Subject: [PATCH] Bcache: don't return BLK_STS_IOERR during cache detach

Before this patch, if the cache device goes missing, cached_dev_submit_bio()
returns an I/O error to the filesystem during cache detach, which can randomly
lead to an XFS forced shutdown.

This patch delays I/O submission in cached_dev_submit_bio() and waits for the
cache set detach to finish. So if the cache device goes missing, bcache
detaches the cache set automatically and the I/O is submitted normally.

Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p56: IO error on writing btree.
Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p57: IO error on writing btree.
Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p56: IO error on writing btree.
Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
Feb  2 20:59:23  kernel: XFS (bcache43): metadata I/O error in "xfs_buf_iodone_callback_error" at daddr 0x80034658 len 32 error 12
Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
Feb  2 20:59:23  kernel: bcache: bch_cache_set_error() bcache: error on 004f8aa7-561a-4ba7-bf7b-292e461d3f18:
Feb  2 20:59:23  kernel: journal io error
Feb  2 20:59:23  kernel: bcache: bch_cache_set_error() , disabling caching
Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
Feb  2 20:59:23  kernel: bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache43 is "auto" and cache is clean, keep it alive.
Feb  2 20:59:23  kernel: XFS (bcache43): metadata I/O error in "xlog_iodone" at daddr 0x400123e60 len 64 error 12
Feb  2 20:59:23  kernel: XFS (bcache43): xfs_do_force_shutdown(0x2) called from line 1298 of file fs/xfs/xfs_log.c. Return address = 00000000c1c8077f
Feb  2 20:59:23  kernel: XFS (bcache43): Log I/O Error Detected. Shutting down filesystem
Feb  2 20:59:23  kernel: XFS (bcache43): Please unmount the filesystem and rectify the problem(s)

Signed-off-by: Zhen Zhang <zhangzhen.email@xxxxxxxxx>
---
 drivers/md/bcache/bcache.h  | 5 -----
 drivers/md/bcache/request.c | 8 ++++----
 drivers/md/bcache/super.c   | 3 ++-
 3 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 9ed9c955add7..e5227dd08e3a 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -928,11 +928,6 @@ static inline void closure_bio_submit(struct cache_set *c,
 				      struct closure *cl)
 {
 	closure_get(cl);
-	if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &c->flags))) {
-		bio->bi_status = BLK_STS_IOERR;
-		bio_endio(bio);
-		return;
-	}
 	submit_bio_noacct(bio);
 }
 
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index d15aae6c51c1..36f0ee95b51f 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -13,6 +13,7 @@
 #include "request.h"
 #include "writeback.h"
 
+#include <linux/delay.h>
 #include <linux/module.h>
 #include <linux/hash.h>
 #include <linux/random.h>
@@ -1172,11 +1173,10 @@ void cached_dev_submit_bio(struct bio *bio)
 	unsigned long start_time;
 	int rw = bio_data_dir(bio);
 
-	if (unlikely((d->c && test_bit(CACHE_SET_IO_DISABLE, &d->c->flags)) ||
+	while (unlikely((d->c && test_bit(CACHE_SET_IO_DISABLE, &d->c->flags)) ||
 		     dc->io_disable)) {
-		bio->bi_status = BLK_STS_IOERR;
-		bio_endio(bio);
-		return;
+		/* wait for detach finish and d->c == NULL. */
+		msleep(2);
 	}
 
 	if (likely(d->c)) {
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 140f35dc0c45..8d9a5e937bc8 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -661,7 +661,8 @@ int bch_prio_write(struct cache *ca, bool wait)
 		p->csum		= bch_crc64(&p->magic, meta_bucket_bytes(&ca->sb) - 8);
 
 		bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait);
-		BUG_ON(bucket == -1);
+		if (bucket == -1)
+			return -1;
 
 		mutex_unlock(&ca->set->bucket_lock);
 		prio_io(ca, bucket, REQ_OP_WRITE, 0);
-- 
2.25.1

