Before this patch, if the cache device goes missing, cached_dev_submit_bio()
returns an I/O error to the filesystem while the cache set is being detached,
which randomly causes XFS to force a shutdown.

This patch delays cache I/O submission in cached_dev_submit_bio() until the
cache set detach has finished. So if the cache device goes missing, bcache
detaches the cache set automatically and the I/O is then submitted normally.

Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p56: IO error on writing btree.
Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p57: IO error on writing btree.
Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p56: IO error on writing btree.
Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
Feb 2 20:59:23 kernel: XFS (bcache43): metadata I/O error in "xfs_buf_iodone_callback_error" at daddr 0x80034658 len 32 error 12
Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
Feb 2 20:59:23 kernel: bcache: bch_cache_set_error() bcache: error on 004f8aa7-561a-4ba7-bf7b-292e461d3f18:
Feb 2 20:59:23 kernel: journal io error
Feb 2 20:59:23 kernel: bcache: bch_cache_set_error() , disabling caching
Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
Feb 2 20:59:23 kernel: bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache43 is "auto" and cache is clean, keep it alive.
Feb 2 20:59:23 kernel: XFS (bcache43): metadata I/O error in "xlog_iodone" at daddr 0x400123e60 len 64 error 12
Feb 2 20:59:23 kernel: XFS (bcache43): xfs_do_force_shutdown(0x2) called from line 1298 of file fs/xfs/xfs_log.c. Return address = 00000000c1c8077f
Feb 2 20:59:23 kernel: XFS (bcache43): Log I/O Error Detected. Shutting down filesystem
Feb 2 20:59:23 kernel: XFS (bcache43): Please unmount the filesystem and rectify the problem(s)

Signed-off-by: Zhen Zhang <zhangzhen.email@xxxxxxxxx>
---
 drivers/md/bcache/bcache.h  | 5 -----
 drivers/md/bcache/request.c | 8 ++++----
 drivers/md/bcache/super.c   | 3 ++-
 3 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 9ed9c955add7..e5227dd08e3a 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -928,11 +928,6 @@ static inline void closure_bio_submit(struct cache_set *c,
 				      struct closure *cl)
 {
 	closure_get(cl);
-	if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &c->flags))) {
-		bio->bi_status = BLK_STS_IOERR;
-		bio_endio(bio);
-		return;
-	}
 	submit_bio_noacct(bio);
 }
 
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index d15aae6c51c1..36f0ee95b51f 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -13,6 +13,7 @@
 #include "request.h"
 #include "writeback.h"
 
+#include <linux/delay.h>
 #include <linux/module.h>
 #include <linux/hash.h>
 #include <linux/random.h>
@@ -1172,11 +1173,10 @@ void cached_dev_submit_bio(struct bio *bio)
 	unsigned long start_time;
 	int rw = bio_data_dir(bio);
 
-	if (unlikely((d->c && test_bit(CACHE_SET_IO_DISABLE, &d->c->flags)) ||
+	while (unlikely((d->c && test_bit(CACHE_SET_IO_DISABLE, &d->c->flags)) ||
 		     dc->io_disable)) {
-		bio->bi_status = BLK_STS_IOERR;
-		bio_endio(bio);
-		return;
+		/* wait for detach finish and d->c == NULL. */
+		msleep(2);
 	}
 
 	if (likely(d->c)) {
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 140f35dc0c45..8d9a5e937bc8 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -661,7 +661,8 @@ int bch_prio_write(struct cache *ca, bool wait)
 		p->csum		= bch_crc64(&p->magic, meta_bucket_bytes(&ca->sb) - 8);
 
 		bucket = bch_bucket_alloc(ca, RESERVE_PRIO, wait);
-		BUG_ON(bucket == -1);
+		if (bucket == -1)
+			return -1;
 
 		mutex_unlock(&ca->set->bucket_lock);
 		prio_io(ca, bucket, REQ_OP_WRITE, 0);
-- 
2.25.1
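
For illustration only, the behaviour change in cached_dev_submit_bio() can be modelled in userspace as below. This is a rough sketch, not kernel code: struct model_dev, struct model_cache_set, model_sleep_tick(), submit_old() and submit_new() are hypothetical names invented here, the detach worker is simulated by a tick counter instead of a real thread, and the max_ticks bound exists only so the model terminates (the loop in the patch has no such bound).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the bcache structures; assumptions only. */
struct model_cache_set {
	bool io_disable;		/* models CACHE_SET_IO_DISABLE */
};

struct model_dev {
	struct model_cache_set *c;	/* NULL once detach has finished */
	bool io_disable;		/* models dc->io_disable */
	int detach_ticks_left;		/* simulated time until detach completes */
};

/* Models one msleep(2): each tick lets the pretend detach make progress. */
static void model_sleep_tick(struct model_dev *d)
{
	if (d->c && d->detach_ticks_left > 0 && --d->detach_ticks_left == 0)
		d->c = NULL;		/* detach done: no cache set attached */
}

/* Same condition as the if/while in cached_dev_submit_bio(). */
static bool model_blocked(const struct model_dev *d)
{
	return (d->c && d->c->io_disable) || d->io_disable;
}

/* Behaviour before the patch: fail the I/O immediately. */
static int submit_old(struct model_dev *d)
{
	return model_blocked(d) ? -EIO : 0;
}

/*
 * Behaviour after the patch: poll until the condition clears.
 * Returns the number of ticks waited, or -1 if max_ticks is reached.
 */
static int submit_new(struct model_dev *d, int max_ticks)
{
	int waited = 0;

	assert(max_ticks >= 0);
	while (model_blocked(d)) {
		if (waited == max_ticks)
			return -1;
		model_sleep_tick(d);
		waited++;
	}
	return waited;
}
```

With a cache set whose io_disable flag is set and a detach that completes after three ticks, submit_old() fails with -EIO, while submit_new() waits three ticks and then proceeds without a cache set. In this model a device whose own io_disable flag is set never leaves the loop, because nothing here clears that flag; whether the same can happen in the driver depends on code outside this diff.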