On 3/4/22 4:22 PM, Zhang Zhen wrote:
On 3/2/22 5:19 PM, Coly Li wrote:
On 2/23/22 8:26 PM, Zhang Zhen wrote:
On 2/23/22 5:03 PM, Coly Li wrote:
On 2/21/22 5:33 PM, Zhang Zhen wrote:
Hi Coly,
We encountered a bcache detach problem: the cache device went missing
while I/O was in progress.
The I/O error status was returned to XFS, and in some cases XFS did a
forced shutdown.
The dmesg output is as follows:
Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p56:
IO error on writing btree.
Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p44:
IO error on writing btree.
Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p44:
IO error on writing btree.
Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p57:
IO error on writing btree.
Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p56:
IO error on writing btree.
Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p57:
IO error on writing btree.
Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p56:
IO error on writing btree.
Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
Feb 2 20:59:23 kernel: XFS (bcache43): metadata I/O error in
"xfs_buf_iodone_callback_error" at daddr 0x80034658 len 32 error 12
Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
Feb 2 20:59:23 kernel: bcache: bch_cache_set_error() bcache:
error on 004f8aa7-561a-4ba7-bf7b-292e461d3f18:
Feb 2 20:59:23 kernel: journal io error
Feb 2 20:59:23 kernel: bcache: bch_cache_set_error() , disabling
caching
Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
Feb 2 20:59:23 kernel: bcache: conditional_stop_bcache_device()
stop_when_cache_set_failed of bcache43 is "auto" and cache is
clean, keep it alive.
Feb 2 20:59:23 kernel: XFS (bcache43): metadata I/O error in
"xlog_iodone" at daddr 0x400123e60 len 64 error 12
Feb 2 20:59:23 kernel: XFS (bcache43):
xfs_do_force_shutdown(0x2) called from line 1298 of file
fs/xfs/xfs_log.c. Return address = 00000000c1c8077f
Feb 2 20:59:23 kernel: XFS (bcache43): Log I/O Error Detected.
Shutting down filesystem
Feb 2 20:59:23 kernel: XFS (bcache43): Please unmount the
filesystem and rectify the problem(s)
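To see which cache partitions reported errors in a log like the one above, the messages can be tallied per device. This is only a minimal sketch: the function name is hypothetical, and it assumes each bch_count_io_errors() message sits on a single line, as it does in raw dmesg output (the lines above are wrapped by the mail client).

```shell
#!/bin/bash
# Hypothetical helper (not part of bcache): reads kernel log text on stdin
# and prints "<device> <count>" for every device named in a
# bch_count_io_errors() message, sorted by device name.
bcache_io_error_counts() {
    # Extract the device name that follows "bch_count_io_errors() ".
    sed -nE 's/.*bch_count_io_errors\(\) ([^:]+):.*/\1/p' |
        # Count how many times each device name occurs.
        awk '{ n[$0]++ } END { for (d in n) print d, n[d] }' |
        sort
}

# Usage:  dmesg | bcache_io_error_counts
```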
We checked the code; the error status is returned in the
cached_dev_make_request() and closure_bio_submit() functions.
static blk_qc_t cached_dev_make_request(struct request_queue *q,
                                        struct bio *bio)
{
        struct search *s;
        struct bcache_device *d = bio->bi_disk->private_data;
        struct cached_dev *dc = container_of(d, struct cached_dev, disk);
        int rw = bio_data_dir(bio);

        if (unlikely((d->c && test_bit(CACHE_SET_IO_DISABLE, &d->c->flags)) ||
                     dc->io_disable)) {
                bio->bi_status = BLK_STS_IOERR;
                bio_endio(bio);
                return BLK_QC_T_NONE;
        }

static inline void closure_bio_submit(struct cache_set *c,
                                      struct bio *bio,
                                      struct closure *cl)
{
        closure_get(cl);
        if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &c->flags))) {
                bio->bi_status = BLK_STS_IOERR;
                bio_endio(bio);
                return;
        }
        generic_make_request(bio);
}
Can the cache set be detached without returning an error status to the
filesystem?
Hi Zhang,
What is your kernel version, and where did you get the kernel?
My kernel is version 4.18 from CentOS.
This part of the code is the same as in the upstream kernel.
This looks like as-designed behavior. Could you please describe the
operation sequence in more detail?
Yes, I think so too.
The steps to reproduce are as follows:
1. Mount a bcache device with XFS:
   /dev/bcache1 on /media/disk1 type xfs
2. Run ls in the background:
   #!/bin/bash
   while true
   do
       echo 2 > /proc/sys/vm/drop_caches
       ls -R /media/disk1 > /dev/null
   done
3. Remove the cache disk sdc:
   echo 1 > /sys/block/sdc/device/delete
4. dmesg should show XFS errors.
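Step 4 can be checked mechanically rather than by eye. The sketch below is one possible way to do it: the function name is hypothetical, and the patterns are taken from the XFS messages in the log quoted earlier in this thread, so other kernel versions may word them differently.

```shell
#!/bin/bash
# Hypothetical helper: reads kernel log text on stdin and prints the name of
# every XFS filesystem that logged a forced shutdown, one per line.
xfs_shutdown_devices() {
    # Matches lines such as:
    #   XFS (bcache43): xfs_do_force_shutdown(0x2) called from line ...
    #   XFS (bcache43): Log I/O Error Detected. Shutting down filesystem
    # and keeps only the device name inside the parentheses.
    sed -nE 's/.*XFS \(([^)]+)\):.*(xfs_do_force_shutdown|Shutting down filesystem).*/\1/p' |
        sort -u
}

# Usage after step 3:  dmesg | xfs_shutdown_devices
```

An empty result means no forced shutdown was logged; plain "metadata I/O error" lines are deliberately not counted.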
I wrote a patch to improve this; please help review it. Thanks.
Hold on. Why do you think it should be fixed? As I said, it is
as-designed behavior.
We use bcache in writearound mode, so it only caches read I/O.
Currently, bcache returns I/O errors during detach, which randomly
leads to an XFS forced shutdown.
After the bcache auto-detach finishes, some directories can be read and
written normally, but others cannot because of the XFS forced shutdown.
This inconsistency confuses filesystem users.
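For anyone trying to confirm they are in the same configuration, the two relevant settings can be read from sysfs. This is a rough sketch under assumptions: it expects the attributes under /sys/block/<dev>/bcache/ and the usual format where the active choice is shown in square brackets (for example "writethrough writeback [writearound] none"). The function names and the SYSFS_ROOT override are hypothetical and exist only to make the sketch testable.

```shell
#!/bin/bash
# Assumption: bcache attributes live under $SYSFS_ROOT/block/<dev>/bcache/.
# SYSFS_ROOT is a hypothetical override, useful for testing without a device.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Prints the bracketed (currently selected) word from a sysfs choice list
# read on stdin, e.g. "[auto] always" -> "auto".
current_choice() {
    sed -nE 's/.*\[([^]]+)\].*/\1/p'
}

# Prints the cache mode and the stop_when_cache_set_failed policy of one
# bcache device, e.g. "bcache_policy bcache1".
bcache_policy() {
    local dir="$SYSFS_ROOT/block/$1/bcache"
    printf 'cache_mode=%s stop_when_cache_set_failed=%s\n' \
        "$(current_choice < "$dir/cache_mode")" \
        "$(current_choice < "$dir/stop_when_cache_set_failed")"
}
```

With writearound and "auto", the log quoted earlier shows the bcache device being kept alive after the cache set fails, which is the situation in which the I/O errors described here reach XFS.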
Hi Zhen and Nix,
OK, I now understand the motivation. Yes, you are right: this is an
awkward issue and it would be good to fix it.
Coly Li