On 3/4/22 4:42 PM, Coly Li <colyli@xxxxxxx> wrote:
On 3/4/22 4:22 PM, Zhang Zhen wrote:
>
> On 3/2/22 5:19 PM, Coly Li wrote:
>> On 2/23/22 8:26 PM, Zhang Zhen wrote:
>>>
>>> On 2022/2/23 5:03 PM, Coly Li wrote:
>>>> On 2/21/22 5:33 PM, Zhang Zhen wrote:
>>>>> Hi Coly,
>>>>>
>>>>> We encountered a bcache detach problem: during I/O, the cache
>>>>> device went missing.
>>>>>
>>>>> The I/O error status was returned to XFS, and in some cases XFS
>>>>> did a force shutdown.
>>>>>
>>>>> The dmesg output is as follows:
>>>>> Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p56:
>>>>> IO error on writing btree.
>>>>> Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p44:
>>>>> IO error on writing btree.
>>>>> Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p44:
>>>>> IO error on writing btree.
>>>>> Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p57:
>>>>> IO error on writing btree.
>>>>> Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p56:
>>>>> IO error on writing btree.
>>>>> Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p57:
>>>>> IO error on writing btree.
>>>>> Feb 2 20:59:23 kernel: bcache: bch_count_io_errors() nvme0n1p56:
>>>>> IO error on writing btree.
>>>>> Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
>>>>> Feb 2 20:59:23 kernel: XFS (bcache43): metadata I/O error in
>>>>> "xfs_buf_iodone_callback_error" at daddr 0x80034658 len 32 error 12
>>>>> Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
>>>>> Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
>>>>> Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
>>>>> Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
>>>>> Feb 2 20:59:23 kernel: bcache: bch_cache_set_error() bcache:
>>>>> error on 004f8aa7-561a-4ba7-bf7b-292e461d3f18:
>>>>> Feb 2 20:59:23 kernel: journal io error
>>>>> Feb 2 20:59:23 kernel: bcache: bch_cache_set_error() , disabling
>>>>> caching
>>>>> Feb 2 20:59:23 kernel: bcache: bch_btree_insert() error -5
>>>>> Feb 2 20:59:23 kernel: bcache: conditional_stop_bcache_device()
>>>>> stop_when_cache_set_failed of bcache43 is "auto" and cache is
>>>>> clean, keep it alive.
>>>>> Feb 2 20:59:23 kernel: XFS (bcache43): metadata I/O error in
>>>>> "xlog_iodone" at daddr 0x400123e60 len 64 error 12
>>>>> Feb 2 20:59:23 kernel: XFS (bcache43):
>>>>> xfs_do_force_shutdown(0x2) called from line 1298 of file
>>>>> fs/xfs/xfs_log.c. Return address = 00000000c1c8077f
>>>>> Feb 2 20:59:23 kernel: XFS (bcache43): Log I/O Error Detected.
>>>>> Shutting down filesystem
>>>>> Feb 2 20:59:23 kernel: XFS (bcache43): Please unmount the
>>>>> filesystem and rectify the problem(s)
>>>>>
>>>>>
>>>>> We checked the code; the error status is returned in the
>>>>> cached_dev_make_request() and closure_bio_submit() functions.
>>>>>
>>>>> static blk_qc_t cached_dev_make_request(struct request_queue *q,
>>>>>                                         struct bio *bio)
>>>>> {
>>>>>         struct search *s;
>>>>>         struct bcache_device *d = bio->bi_disk->private_data;
>>>>>         struct cached_dev *dc = container_of(d, struct cached_dev, disk);
>>>>>         int rw = bio_data_dir(bio);
>>>>>
>>>>>         if (unlikely((d->c && test_bit(CACHE_SET_IO_DISABLE, &d->c->flags)) ||
>>>>>                      dc->io_disable)) {
>>>>>                 bio->bi_status = BLK_STS_IOERR;
>>>>>                 bio_endio(bio);
>>>>>                 return BLK_QC_T_NONE;
>>>>>         }
>>>>>
>>>>> static inline void closure_bio_submit(struct cache_set *c,
>>>>>                                       struct bio *bio,
>>>>>                                       struct closure *cl)
>>>>> {
>>>>>         closure_get(cl);
>>>>>         if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &c->flags))) {
>>>>>                 bio->bi_status = BLK_STS_IOERR;
>>>>>                 bio_endio(bio);
>>>>>                 return;
>>>>>         }
>>>>>         generic_make_request(bio);
>>>>> }
>>>>>
>>>>> Can the cache set be detached without returning an error status
>>>>> to the filesystem?
>>>>
>>>>
>>>> Hi Zhang,
>>>>
>>>>
>>>> What is your kernel version and where did you get the kernel?
>>> My kernel version is 4.18, from CentOS.
>>> The code in this part is the same as in the upstream kernel.
>>>> It seems like as-designed behavior. Could you please describe the
>>>> operation sequence in more detail?
>>>>
>>> Yes, I think so too.
>>> The steps to reproduce are as follows:
>>> 1. mount a bcache disk with xfs
>>>
>>> /dev/bcache1 on /media/disk1 type xfs
>>>
>>> 2. run ls in background
>>> #!/bin/bash
>>>
>>> while true
>>> do
>>>     echo 2 > /proc/sys/vm/drop_caches
>>>     ls -R /media/disk1 > /dev/null
>>> done
>>>
>>>
>>> 3. remove cache disk sdc
>>> echo 1 >/sys/block/sdc/device/delete
>>>
>>> 4. dmesg should show the XFS errors
>>>
>>> I wrote a patch to improve this; please help review it. Thanks.
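
A minimal sketch for checking step 4 of the reproduction without reading
the log by eye. The function name check_bcache_xfs_log and the result
labels are hypothetical and appear nowhere in the thread; the strings it
matches are taken only from the dmesg output quoted earlier.

```shell
# Hypothetical helper: classify kernel log text read from stdin.
# The patterns are function names that appear in the quoted dmesg output.
check_bcache_xfs_log() {
    _log=$(cat)
    case $_log in
        *xfs_do_force_shutdown*) echo "xfs-shutdown" ;;     # XFS shut itself down
        *bch_cache_set_error*)   echo "cache-set-error" ;;  # bcache disabled caching
        *bch_count_io_errors*)   echo "cache-io-error" ;;   # I/O errors on the cache device
        *)                       echo "clean" ;;
    esac
}

# Usage on a live system (reading dmesg may require root):
#   dmesg | check_bcache_xfs_log
```

The checks are ordered from most to least severe, so a log that contains
both bcache errors and an XFS shutdown is reported as "xfs-shutdown".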
>
>>
>> Hold on. Why do you think it should be fixed? As I said, it is
>> as-designed behavior.
>>
> We use bcache in writearound mode, so it only caches read I/O.
> Currently, bcache returns I/O errors during detach, which randomly
> leads to an XFS force shutdown.
>
> After the bcache auto-detach finishes, some directories can be read
> and written normally, but others cannot because of the XFS force
> shutdown. This inconsistency confuses filesystem users.
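
For readers who want to confirm the configuration described here, the
following is a sketch of how the cache mode and the
stop_when_cache_set_failed policy (the "auto" value seen in the dmesg
output) could be read from sysfs. It assumes the usual bcache layout in
which these attributes live under /sys/block/<bcacheN>/bcache/ and print
the active choice in square brackets. The helper names selected_option
and show_bcache_policy are hypothetical.

```shell
# Print the bracketed (currently selected) entry of an option list,
# e.g. "writethrough writeback [writearound] none" -> "writearound".
selected_option() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Hypothetical wrapper: show both settings for one bcache device.
# The sysfs paths are an assumption about the running kernel's layout.
show_bcache_policy() {
    _dev=$1
    for _attr in cache_mode stop_when_cache_set_failed; do
        _file="/sys/block/$_dev/bcache/$_attr"
        [ -r "$_file" ] && echo "$_attr: $(selected_option "$(cat "$_file")")"
    done
    return 0
}

# Usage (device name taken from the log above):
#   show_bcache_policy bcache43
```

Attributes that are missing or unreadable are skipped silently, so the
wrapper is safe to run on a system without that device.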
Hi Zhen and Nix,
OK, I now understand the motivation. Yes, you are right: this is an
awkward issue and worth fixing.
Hi Coly,
So will you pick this patch into your tree?
Coly Li