Re: bcache detach lead to xfs force shutdown

On 3/4/22 4:42 PM, Coly Li <colyli@xxxxxxx> wrote:
On 3/4/22 4:22 PM, Zhang Zhen wrote:
>
> On 3/2/22 5:19 PM, Coly Li wrote:
>> On 2/23/22 8:26 PM, Zhang Zhen wrote:
>>>
>>> On 2022/2/23 5:03 PM, Coly Li wrote:
>>>> On 2/21/22 5:33 PM, Zhang Zhen wrote:
>>>>> Hi Coly,
>>>>>
>>>>> We encountered a bcache detach problem: during I/O, the cache device went missing.
>>>>>
>>>>> The I/O error status is returned to XFS, and in some cases XFS does a forced shutdown.
>>>>>
>>>>> The dmesg output is as follows:
>>>>> Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p56: IO error on writing btree.
>>>>> Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p44: IO error on writing btree.
>>>>> Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p44: IO error on writing btree.
>>>>> Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p57: IO error on writing btree.
>>>>> Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p56: IO error on writing btree.
>>>>> Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p57: IO error on writing btree.
>>>>> Feb  2 20:59:23  kernel: bcache: bch_count_io_errors() nvme0n1p56: IO error on writing btree.
>>>>> Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
>>>>> Feb  2 20:59:23  kernel: XFS (bcache43): metadata I/O error in "xfs_buf_iodone_callback_error" at daddr 0x80034658 len 32 error 12
>>>>> Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
>>>>> Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
>>>>> Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
>>>>> Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
>>>>> Feb  2 20:59:23  kernel: bcache: bch_cache_set_error() bcache: error on 004f8aa7-561a-4ba7-bf7b-292e461d3f18:
>>>>> Feb  2 20:59:23  kernel: journal io error
>>>>> Feb  2 20:59:23  kernel: bcache: bch_cache_set_error() , disabling caching
>>>>> Feb  2 20:59:23  kernel: bcache: bch_btree_insert() error -5
>>>>> Feb  2 20:59:23  kernel: bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache43 is "auto" and cache is clean, keep it alive.
>>>>> Feb  2 20:59:23  kernel: XFS (bcache43): metadata I/O error in "xlog_iodone" at daddr 0x400123e60 len 64 error 12
>>>>> Feb  2 20:59:23  kernel: XFS (bcache43): xfs_do_force_shutdown(0x2) called from line 1298 of file fs/xfs/xfs_log.c. Return address = 00000000c1c8077f
>>>>> Feb  2 20:59:23  kernel: XFS (bcache43): Log I/O Error Detected. Shutting down filesystem
>>>>> Feb  2 20:59:23  kernel: XFS (bcache43): Please unmount the filesystem and rectify the problem(s)
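The `conditional_stop_bcache_device()` message refers to the bcache device's `stop_when_cache_set_failed` setting. As a rough sketch, assuming the semantics described in the bcache admin guide ("auto" stops the device only if the cache holds dirty data, "always" stops it unconditionally), the decision can be modeled in userspace C. `model_stop_policy` and `model_should_stop()` are hypothetical names, not kernel identifiers.

```c
#include <stdbool.h>

/* Hypothetical userspace model; not bcache's kernel implementation. */
enum model_stop_policy {
    MODEL_STOP_AUTO,   /* models the "auto" value */
    MODEL_STOP_ALWAYS, /* models the "always" value */
};

/*
 * Returns true if the bcache device would be stopped after its cache set
 * fails, assuming the documented stop_when_cache_set_failed semantics.
 */
static bool model_should_stop(enum model_stop_policy policy, bool cache_dirty)
{
    if (policy == MODEL_STOP_ALWAYS)
        return true;
    /*
     * "auto": a clean cache holds nothing that is missing from the
     * backing device, so the bcache device can be kept alive.
     */
    return cache_dirty;
}
```

With "auto" and a clean cache, as in this log, `model_should_stop(MODEL_STOP_AUTO, false)` is false: the device is kept alive, yet the log still shows an XFS log I/O error immediately afterwards.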
>>>>>
>>>>>
>>>>> We checked the code: the error status is returned in cached_dev_make_request() and closure_bio_submit().
>>>>>
>>>>> static blk_qc_t cached_dev_make_request(struct request_queue *q,
>>>>>                     struct bio *bio)
>>>>> {
>>>>>     struct search *s;
>>>>>     struct bcache_device *d = bio->bi_disk->private_data;
>>>>>     struct cached_dev *dc = container_of(d, struct cached_dev, disk);
>>>>>     int rw = bio_data_dir(bio);
>>>>>
>>>>>     if (unlikely((d->c && test_bit(CACHE_SET_IO_DISABLE, &d->c->flags)) ||
>>>>>              dc->io_disable)) {
>>>>>         bio->bi_status = BLK_STS_IOERR;
>>>>>         bio_endio(bio);
>>>>>         return BLK_QC_T_NONE;
>>>>>     }
>>>>>     ...
>>>>>
>>>>> static inline void closure_bio_submit(struct cache_set *c,
>>>>>                       struct bio *bio,
>>>>>                       struct closure *cl)
>>>>> {
>>>>>     closure_get(cl);
>>>>>     if (unlikely(test_bit(CACHE_SET_IO_DISABLE, &c->flags))) {
>>>>>         bio->bi_status = BLK_STS_IOERR;
>>>>>         bio_endio(bio);
>>>>>         return;
>>>>>     }
>>>>>     generic_make_request(bio);
>>>>> }
>>>>>
>>>>> Can the cache set be detached without returning an error status to the filesystem?
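Both quoted checks complete the bio with `BLK_STS_IOERR` as soon as `CACHE_SET_IO_DISABLE` is set, before the request is routed anywhere, and whether it is a read or a write. A minimal userspace model of the `cached_dev_make_request()` check might look like this; the struct and function names are hypothetical stand-ins for the kernel state, and `-EIO` is used because that is the errno `BLK_STS_IOERR` maps to.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the state read by the quoted checks. */
struct model_cache_set {
    bool io_disable_bit;       /* models CACHE_SET_IO_DISABLE in c->flags */
};

struct model_cached_dev {
    struct model_cache_set *c; /* NULL when no cache set is attached */
    bool io_disable;           /* models dc->io_disable */
};

/*
 * Models the early exit at the top of cached_dev_make_request():
 * returns -EIO if the bio would be failed immediately, or 0 if it would
 * continue down the normal path. is_write is accepted only to show that
 * the bio direction plays no part in this check.
 */
static int model_make_request(const struct model_cached_dev *dc, bool is_write)
{
    (void)is_write;
    if ((dc->c && dc->c->io_disable_bit) || dc->io_disable)
        return -EIO;
    return 0;
}
```

In this model a read on a device whose cache set has failed returns `-EIO`, while the same device with no cache set attached proceeds normally, which matches the report that errors are returned while the detach is in progress.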
>>>>
>>>>
>>>> Hi Zhang,
>>>>
>>>>
>>>> What is your kernel version, and where did you get the kernel?
>>> My kernel is the CentOS 4.18 kernel.
>>> The code in this part is the same as in the upstream kernel.
>>>> It seems like as-designed behavior. Could you please describe the operation sequence in more detail?
>>>>
>>> Yes, I think so too.
>>> The reproduction steps are as follows:
>>> 1. Mount a bcache disk formatted with XFS:
>>>
>>> /dev/bcache1 on /media/disk1 type xfs
>>>
>>> 2. Run ls in the background:
>>> #!/bin/bash
>>>
>>> while true
>>> do
>>>   # Drop dentry and inode caches so that ls must re-read metadata from disk.
>>>   echo 2 > /proc/sys/vm/drop_caches
>>>   ls -R /media/disk1 > /dev/null
>>> done
>>>
>>>
>>> 3. Remove the cache disk sdc:
>>> echo 1 >/sys/block/sdc/device/delete
>>>
>>> 4. dmesg should show XFS errors.
>>>
>>> I wrote a patch to improve this; please help review it, thanks.
>
>>
>> Hold on. Why do you think it should be fixed? As I said, it is as-designed behavior.
>>
> We use bcache in writearound mode, caching only read I/O.
> Currently, bcache returns I/O errors during detach, which randomly leads to
> an XFS forced shutdown.
>
> After the automatic bcache detach finishes, some directories can be read and
> written normally, but others cannot because of the XFS forced shutdown.
> This inconsistency confuses filesystem users.
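The motivation rests on a property of writearound mode: writes go straight to the backing device, so the cache holds no dirty data and the backing device alone is up to date. A hypothetical sketch of the behavior being asked for, routing I/O to the backing device instead of failing it when the cache set has failed but the cache is clean, could be modeled as follows. This is not the patch under discussion, which is not shown in this thread; every name here is invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of a routing decision; not the patch under review. */
enum model_route {
    MODEL_ROUTE_NORMAL,  /* usual path: cache lookup, writes around the cache */
    MODEL_ROUTE_BACKING, /* bypass the failed cache and use the backing device */
    MODEL_ROUTE_FAIL,    /* complete the bio with an I/O error */
};

struct model_state {
    bool cache_set_failed; /* models CACHE_SET_IO_DISABLE being set */
    bool cache_dirty;      /* cache holds data not yet on the backing device */
    bool backing_failed;   /* models dc->io_disable */
};

static enum model_route model_route_bio(const struct model_state *s)
{
    if (s->backing_failed)
        return MODEL_ROUTE_FAIL;    /* no device left that can serve the I/O */
    if (!s->cache_set_failed)
        return MODEL_ROUTE_NORMAL;
    if (s->cache_dirty)
        return MODEL_ROUTE_FAIL;    /* backing copy may be stale: keep failing */
    return MODEL_ROUTE_BACKING;     /* clean cache: backing device is complete */
}
```

The dirty-cache branch keeps the current fail-fast behavior, since serving I/O from a backing device that lacks dirty cached data could return stale blocks.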


Hi Zhen and Nix,

OK, now I understand the motivation. Yes, you are right: this is an awkward issue and it is worth fixing.

Coly Li

Hi Coly,

So will you pick this patch into your tree?




