Re: BUG: bcache failing on top of degraded RAID-6

On 3/27/19 12:53 PM, Coly Li wrote:
> On 2019/3/27 7:00 PM, Thorsten Knabe wrote:
>> On 3/27/19 10:44 AM, Coly Li wrote:
>>> On 2019/3/26 9:21 PM, Thorsten Knabe wrote:
>>>> Hello,
>>>>
>>>> there seems to be a serious problem when running bcache on top of a
>>>> degraded RAID-6 (MD) array. The bcache device /dev/bcache0 disappears
>>>> after a few I/O operations on the affected device and the kernel log
>>>> gets filled with the following log message:
>>>>
>>>> bcache: bch_count_backing_io_errors() md0: IO error on backing device,
>>>> unrecoverable
>>>>
>>>
>>> It seems an I/O request to the backing device failed. If the md raid6
>>> device is the backing device, does it go into read-only mode after
>>> degrading?
>>
>> No, the RAID6 backing device is still in read-write mode after the disk
>> has been removed from the RAID array. That's the way RAID6 is supposed
>> to work.
>>
>>>
>>>
>>>> Setup:
>>>> Linux kernel: 5.1-rc2, 5.0.4, 4.19.0-0.bpo.2-amd64 (Debian backports)
>>>> all affected
>>>> bcache backing device: EXT4 filesystem -> /dev/bcache0 -> /dev/md0 ->
>>>> /dev/sd[bcde]1
>>>> bcache cache device: /dev/sdf1
>>>> cache mode: writethrough, none and cache device detached are all
>>>> affected, writeback and writearound have not been tested
>>>> KVM for testing, first observed on real hardware (failing RAID device)
>>>>
>>>> As long as the RAID6 is healthy, bcache works as expected. Once the
>>>> RAID6 gets degraded, for example by removing a drive from the array
>>>> (mdadm --fail /dev/md0 /dev/sde1, mdadm --remove /dev/md0 /dev/sde1),
>>>> the above-mentioned log messages appear in the kernel log and the bcache
>>>> device /dev/bcache0 disappears shortly afterwards logging:
>>>>
>>>> bcache: bch_cached_dev_error() stop bcache0: too many IO errors on
>>>> backing device md0
>>>>
>>>> to the kernel log.
>>>>
>>>> After increasing /sys/block/bcache0/bcache/io_error_limit to a very
>>>> high value (1073741824), the bcache device /dev/bcache0 remains usable
>>>> without any noticeable filesystem corruption.
>>>
>>> If the backing device goes into read-only mode, bcache treats this
>>> backing device as failed. The behavior is to stop the bcache device
>>> of the failed backing device, to notify the upper layer that something
>>> went wrong.
>>>
>>> In writethrough and writeback mode, bcache requires the backing device to
>>> be writable.
>>
>> But, the degraded (one disk of the array missing) RAID6 device is still
>> writable.
>>
>> Also after raising the io_error_limit of the bcache device to a very
>> high value (1073741824 in my tests) I can use the bcache device on the
>> degraded RAID6 array for hours reading and writing gigabytes of data,
>> without getting any I/O errors or observing any filesystem corruptions.
>> I'm just getting a lot of those
>>
>> bcache: bch_count_backing_io_errors() md0: IO error on backing device,
>> unrecoverable
>>
>> messages in the kernel log.
>>
>> It seems that I/O requests for data that have been successfully
>> recovered by the RAID6 from the redundant information stored on the
>> additional disks are accidentally counted as failed I/O requests and
>> when the configured io_error_limit for the bcache device is reached, the
>> bcache device gets stopped.
> Oh, thanks for the information.
> 
> It sounds like some I/O from bcache might fail while the md raid6 is
> degrading and recovering, and after the md raid6 has degraded and
> recovered, the md device continues to serve I/O requests. Am I right?
> 

I think the I/O errors logged by bcache are not real I/O errors,
because the filesystem on top of the bcache device does not report any
I/O errors unless the bcache device gets stopped by bcache due to too
many errors (io_error_limit reached).

I performed the following test:

I started with bcache on a healthy RAID6 with 4 disks (all attached and
completely synced). cache_mode was set to "none" to ensure data is read from
the backing device. The EXT4 filesystem on top of bcache was mounted and
held two identical directories, each containing 4GB of data, on a system
with 2GB of RAM to ensure data does not come from the page cache. "diff -r
dir1 dir2" was running in a loop to check for inconsistencies. Also,
io_error_limit was raised to 1073741824 to ensure the bcache device
does not get stopped due to too many I/O errors during the test.
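
A minimal sketch of this preparation, not necessarily the exact commands
used. The function names and the BCACHE_SYSFS variable are made up for
illustration; the sysfs attributes cache_mode and io_error_limit are the
ones mentioned above.

```shell
#!/bin/sh
# BCACHE_SYSFS is a hypothetical helper variable; it defaults to the sysfs
# directory of bcache0 and can be pointed elsewhere for a dry run.
BCACHE_SYSFS="${BCACHE_SYSFS:-/sys/block/bcache0/bcache}"

# Bypass the cache and raise the error limit so bcache0 is not stopped.
prepare_bcache() {
    echo none > "$BCACHE_SYSFS/cache_mode" &&
    echo 1073741824 > "$BCACHE_SYSFS/io_error_limit"
}

# Compare two directory trees repeatedly and fail on the first difference.
# Arguments: dir1 dir2 [rounds]; rounds defaults to 1.
compare_loop() {
    rounds="${3:-1}"
    i=0
    while [ "$i" -lt "$rounds" ]; do
        diff -r "$1" "$2" > /dev/null || return 1
        i=$((i + 1))
    done
    return 0
}
```

Run as root, "prepare_bcache" followed by "compare_loop dir1 dir2 100"
would approximate the loop described above.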

As long as all 4 disks are attached to the RAID6 array, no messages get logged.

Once one disk is removed from the RAID6 array using
  mdadm --fail /dev/md0 /dev/sde1
the kernel log gets filled with the

bcache: bch_count_backing_io_errors() md0: IO error on backing device,
unrecoverable

messages. However, the EXT4 filesystem does not log any corruption, nor
does the diff comparing the two directories report any inconsistencies.
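
One way to put numbers on this is to count the relevant lines in the
kernel log. The following is only a sketch: the function names are
hypothetical, and the "EXT4-fs error" prefix is assumed to be what ext4
prints when it detects a problem.

```shell
#!/bin/sh
# Both functions read kernel log text on stdin and print a line count.
# "|| true" keeps the exit status at 0 when grep finds no matching line.

# Number of backing device I/O errors counted by bcache.
count_backing_errors() {
    grep -c 'bch_count_backing_io_errors' || true
}

# Number of errors reported by the EXT4 filesystem itself.
count_ext4_errors() {
    grep -c 'EXT4-fs error' || true
}
```

For example, "dmesg | count_backing_errors" and "dmesg | count_ext4_errors"
can be compared while the array is degraded.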

After the previously removed disk is added back to the RAID6 array, bcache
stops reporting the above-mentioned error message once the re-added disk is
fully synced and the RAID6 array is healthy again.
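
To correlate the messages with the array state, the member status field in
/proc/mdstat (for example [UUUU] versus [UUU_]) can be checked. A small
sketch, with a hypothetical function name, assuming the usual mdstat layout
where "_" marks a missing or not yet synced member:

```shell
#!/bin/sh
# Reads /proc/mdstat-style text on stdin. Exit status 0 means at least one
# array has a "_" in its member status field, i.e. it is degraded or still
# recovering; exit status 1 means no such field was found.
md_is_degraded() {
    grep -q '\[[U_]*_[U_]*\]'
}
```

Usage: "md_is_degraded < /proc/mdstat && echo degraded". Note that this
looks at all arrays listed in the input, not only md0.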

If the I/O requests to the RAID6 device actually failed, I would
expect to see either EXT4 filesystem errors in the logs or at least diff
reporting differences, but nothing gets logged in the kernel log except
the above-mentioned message from bcache.

It seems bcache mistakenly classifies or at least counts some I/O
requests as failed although they have not actually failed.

By the way, Linux 4.9 (from Debian stable) is most probably not affected.

Thanks
Thorsten





-- 
___
 |        | /                 E-Mail: linux@xxxxxxxxxxxxxxxxx
 |horsten |/\nabe                WWW: http://linux.thorsten-knabe.de


