On 2019/3/27 9:42 PM, Thorsten Knabe wrote:
> On 3/27/19 12:53 PM, Coly Li wrote:
>> On 2019/3/27 7:00 PM, Thorsten Knabe wrote:
>>> On 3/27/19 10:44 AM, Coly Li wrote:
>>>> On 2019/3/26 9:21 PM, Thorsten Knabe wrote:
>>>>> Hello,
>>>>>
>>>>> There seems to be a serious problem when running bcache on top of a
>>>>> degraded RAID-6 (MD) array. The bcache device /dev/bcache0 disappears
>>>>> after a few I/O operations on the affected device, and the kernel log
>>>>> gets filled with the following message:
>>>>>
>>>>> bcache: bch_count_backing_io_errors() md0: IO error on backing device,
>>>>> unrecoverable
>>>>>
>>>>
>>>> It seems an I/O request to the backing device failed. If the md raid6
>>>> device is the backing device, does it go into read-only mode after it
>>>> degrades?
>>>
>>> No, the RAID6 backing device is still in read-write mode after the disk
>>> has been removed from the RAID array. That is the way RAID6 is supposed
>>> to work.
>>>
>>>>
>>>>> Setup:
>>>>> Linux kernel: 5.1-rc2, 5.0.4, 4.19.0-0.bpo.2-amd64 (Debian backports),
>>>>> all affected
>>>>> bcache backing device: EXT4 filesystem -> /dev/bcache0 -> /dev/md0 ->
>>>>> /dev/sd[bcde]1
>>>>> bcache cache device: /dev/sdf1
>>>>> cache mode: writethrough, none and cache device detached are all
>>>>> affected; writeback and writearound have not been tested
>>>>> KVM for testing; first observed on real hardware (failing RAID device)
>>>>>
>>>>> As long as the RAID6 array is healthy, bcache works as expected. Once
>>>>> the RAID6 array gets degraded, for example by removing a drive from
>>>>> the array (mdadm --fail /dev/md0 /dev/sde1, mdadm --remove /dev/md0
>>>>> /dev/sde1), the above-mentioned log messages appear in the kernel log
>>>>> and the bcache device /dev/bcache0 disappears shortly afterwards,
>>>>> logging:
>>>>>
>>>>> bcache: bch_cached_dev_error() stop bcache0: too many IO errors on
>>>>> backing device md0
>>>>>
>>>>> to the kernel log.
>>>>>
>>>>> After increasing /sys/block/bcache0/bcache/io_error_limit to a very
>>>>> high value (1073741824), the bcache device /dev/bcache0 remains usable
>>>>> without any noticeable filesystem corruption.
>>>>
>>>> If the backing device goes into read-only mode, bcache treats the
>>>> backing device as failed. The behavior is to stop the bcache device of
>>>> the failed backing device, to notify the upper layer that something
>>>> went wrong.
>>>>
>>>> In writethrough and writeback mode, bcache requires the backing device
>>>> to be writable.
>>>
>>> But the degraded (one disk of the array missing) RAID6 device is still
>>> writable.
>>>
>>> Also, after raising the io_error_limit of the bcache device to a very
>>> high value (1073741824 in my tests), I can use the bcache device on the
>>> degraded RAID6 array for hours, reading and writing gigabytes of data,
>>> without getting any I/O errors or observing any filesystem corruption.
>>> I am just getting a lot of those
>>>
>>> bcache: bch_count_backing_io_errors() md0: IO error on backing device,
>>> unrecoverable
>>>
>>> messages in the kernel log.
>>>
>>> It seems that I/O requests for data that has been successfully
>>> recovered by the RAID6 from the redundant information stored on the
>>> remaining disks are mistakenly counted as failed I/O requests, and when
>>> the configured io_error_limit of the bcache device is reached, the
>>> bcache device gets stopped.
>>
>> Oh, thanks for the information.
>>
>> It sounds like during md raid6 degrading and recovering, some I/O from
>> bcache might fail, and after the md raid6 device degrades and recovers,
>> the md device continues to serve I/O requests. Am I right?
>>
>
> I think the I/O errors logged by bcache are not real I/O errors, because
> the filesystem on top of the bcache device does not report any I/O
> errors unless the bcache device gets stopped by bcache due to too many
> errors (io_error_limit reached).
>
> I performed the following test:
>
> Starting with bcache on a healthy RAID6 array with 4 disks (all attached
> and completely synced). cache_mode is set to "none" to ensure data is
> read from the backing device. The EXT4 filesystem on top of bcache is
> mounted and contains two identical directories, each holding 4 GB of
> data, on a system with 2 GB of RAM to ensure data is not coming from the
> page cache. "diff -r dir1 dir2" runs in a loop to check for
> inconsistencies. Also, io_error_limit has been raised to 1073741824 to
> ensure the bcache device does not get stopped due to too many I/O errors
> during the test.
>
> As long as all 4 disks are attached to the RAID6 array, no messages get
> logged.
>
> Once one disk is removed from the RAID6 array using
> mdadm --fail /dev/md0 /dev/sde1
> the kernel log gets filled with the
>
> bcache: bch_count_backing_io_errors() md0: IO error on backing device,
> unrecoverable
>
> messages. However, the EXT4 filesystem does not log any corruption, nor
> does the diff comparing the two directories report any inconsistencies.
>
> After adding the previously removed disk back to the RAID6 array, bcache
> stops reporting the above-mentioned error message once the re-added disk
> is fully synced and the RAID6 array is healthy again.
>
> If the I/O requests to the RAID6 device actually failed, I would expect
> to see either EXT4 filesystem errors in the logs or at least diff
> reporting differences, but nothing gets logged in the kernel log except
> the above-mentioned message from bcache.
>
> It seems bcache mistakenly classifies, or at least counts, some I/O
> requests as failed although they have not actually failed.
>
> By the way, Linux 4.9 (from Debian stable) is most probably not affected.

Hi Thorsten,

Let me try to reproduce this and look into it. I will ask you for more
information later.

Very informative, thanks.

--
Coly Li
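
For reference, the reproduction and workaround steps described in the
thread can be condensed into a rough shell sketch. The device names
(/dev/md0, /dev/sde1, /dev/bcache0) and the io_error_limit sysfs path are
taken from the report; the mount point /mnt/test and the test directory
names are hypothetical, and writing to the cache_mode sysfs file is only
one possible way to select the "none" cache mode mentioned in the test.

  # Raise the error limit so bcache does not stop /dev/bcache0 during the
  # test (workaround from the report).
  echo 1073741824 > /sys/block/bcache0/bcache/io_error_limit

  # Set the cache mode to "none" so reads are served from the backing device.
  echo none > /sys/block/bcache0/bcache/cache_mode

  # Mount the EXT4 filesystem on the bcache device (hypothetical mount point).
  mount /dev/bcache0 /mnt/test

  # Degrade the RAID6 array by failing and removing one member disk.
  mdadm --fail /dev/md0 /dev/sde1
  mdadm --remove /dev/md0 /dev/sde1

  # Compare two identical directory trees in a loop; any difference would
  # indicate real data corruption rather than spuriously counted errors.
  while diff -r /mnt/test/dir1 /mnt/test/dir2; do
      echo "no differences found: $(date)"
  done

  # Watch the kernel log for the bch_count_backing_io_errors() messages.
  dmesg | grep bcache | tail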