On 2019/3/27 9:42 PM, Thorsten Knabe wrote:
> On 3/27/19 12:53 PM, Coly Li wrote:
>> On 2019/3/27 7:00 PM, Thorsten Knabe wrote:
>>> On 3/27/19 10:44 AM, Coly Li wrote:
>>>> On 2019/3/26 9:21 PM, Thorsten Knabe wrote:
>>>>> Hello,
>>>>>
>>>>> There seems to be a serious problem when running bcache on top of a
>>>>> degraded RAID-6 (MD) array. The bcache device /dev/bcache0 disappears
>>>>> after a few I/O operations on the affected device, and the kernel log
>>>>> gets filled with the following message:
>>>>>
>>>>> bcache: bch_count_backing_io_errors() md0: IO error on backing device,
>>>>> unrecoverable
>>>>>
>>>>
>>>> It seems an I/O request to the backing device failed. If the md raid6
>>>> device is the backing device, does it go into read-only mode after it
>>>> degrades?
>>>
>>> No, the RAID6 backing device is still in read-write mode after the disk
>>> has been removed from the RAID array. That is the way RAID6 is supposed
>>> to work.
>>>
>>>>
>>>>> Setup:
>>>>> Linux kernel: 5.1-rc2, 5.0.4, 4.19.0-0.bpo.2-amd64 (Debian backports),
>>>>> all affected
>>>>> bcache backing device: EXT4 filesystem -> /dev/bcache0 -> /dev/md0 ->
>>>>> /dev/sd[bcde]1
>>>>> bcache cache device: /dev/sdf1
>>>>> cache mode: writethrough, none and cache device detached are all
>>>>> affected; writeback and writearound have not been tested
>>>>> KVM for testing; first observed on real hardware (failing RAID device)
>>>>>
>>>>> As long as the RAID6 array is healthy, bcache works as expected. Once
>>>>> the RAID6 array gets degraded, for example by removing a drive from
>>>>> the array (mdadm --fail /dev/md0 /dev/sde1, mdadm --remove /dev/md0
>>>>> /dev/sde1), the above-mentioned log messages appear in the kernel log
>>>>> and the bcache device /dev/bcache0 disappears shortly afterwards,
>>>>> logging:
>>>>>
>>>>> bcache: bch_cached_dev_error() stop bcache0: too many IO errors on
>>>>> backing device md0
>>>>>
>>>>> to the kernel log.
>>>>>
>>>>> After increasing /sys/block/bcache0/bcache/io_error_limit to a very
>>>>> high value (1073741824), the bcache device /dev/bcache0 remains usable
>>>>> without any noticeable filesystem corruption.
>>>>
>>>> If the backing device goes into read-only mode, bcache treats the
>>>> backing device as failed. The behavior is to stop the bcache device of
>>>> the failed backing device, to notify the upper layer that something
>>>> went wrong.
>>>>
>>>> In writethrough and writeback mode, bcache requires the backing device
>>>> to be writable.
>>>
>>> But the degraded (one disk of the array missing) RAID6 device is still
>>> writable.
>>>
>>> Also, after raising the io_error_limit of the bcache device to a very
>>> high value (1073741824 in my tests), I can use the bcache device on the
>>> degraded RAID6 array for hours, reading and writing gigabytes of data,
>>> without getting any I/O errors or observing any filesystem corruption.
>>> I am just getting a lot of those
>>>
>>> bcache: bch_count_backing_io_errors() md0: IO error on backing device,
>>> unrecoverable
>>>
>>> messages in the kernel log.
>>>
>>> It seems that I/O requests for data that has been successfully
>>> recovered by the RAID6 from the redundant information stored on the
>>> remaining disks are mistakenly counted as failed I/O requests, and when
>>> the configured io_error_limit of the bcache device is reached, the
>>> bcache device gets stopped.
>>
>> Oh, thanks for the information.
>>
>> It sounds like during md raid6 degrading and recovering, some I/O from
>> bcache might fail, and after the md raid6 device degrades and recovers,
>> the md device continues to serve I/O requests. Am I right?
>>
>
> I think the I/O errors logged by bcache are not real I/O errors, because
> the filesystem on top of the bcache device does not report any I/O
> errors unless the bcache device gets stopped by bcache due to too many
> errors (io_error_limit reached).
>
> I performed the following test:
>
> Starting with bcache on a healthy RAID6 array with 4 disks (all attached
> and completely synced). cache_mode is set to "none" to ensure data is
> read from the backing device. The EXT4 filesystem on top of bcache is
> mounted and contains two identical directories, each holding 4 GB of
> data, on a system with 2 GB of RAM to ensure data is not coming from the
> page cache. "diff -r dir1 dir2" runs in a loop to check for
> inconsistencies. Also, io_error_limit has been raised to 1073741824 to
> ensure the bcache device does not get stopped due to too many I/O errors
> during the test.
>
> As long as all 4 disks are attached to the RAID6 array, no messages get
> logged.
>
> Once one disk is removed from the RAID6 array using
> mdadm --fail /dev/md0 /dev/sde1
> the kernel log gets filled with the
>
> bcache: bch_count_backing_io_errors() md0: IO error on backing device,
> unrecoverable
>
> messages. However, the EXT4 filesystem does not log any corruption, nor
> does the diff comparing the two directories report any inconsistencies.
>
> After adding the previously removed disk back to the RAID6 array, bcache
> stops reporting the above-mentioned error message once the re-added disk
> is fully synced and the RAID6 array is healthy again.
>
> If the I/O requests to the RAID6 device actually failed, I would expect
> to see either EXT4 filesystem errors in the logs or at least diff
> reporting differences, but nothing gets logged in the kernel log except
> the above-mentioned message from bcache.
>
> It seems bcache mistakenly classifies, or at least counts, some I/O
> requests as failed although they have not actually failed.
>
> By the way, Linux 4.9 (from Debian stable) is most probably not affected.

Hi Thorsten,

Let me try to reproduce this and look into it. I will ask you for more
information later.

Very informative, thanks.

--
Coly Li
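
For reference, the reproduction and workaround steps described in the
thread can be condensed into a rough shell sketch. The device names
(/dev/md0, /dev/sde1, /dev/bcache0) and the io_error_limit sysfs path are
taken from the report; the mount point /mnt/test and the test directory
names are hypothetical, and writing to the cache_mode sysfs file is only
one possible way to select the "none" cache mode mentioned in the test.

  # Raise the error limit so bcache does not stop /dev/bcache0 during the
  # test (workaround from the report).
  echo 1073741824 > /sys/block/bcache0/bcache/io_error_limit

  # Set the cache mode to "none" so reads are served from the backing device.
  echo none > /sys/block/bcache0/bcache/cache_mode

  # Mount the EXT4 filesystem on the bcache device (hypothetical mount point).
  mount /dev/bcache0 /mnt/test

  # Degrade the RAID6 array by failing and removing one member disk.
  mdadm --fail /dev/md0 /dev/sde1
  mdadm --remove /dev/md0 /dev/sde1

  # Compare two identical directory trees in a loop; any difference would
  # indicate real data corruption rather than spuriously counted errors.
  while diff -r /mnt/test/dir1 /mnt/test/dir2; do
      echo "no differences found: $(date)"
  done

  # Watch the kernel log for the bch_count_backing_io_errors() messages.
  dmesg | grep bcache | tail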