Re: Device IO error question

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2019/6/5 8:16 下午, nina wrote:
> Hi, Dear Mr
> 
> with linux 4.17.11, I built a bcache environment, one nvme as a cache device, and six sata as a device. It is found that if one sata io exception causes the cache to be unavailable, then the other five sata is also kicked off. The following is the dmesg log.
> sd 0:2:7:0: [sdi] tag#2 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 0:2:7:0: [sdi] tag#29 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 0:2:7:0: [sdi] tag#2 CDB: Read(16) 88 00 00 00 00 00 c2 82 fa b8 00 00 00 40 00 00
> sd 0:2:7:0: [sdi] tag#29 CDB: Write(16) 8a 00 00 00 00 02 36 4b dc 70 00 00 02 00 00 00
> print_req_error: I/O error, dev sdi, sector 3263363768
> print_req_error: I/O error, dev sdi, sector 9500875888
> sd 0:2:7:0: [sdi] tag#17 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 0:2:7:0: [sdi] tag#17 CDB: Read(16) 88 00 00 00 00 01 9b 48 68 00 00 00 02 00 00 00
> print_req_error: I/O error, dev sdi, sector 6900180992
> print_req_error: I/O error, dev sdi, sector 6969256320
> sd 0:2:7:0: [sdi] tag#4 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 0:2:7:0: [sdi] tag#1 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 0:2:7:0: [sdi] tag#6 CDB: Write(16) 8a 00 00 00 00 02 36 4b d6 70 00 00 02 00 00 00
> print_req_error: I/O error, dev sdi, sector 7108554960
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> print_req_error: I/O error, dev sdi, sector 9500876400
> sd 0:2:7:0: [sdi] tag#4 CDB: Read(16) 88 00 00 00 00 00 01 94 3d e0 00 00 01 c0 00 00
> print_req_error: I/O error, dev sdi, sector 9500874352
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> sd 0:2:7:0: [sdi] tag#8 CDB: Read(16) 88 00 00 00 00 01 c7 a3 3f b0 00 00 02 00 00 00
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> sd 0:2:7:0: [sdi] tag#1 CDB: Read(16) 88 00 00 00 00 00 c2 82 f8 b8 00 00 02 00 00 00
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> megaraid_sas 0000:02:00.0: scanning for scsi0...
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> megaraid_sas 0000:02:00.0: 1508 (595830949s/0x0001/FATAL) - VD 07/7 is now OFFLINE
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
> bcache: bch_cached_dev_error() stop bcache7: too many IO errors on backing device sdi
> bcache: bch_count_io_errors() nvme1n1: IO error on writing data to cache.
> bcache: bch_count_io_errors() nvme1n1: IO error on writing data to cache.
> bcache: bch_count_io_errors() nvme1n1: IO error on writing data to cache.
> bcache: bch_count_io_errors() nvme1n1: IO error on writing data to cache.
> bcache: bch_count_io_errors() nvme1n1: IO error on writing data to cache.
> bcache: bch_count_io_errors() nvme1n1: IO error on writing data to cache.
> bcache: bch_count_io_errors() nvme1n1: IO error on writing data to cache.
> bcache: bch_cache_set_error() CACHE_SET_IO_DISABLE already set
> bcache: error on b1dd28cb-10ec-4915-9b48-88deb6a7f61b:
> nvme1n1: too many IO errors writing data to cache
> , disabling caching
> bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache6 is "auto" and cache is dirty, stop it to avoid potential data corruption.
> bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache7 is "auto" and cache is dirty, stop it to avoid potential data corruption.
> bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache8 is "auto" and cache is dirty, stop it to avoid potential data corruption.
> bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache9 is "auto" and cache is dirty, stop it to avoid potential data corruption.
> bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache10 is "auto" and cache is dirty, stop it to avoid potential data corruption.
> bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache11 is "auto" and cache is dirty, stop it to avoid potential data corruption.
> bcache: cached_dev_detach_finish() Caching disabled for sdk
> bcache: cached_dev_detach_finish() Caching disabled for sdh
> bcache: cached_dev_detach_finish() Caching disabled for sdl
> bcache: cached_dev_detach_finish() Caching disabled for sdj
> bcache: cached_dev_detach_finish() Caching disabled for sdm
> sd 0:2:7:0: SCSI device is removed
> megaraid_sas 0000:02:00.0: 1512 (595831198s/0x0004/CRIT) - Enclosure PD 20(c None/p1) phy bad for slot 7
> bcache: bcache_device_free() bcache11 stopped
> bcache: bcache_device_free() bcache10 stopped
> bcache: bcache_device_free() bcache9 stopped
> bcache: bcache_device_free() bcache8 stopped
> bcache: bcache_device_free() bcache7 stopped
> bcache: bcache_device_free() bcache6 stopped
> bcache: cache_set_free() Cache set b1dd28cb-10ec-4915-9b48-88deb6a7f61b unregistered
> 
> I looked at the bcache code and found that in the bch_cached_dev_error function, the flag of cache_set is set to CACHE_SET_IO_DISABLE, but the code to clear this flag is not found, and  in the closure_bio_submit function. If the flags of cache_set  is CACHE_SET_IO_DISABLE, will trigger cache Io error.
> 
> Is the design of bcache like this, or is my use wrong?
> 
> I look forward to your reply. Thank you.

This problem is caused by commit 6147305c73e4 ("bcache: set
CACHE_SET_IO_DISABLE in bch_cached_dev_error()")

I will post a revert patch to revert this commit in 5.3 and CC
stable@xxxxxxxxxxxxxxx

Thanks.

Coly Li


-- 

Coly Li



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM Kernel]     [Linux Filesystem Development]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux