Device IO error question

Hi, Dear Mr

With Linux 4.17.11, I built a bcache setup with one NVMe device as the cache device and six SATA disks as backing devices. I found that when I/O errors on one SATA disk make the cache unavailable, the other five SATA disks are kicked off as well. The dmesg log follows.
sd 0:2:7:0: [sdi] tag#2 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 0:2:7:0: [sdi] tag#29 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 0:2:7:0: [sdi] tag#2 CDB: Read(16) 88 00 00 00 00 00 c2 82 fa b8 00 00 00 40 00 00
sd 0:2:7:0: [sdi] tag#29 CDB: Write(16) 8a 00 00 00 00 02 36 4b dc 70 00 00 02 00 00 00
print_req_error: I/O error, dev sdi, sector 3263363768
print_req_error: I/O error, dev sdi, sector 9500875888
sd 0:2:7:0: [sdi] tag#17 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 0:2:7:0: [sdi] tag#17 CDB: Read(16) 88 00 00 00 00 01 9b 48 68 00 00 00 02 00 00 00
print_req_error: I/O error, dev sdi, sector 6900180992
print_req_error: I/O error, dev sdi, sector 6969256320
sd 0:2:7:0: [sdi] tag#4 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 0:2:7:0: [sdi] tag#1 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 0:2:7:0: [sdi] tag#6 CDB: Write(16) 8a 00 00 00 00 02 36 4b d6 70 00 00 02 00 00 00
print_req_error: I/O error, dev sdi, sector 7108554960
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
print_req_error: I/O error, dev sdi, sector 9500876400
sd 0:2:7:0: [sdi] tag#4 CDB: Read(16) 88 00 00 00 00 00 01 94 3d e0 00 00 01 c0 00 00
print_req_error: I/O error, dev sdi, sector 9500874352
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
sd 0:2:7:0: [sdi] tag#8 CDB: Read(16) 88 00 00 00 00 01 c7 a3 3f b0 00 00 02 00 00 00
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
sd 0:2:7:0: [sdi] tag#1 CDB: Read(16) 88 00 00 00 00 00 c2 82 f8 b8 00 00 02 00 00 00
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
megaraid_sas 0000:02:00.0: scanning for scsi0...
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
megaraid_sas 0000:02:00.0: 1508 (595830949s/0x0001/FATAL) - VD 07/7 is now OFFLINE
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_count_backing_io_errors() sdi: IO error on backing device, unrecoverable
bcache: bch_cached_dev_error() stop bcache7: too many IO errors on backing device sdi
bcache: bch_count_io_errors() nvme1n1: IO error on writing data to cache.
bcache: bch_count_io_errors() nvme1n1: IO error on writing data to cache.
bcache: bch_count_io_errors() nvme1n1: IO error on writing data to cache.
bcache: bch_count_io_errors() nvme1n1: IO error on writing data to cache.
bcache: bch_count_io_errors() nvme1n1: IO error on writing data to cache.
bcache: bch_count_io_errors() nvme1n1: IO error on writing data to cache.
bcache: bch_count_io_errors() nvme1n1: IO error on writing data to cache.
bcache: bch_cache_set_error() CACHE_SET_IO_DISABLE already set
bcache: error on b1dd28cb-10ec-4915-9b48-88deb6a7f61b:
nvme1n1: too many IO errors writing data to cache
, disabling caching
bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache6 is "auto" and cache is dirty, stop it to avoid potential data corruption.
bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache7 is "auto" and cache is dirty, stop it to avoid potential data corruption.
bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache8 is "auto" and cache is dirty, stop it to avoid potential data corruption.
bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache9 is "auto" and cache is dirty, stop it to avoid potential data corruption.
bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache10 is "auto" and cache is dirty, stop it to avoid potential data corruption.
bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache11 is "auto" and cache is dirty, stop it to avoid potential data corruption.
bcache: cached_dev_detach_finish() Caching disabled for sdk
bcache: cached_dev_detach_finish() Caching disabled for sdh
bcache: cached_dev_detach_finish() Caching disabled for sdl
bcache: cached_dev_detach_finish() Caching disabled for sdj
bcache: cached_dev_detach_finish() Caching disabled for sdm
sd 0:2:7:0: SCSI device is removed
megaraid_sas 0000:02:00.0: 1512 (595831198s/0x0004/CRIT) - Enclosure PD 20(c None/p1) phy bad for slot 7
bcache: bcache_device_free() bcache11 stopped
bcache: bcache_device_free() bcache10 stopped
bcache: bcache_device_free() bcache9 stopped
bcache: bcache_device_free() bcache8 stopped
bcache: bcache_device_free() bcache7 stopped
bcache: bcache_device_free() bcache6 stopped
bcache: cache_set_free() Cache set b1dd28cb-10ec-4915-9b48-88deb6a7f61b unregistered

I looked at the bcache code and found that bch_cached_dev_error() sets the CACHE_SET_IO_DISABLE flag on the cache_set, but I could not find any code that clears this flag again. In closure_bio_submit(), if CACHE_SET_IO_DISABLE is set in the cache_set flags, a cache I/O error is triggered.
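
To make the sequence I mean concrete, here is a small user-space sketch of it. This is only a model of my reading of the code, not the kernel source: the struct, the model_* function names and the bit number are made up for illustration, and only CACHE_SET_IO_DISABLE, bch_cached_dev_error() and closure_bio_submit() are names from bcache itself.

```c
/* User-space model only. It mirrors the behaviour described above,
 * not the actual bcache implementation. All model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MODEL_CACHE_SET_IO_DISABLE = 0 };  /* bit number chosen for the model */

struct model_cache_set {
    unsigned long flags;     /* shared by every backing device attached to the set */
    unsigned int io_errors;  /* submissions rejected because the flag was set */
};

/* Set a flag bit and report whether it was already set. */
bool model_test_and_set(struct model_cache_set *c, int bit)
{
    bool was_set = ((c->flags >> bit) & 1UL) != 0;

    c->flags |= 1UL << bit;
    return was_set;
}

/* Models the step described for bch_cached_dev_error(): one failing
 * backing device sets the flag on the shared cache set. Nothing in this
 * model ever clears it again. */
void model_backing_dev_error(struct model_cache_set *c)
{
    assert(c != NULL);
    model_test_and_set(c, MODEL_CACHE_SET_IO_DISABLE);
}

/* Models the check described for closure_bio_submit(): returns 0 when the
 * I/O would be submitted, -1 when it is failed because the flag is set. */
int model_submit(struct model_cache_set *c)
{
    assert(c != NULL);
    if (c->flags & (1UL << MODEL_CACHE_SET_IO_DISABLE)) {
        c->io_errors++;
        return -1;
    }
    return 0;
}
```

In this model, after model_backing_dev_error() has run once, every later model_submit() on the same set fails, no matter which backing device issued it. If my reading is right, that would match the log above, where the nvme1n1 write errors start right after bcache7 is stopped for sdi.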

Is this the intended design of bcache, or am I using it incorrectly?

I look forward to your reply. Thank you.

--Nina



