Unrecovered read error issue

Viacheslav Dubeyko <slava@xxxxxxxxxxx> · Fri, 18 Dec 2015 17:26:12 -0800

Hi Ryusuke,

Recently, Brian Cottingham <spiffytech@xxxxxxxxx> reported an issue
with the NILFS2 GC. He shared the environment and issue details:

Linux spiffyhome 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u6
(2015-11-09) x86_64 GNU/Linux
nilfs-tools 2.2.1-1

This partition is used for bulk media storage and to hold backups from
my other devices. Pretty low-use; it mostly just sits there waiting
for new data.

The drive is an HDD, purchased 2014-09-04:
http://smile.amazon.com/dp/B00EHBEUZO/ref=pe_385040_121528360_TE_dp_5?sa-no-redirect=1

Model: ATA WDC WD40EZRX-00S (scsi)
Disk /dev/sdb: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  4001GB  4001GB  nilfs2

Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes

Dec 17 16:02:13 spiffyhome kernel: [175681.852060] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 17 16:02:13 spiffyhome kernel: [175681.852066] ata2.00: BMDMA stat 0x25
Dec 17 16:02:13 spiffyhome kernel: [175681.852070] ata2.00: failed command: READ DMA EXT
Dec 17 16:02:13 spiffyhome kernel: [175681.852077] ata2.00: cmd 25/00:00:40:b0:fc/00:04:5a:00:00/e0 tag 0 dma 524288 in
Dec 17 16:02:13 spiffyhome kernel: [175681.852077]          res 51/40:4f:f0:b2:fc/40:01:5a:00:00/e0 Emask 0x9 (media error)
Dec 17 16:02:13 spiffyhome kernel: [175681.852081] ata2.00: status: { DRDY ERR }
Dec 17 16:02:13 spiffyhome kernel: [175681.852083] ata2.00: error: { UNC }
Dec 17 16:02:14 spiffyhome kernel: [175681.880266] ata2.00: configured for UDMA/133
Dec 17 16:02:14 spiffyhome kernel: [175681.880680] sd 1:0:0:0: [sdb] Unhandled sense code
Dec 17 16:02:14 spiffyhome kernel: [175681.880683] sd 1:0:0:0: [sdb]
Dec 17 16:02:14 spiffyhome kernel: [175681.880685] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 17 16:02:14 spiffyhome kernel: [175681.880688] sd 1:0:0:0: [sdb]
Dec 17 16:02:14 spiffyhome kernel: [175681.880689] Sense Key : Medium Error [current] [descriptor]
Dec 17 16:02:14 spiffyhome kernel: [175681.880692] Descriptor sense data with sense descriptors (in hex):
Dec 17 16:02:14 spiffyhome kernel: [175681.880694]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 17 16:02:14 spiffyhome kernel: [175681.880701]         5a fc b2 f0
Dec 17 16:02:14 spiffyhome kernel: [175681.880705] sd 1:0:0:0: [sdb]
Dec 17 16:02:14 spiffyhome kernel: [175681.880707] Add. Sense: Unrecovered read error - auto reallocate failed
Dec 17 16:02:14 spiffyhome kernel: [175681.880709] sd 1:0:0:0: [sdb] CDB:
Dec 17 16:02:14 spiffyhome kernel: [175681.880711] Read(16): 88 00 00 00 00 00 5a fc b0 40 00 00 04 00 00 00
Dec 17 16:02:14 spiffyhome kernel: [175681.880720] end_request: I/O error, dev sdb, sector 1526510320
Dec 17 16:02:14 spiffyhome kernel: [175681.880756] ata2: EH complete
Dec 17 16:02:14 spiffyhome kernel: [175681.880916] NILFS: GC failed during preparation: cannot read source blocks: err=-5
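
As a cross-check of the log above, the failing sector can be recovered
from the raw bytes. This is only a minimal sketch: it assumes the
standard SCSI READ(16) CDB layout (8-byte LBA at bytes 2..9, 4-byte
transfer length at bytes 10..13) and descriptor-format sense data whose
first descriptor is an information descriptor carrying the failing LBA.
The function names are made up for illustration.

```python
# Raw bytes copied from the kernel log above.
SENSE = bytes.fromhex(
    "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 5a fc b2 f0"
)
CDB = bytes.fromhex("88 00 00 00 00 00 5a fc b0 40 00 00 04 00 00 00")

SECTOR_SIZE = 512  # logical sector size reported for /dev/sdb


def decode_read16(cdb):
    """Return (start_lba, sector_count) of a READ(16) CDB."""
    if cdb[0] != 0x88:
        raise ValueError("not a READ(16) command")
    start_lba = int.from_bytes(cdb[2:10], "big")
    sector_count = int.from_bytes(cdb[10:14], "big")
    return start_lba, sector_count


def failing_lba(sense):
    """Return the LBA stored in the leading information descriptor."""
    if sense[0] != 0x72 or sense[8] != 0x00:
        raise ValueError("expected descriptor sense with information descriptor")
    # Descriptor starts at byte 8; its 8-byte information field at byte 12.
    return int.from_bytes(sense[12:20], "big")


start, count = decode_read16(CDB)
bad = failing_lba(SENSE)
print("request:", start, "+", count, "sectors =", count * SECTOR_SIZE, "bytes")
print("failing sector:", bad, "offset into request:", bad - start)
```

The decoded failing LBA is the sector reported by end_request
(1526510320), and the request length matches the "dma 524288" in the
ata2.00 line, so a single unreadable sector inside a 512 KiB read is
enough to make the whole GC pass fail.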

So the root cause is clearly an unrecoverable read error on the HDD
side. The bad part is that GC stops on every start, because it hits the
same I/O error again and again. As a result, aged segments are never
reclaimed and the volume eventually runs out of free space.

From one point of view, the GC behavior is correct: it encounters an
I/O error caused by an external fault and stops. But from the end user's
point of view this behavior is completely wrong, because a bad sector is
not critical enough to justify stopping GC and file system operations.
The ideal solution could be some erasure coding scheme, but even erasure
coding cannot guarantee that such an issue is always resolved. Moreover,
the chance of hitting an error on the drive side is much higher for
modern high-capacity HDDs (several TBs) and modern SSDs. So it makes
sense to implement a simple way of handling such issues on the GC side.
One possible solution is to move a zeroed block in place of the
unreadable one and to report the issue to the end user in syslog.
Another is to report the issue and provide a user-space tool for
recovering the volume state. But again, recovery would simply mean
moving a zeroed block.
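
To make the first option concrete, here is a small user-space
simulation of the policy. It is not NILFS2 code: the read_block
callback, the 4096-byte block size, and the function names are all
assumptions made for the sketch. It only shows the idea of replacing an
unreadable block with zeros, logging it, and letting the pass continue.

```python
import errno
import logging

BLOCK_SIZE = 4096  # assumed block size for this sketch

log = logging.getLogger("gc-sketch")


def read_blocks_for_gc(read_block, block_numbers, block_size=BLOCK_SIZE):
    """Read all source blocks, substituting zeros for unreadable ones.

    read_block is a hypothetical callback that returns the block data
    or raises OSError (for example EIO) on a media error.
    Returns (blocks, lost), where lost lists the zero-filled block numbers.
    """
    blocks = []
    lost = []
    for blkno in block_numbers:
        try:
            data = read_block(blkno)
        except OSError as err:
            # Report the loss instead of aborting the whole GC pass.
            log.warning("block %d unreadable (%s); moving a zeroed block",
                        blkno, err)
            data = bytes(block_size)
            lost.append(blkno)
        blocks.append(data)
    return blocks, lost


def make_fake_device(bad_blocks, block_size=BLOCK_SIZE):
    """Build a read_block callback that fails on the given block numbers."""
    def read_block(blkno):
        if blkno in bad_blocks:
            raise OSError(errno.EIO, "Input/output error")
        return bytes([blkno % 256]) * block_size
    return read_block


blocks, lost = read_blocks_for_gc(make_fake_device({2}), range(4))
print("zero-filled blocks:", lost)
```

The trade-off is visible in the return value: the segment can be
reclaimed, but the caller is left with a list of blocks whose contents
are gone, which is exactly what would have to be reported in syslog.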

So, what do you think about this issue? What possible and easy solution
do you see? We don't have the opportunity for a long-term implementation,
so we need some easy hack for it.

Thanks,
Vyacheslav Dubeyko.
