Re: [PATCH 0/6] block: add support for REQ_OP_VERIFY

"hch@xxxxxx" <hch@xxxxxx> · Mon, 12 Dec 2022 07:30:17 +0100

On Sat, Dec 10, 2022 at 10:06:34AM -0300, Carlos Carvalho wrote:
> Certainly we have. Currently admins have to periodically run full block range
> checks in redundant arrays to detect bad blocks and correct them while
> redundancy is available. Otherwise when a disk fails and you try to reconstruct
> the replacement you hit another block in the remaining disks that's bad and you
> cannot complete the reconstruction and have data loss. These checks are a
> burden because they have HIGH overhead, significantly reducing bandwidth for
> the normal use of the array.
> 
> If there was a standard interface for getting the list of bad blocks that the
> firmware secretly knows, the kernel could implement the repair continuously, with
> logs etc. That'd really be a relief for admins and, especially, users.

Both SCSI and NVMe can do this through the GET LBA STATUS command -
in SCSI this was a later addition abusing the command, and in NVMe
only the abuse survived.  NVMe also has a log page and an AEN associated
with it; I'd have to spend more time reading SBC to remember if SCSI
also has a notification mechanism of some sort.
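
For reference, a minimal sketch of what the SCSI side looks like on the
wire, assuming the basic GET LBA STATUS(16) CDB layout (SERVICE ACTION
IN(16), service action 0x12).  The helper name and the local constants are
made up for illustration, and any fields added by later SBC revisions are
simply left zero here, so this is not a complete bad-block query:

```c
#include <stdint.h>
#include <string.h>

/* Local names for illustration; values are the SCSI opcode and service action. */
#define EX_SERVICE_ACTION_IN_16 0x9e
#define EX_SAI_GET_LBA_STATUS   0x12

/*
 * Hypothetical helper: fill a 16-byte CDB for GET LBA STATUS(16).
 * Bytes 2..9 carry the starting LBA and bytes 10..13 the allocation
 * length, both big-endian.  All remaining bytes stay zero.
 */
static void ex_build_get_lba_status16(uint8_t cdb[16], uint64_t lba,
				      uint32_t alloc_len)
{
	int i;

	memset(cdb, 0, 16);
	cdb[0] = EX_SERVICE_ACTION_IN_16;
	cdb[1] = EX_SAI_GET_LBA_STATUS;

	/* Starting LBA, most significant byte first. */
	for (i = 0; i < 8; i++)
		cdb[2 + i] = (uint8_t)(lba >> (56 - 8 * i));

	/* Allocation length, most significant byte first. */
	for (i = 0; i < 4; i++)
		cdb[10 + i] = (uint8_t)(alloc_len >> (24 - 8 * i));
}
```

The resulting buffer would then be handed to whatever passthrough path is
in use (SG_IO from userspace, for example); parsing the returned LBA status
descriptors is not shown.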