Re: [PATCH 0/6] block: add support for REQ_OP_VERIFY

Carlos Carvalho <carlos@xxxxxxxxxxxxxx> · Sat, 10 Dec 2022 10:06:34 -0300

Martin K. Petersen (martin.petersen@xxxxxxxxxx) wrote on Fri, Dec 09, 2022 at 01:52:01AM -03:
> I suspect that these days it is very hard to find a storage device that
> doesn't do media management internally in the background. So from the
> perspective of physically exercising the media, VERIFY is probably not
> terribly useful anymore.
> 
> In that light, having to run VERIFY over the full block range of a
> device to identify unreadable blocks seems like a fairly clunky
> mechanism. Querying the device for a list of unrecoverable blocks
> already identified by the firmware seems like a better interface.

Sure.

> But I think device validation is a secondary issue. The more
> pertinent question is whether we have use cases in the kernel (MD,
> btrfs) which would benefit from being able to preemptively identify
> unreadable blocks?

Certainly we do. Currently admins have to periodically run full block range
checks on redundant arrays to detect bad blocks and correct them while
redundancy is still available. Otherwise, when a disk fails and you rebuild
onto the replacement, you may hit a bad block on one of the remaining disks;
the reconstruction cannot complete and you lose data. These checks are a
burden because they have HIGH overhead, significantly reducing the bandwidth
available for normal use of the array.
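To make the cost concrete, here is a toy user-space model of such a check on a
two-way mirror. It is only a sketch under simplifying assumptions (the names
Disk and scrub_mirror are hypothetical, bad blocks are a fixed set, and a
rewrite is assumed to make the block readable again). It is not how MD
implements its check/repair. The point is that every LBA on every member gets
read, even when only a couple of blocks are actually bad.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Toy disk (hypothetical): block contents plus LBAs that fail to read."""
    blocks: list
    bad: set = field(default_factory=set)
    reads: int = 0

    def read(self, lba):
        self.reads += 1
        if lba in self.bad:
            raise IOError(f"unrecoverable read error at LBA {lba}")
        return self.blocks[lba]

    def write(self, lba, data):
        # Assumption: rewriting lets the drive remap the sector, so it reads fine again.
        self.blocks[lba] = data
        self.bad.discard(lba)


def scrub_mirror(a, b):
    """Read every LBA on both halves; repair a bad copy from the good one.

    Returns the (disk_index, lba) pairs that were repaired.
    """
    repaired = []
    for lba in range(len(a.blocks)):
        copies = []
        for disk in (a, b):
            try:
                copies.append(disk.read(lba))
            except IOError:
                copies.append(None)
        good = next((c for c in copies if c is not None), None)
        if good is None:
            continue  # both copies lost: redundancy cannot help here
        for i, (disk, c) in enumerate(zip((a, b), copies)):
            if c is None:
                disk.write(lba, good)
                repaired.append((i, lba))
    return repaired
```

With 1000-block members and one bad block on each, the scrub repairs
[(0, 17), (1, 512)] but pays for 2000 reads to find them:

```python
a = Disk(blocks=list(range(1000)), bad={17})
b = Disk(blocks=list(range(1000)), bad={512})
print(scrub_mirror(a, b))   # [(0, 17), (1, 512)]
print(a.reads + b.reads)    # 2000
```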

If there were a standard interface for getting the list of bad blocks that the
firmware secretly knows about, the kernel could do the repair continuously,
with logging etc. That'd be a real relief for admins and, especially, users.
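A rough sketch of what such list-driven repair could look like, again as a toy
user-space model. No such standard interface exists today. query_bad_lbas() is
a purely hypothetical stand-in for "ask member i's firmware which blocks it
already knows are unrecoverable", and repair_reported is likewise an
illustrative name. The difference from a full scrub is that the work scales
with the number of reported blocks rather than with the size of the device.

```python
from typing import Callable, Dict, List, Tuple


def repair_reported(
    members: List[Dict[int, bytes]],
    query_bad_lbas: Callable[[int], List[int]],
) -> List[Tuple[str, int, int]]:
    """Repair only the LBAs each mirror member reports as unreadable.

    members[i] maps LBA -> block contents for member i (toy model).
    query_bad_lbas(i) is HYPOTHETICAL: it models a firmware-reported
    list of unrecoverable blocks, which devices do not expose today.
    Returns a log of ("repaired" | "lost", member, lba) entries.
    """
    reported = {i: set(query_bad_lbas(i)) for i in range(len(members))}
    log = []
    for i, member in enumerate(members):
        for lba in sorted(reported[i]):
            # Pick any other member that does not report this LBA as bad.
            peer = next(
                (j for j in range(len(members)) if j != i and lba not in reported[j]),
                None,
            )
            if peer is None:
                log.append(("lost", i, lba))  # no redundant copy left
            else:
                member[lba] = members[peer][lba]  # rewrite from the good copy
                reported[i].discard(lba)  # now readable, usable as a source
                log.append(("repaired", i, lba))
    return log
```

For example, if member 0 reports LBA 7 and member 1 reports LBAs 3 and 7, only
LBA 3 can be rebuilt; LBA 7 is gone on both sides, and that is exactly the kind
of event the kernel could log instead of it surfacing mid-rebuild:

```python
bad = {0: [7], 1: [3, 7]}
members = [{lba: bytes([lba]) for lba in range(10)} for _ in range(2)]
members[1][3] = b"?"
print(repair_reported(members, lambda i: bad.get(i, [])))
# [('lost', 0, 7), ('repaired', 1, 3), ('lost', 1, 7)]
```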