Martin K. Petersen (martin.petersen@xxxxxxxxxx) wrote on Fri, Dec 09, 2022 at 01:52:01AM -03:
> I suspect that these days it is very hard to find a storage device that
> doesn't do media management internally in the background. So from the
> perspective of physically exercising the media, VERIFY is probably not
> terribly useful anymore.
>
> In that light, having to run VERIFY over the full block range of a
> device to identify unreadable blocks seems like a fairly clunky
> mechanism. Querying the device for a list of unrecoverable blocks
> already identified by the firmware seems like a better interface.

Sure.

> But I think device validation is a secondary issue. The more
> pertinent question is whether we have use cases in the kernel (MD,
> btrfs) which would benefit from being able to preemptively identify
> unreadable blocks?

Certainly we have. Currently admins have to periodically run full block
range checks on redundant arrays to detect bad blocks and correct them
while redundancy is still available. Otherwise, when a disk fails and
you try to rebuild onto the replacement, you can hit another bad block
on one of the remaining disks; the reconstruction then cannot complete
and you lose data.

These checks are a burden because they have HIGH overhead, significantly
reducing the bandwidth available for normal use of the array. If there
were a standard interface for getting the list of bad blocks that the
firmware secretly knows about, the kernel could implement the repair
continuously, with logs etc. That'd really be a relief for admins and,
especially, users.
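
For context, the periodic check mentioned above is what MD exposes through sysfs today: writing "check" to an array's md/sync_action file starts a scrub of the full block range. A minimal Python sketch, assuming an array named md0 (a placeholder) and root privileges. The helper name start_md_check and its sysfs_root parameter are made up for illustration; the parameter only exists so the function can be tried against a scratch directory:

```python
from pathlib import Path


def start_md_check(md_name: str, sysfs_root: str = "/sys/block") -> str:
    """Request a full-range consistency check of one MD array.

    Returns the resulting sync_action state. If the array is already
    busy (resync, recover, check, ...), nothing is written and the
    current state is returned unchanged.
    """
    action = Path(sysfs_root) / md_name / "md" / "sync_action"
    state = action.read_text().strip()
    if state != "idle":
        # Do not interrupt an operation that is already in progress.
        return state
    # Writing "check" asks MD to read every stripe and verify redundancy.
    action.write_text("check\n")
    return "check"
```

On a real system this would be called as start_md_check("md0"). The scrub then competes with normal I/O across the whole device range, which is the overhead complained about above; a firmware-provided bad block list would let the kernel touch only the affected ranges.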