On Sat, Dec 10, 2022 at 10:06:34AM -0300, Carlos Carvalho wrote:
> Certainly we have. Currently admins have to periodically run full block range
> checks in redundant arrays to detect bad blocks and correct them while
> redundancy is available. Otherwise, when a disk fails and you try to reconstruct
> the replacement, you hit another bad block on the remaining disks, cannot
> complete the reconstruction, and lose data. These checks are a
> burden because they have HIGH overhead, significantly reducing bandwidth for
> the normal use of the array.
>
> If there were a standard interface for getting the list of bad blocks that the
> firmware secretly knows, the kernel could implement the repair continuously, with
> logs etc. That'd really be a relief for admins and, especially, users.

Both SCSI and NVMe can do this through the GET LBA STATUS command. In SCSI this was a later addition that abuses the command, and in NVMe only the abuse survived. NVMe also has a log page and an AEN associated with it; I'd have to spend more time reading SBC to remember whether SCSI also has a notification mechanism of some sort.
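For reference, a minimal Python sketch of the SCSI side, assuming the SBC-3 layout of GET LBA STATUS (SERVICE ACTION IN(16), opcode 9Eh, service action 12h): it builds the 16-byte CDB and walks the returned LBA status descriptors. The function names are hypothetical, actually sending the CDB to a device (e.g. via SG_IO) is not shown, and the meaning of the provisioning status values, including the later bad-block reporting extension, should be checked against SBC rather than taken from this sketch.

```python
import struct
from typing import List, Tuple

SERVICE_ACTION_IN_16 = 0x9E  # SERVICE ACTION IN(16) opcode
GET_LBA_STATUS_SA = 0x12     # GET LBA STATUS service action

def build_get_lba_status_cdb(start_lba: int, alloc_len: int) -> bytes:
    """Hypothetical helper: 16-byte CDB, big-endian fields (SBC-3 layout assumed).

    Byte 0: opcode, byte 1: service action, bytes 2-9: starting LBA,
    bytes 10-13: allocation length, byte 14: reserved/report type, byte 15: control.
    """
    return struct.pack(">BBQIBB", SERVICE_ACTION_IN_16, GET_LBA_STATUS_SA,
                       start_lba, alloc_len, 0, 0)

def parse_lba_status(data: bytes) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: return (lba, num_blocks, status) per descriptor.

    Assumes a 4-byte PARAMETER DATA LENGTH, 4 reserved bytes, then
    16-byte descriptors whose byte 12 holds the status in bits 3:0.
    """
    (param_len,) = struct.unpack_from(">I", data, 0)
    end = min(len(data), param_len + 4)  # length field excludes its own 4 bytes
    descriptors = []
    offset = 8
    while offset + 16 <= end:
        lba, nblocks, status = struct.unpack_from(">QIB", data, offset)
        descriptors.append((lba, nblocks, status & 0x0F))
        offset += 16
    return descriptors
```

A synthetic response with one descriptor shows the round trip; a real one would come back from the device.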