On 02/02/18 16:03, Wols Lists wrote:
> On 02/02/18 14:50, David Brown wrote:
>> What are these cases?  We have already eliminated the rebuild
>> situation I described.  And in particular, which use-cases are you
>> thinking of where you would not be better off with alternative
>> integrity improvements (like higher redundancy levels) without
>> killing performance?
>>
> In particular, when you KNOW you've got a damaged raid, and you want to
> know which files are affected. The whole point of my technique is that
> either it uses the raid to recover (if it can) or it propagates a read
> error back to the application. It does NOT "fix" the data and leave a
> corrupted file behind.

If you read a block and the read fails, the raid system will already
read the whole stripe to re-create the missing data.  If it can
re-create it, it writes the new data back to the disk and returns it to
the application.  If it cannot, it returns the read error to the
application.

I cannot imagine a situation where you would have a disk that you know
has incorrect data as part of your array and in normal use for a file
system.

For the situation I originally described, if there were no support for
bad block lists, then you would need a more complex procedure for the
rebuild.  (I believe it would be something like this: enable the write
intent bitmap on the raid5, take the raid1 pair with the missing drive
out of the raid5, rebuild the raid1 pair, and if the rebuild succeeds,
put it back in the raid5 and let the write intent logic bring it up to
speed.  If the rebuild had errors, you would have to unmount the
filesystem, let the write intent logic finish writing, then scrub the
raid5.)  But since the bad block list handles my concerns, there is no
problem there.

>
>> That does not make sense.  The bad block list described by Neil will
>> do the job correctly.  hdparm bad block marking could also work, but
>> it does so at a lower level and the sector is /not/ corrected
>> automatically, AFAIK.  It also would not help if the raid1 were not
>> directly on a hard disk (think disk partition, another raid, an LVM
>> partition, an iSCSI disk, a remote block device, an encrypted block
>> device, etc.).
>>
> Nor does the bad block list correct the error automatically, if that's
> true then. The bad blocks list fakes a read error, the hdparm causes a
> real read error. When the raid-5 scrub hits, either version triggers a
> rewrite.
>
> Thing about the bad-block list is that that disk block is NOT
> rewritten. It's moved, and that disk space is LOST. With hdparm, that
> block gets rewritten, and if the rewrite succeeds the space is
> recovered.

I don't know the details of when blocks are removed from the bad lists
(either the md raid bad block list or the hdparm list) and re-tried.
But it does not matter - the fraction of wasted space is negligible.

>
> Cheers,
> Wol
>
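
For what it's worth, the rebuild procedure described above would look
roughly like this with mdadm, assuming /dev/md0 is the raid5 built from
raid1 pairs, /dev/md1 is the pair that lost a disk, and /dev/sdb1 is
the replacement (the device names are made up for illustration and the
commands are untested - treat this as a sketch of the steps, not a
recipe):

    # add a write-intent bitmap to the raid5 (works on a live array)
    mdadm --grow /dev/md0 --bitmap=internal

    # take the degraded raid1 pair out of the raid5
    mdadm /dev/md0 --fail /dev/md1
    mdadm /dev/md0 --remove /dev/md1

    # rebuild the raid1 pair with the replacement disk, then wait for
    # the recovery to finish (watch /proc/mdstat)
    mdadm /dev/md1 --add /dev/sdb1

    # if the rebuild was clean, put the pair back; the write-intent
    # bitmap limits the resync to stripes written in the meantime
    mdadm /dev/md0 --re-add /dev/md1

    # if the rebuild hit errors: unmount the filesystem, let the
    # bitmap-driven resync finish, then scrub the raid5
    echo check > /sys/block/md0/md/sync_action

The write-intent bitmap is what makes the last step cheap - without it
the --re-add would mean a full resync of that member.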
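
And for actually looking at the two bad lists being discussed, this is
where I would start (again, the device names and the sector number are
only placeholders, and the sysfs paths are from memory, so check them
against your kernel's md documentation):

    # md's recorded bad blocks for one member, listed as start sector
    # and length in sectors
    cat /sys/block/md0/md/dev-sda1/bad_blocks
    cat /sys/block/md0/md/dev-sda1/unacknowledged_bad_blocks

    # a drive-level spot check of a suspect sector; the read returns an
    # I/O error if the sector really is unreadable
    hdparm --read-sector 123456 /dev/sda

That only shows where the entries are recorded, not when they get
retired - but as said above, the space involved is negligible either
way.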