On Wed, 26 Feb 2014 19:16:30 +1100 Eyal Lebedinsky <eyal@xxxxxxxxxxxxxx> wrote:

> In another thread I investigated an issue with a pending sector, which now seems to be
> a bad sector inside the md header (the first 256k sectors).
>
> The question now remaining: what is the correct approach to fixing this problem?

You could "fix" it by simply redefining it not to be a problem. If you never
get an error then is there a problem?

>
> The more general issue is what to do when any md control area develops an error. Does
> all data have redundant copies?

We don't currently have any redundancy within a device. Of course most
metadata is replicated across all devices so there is redundancy in that
sense.

I have occasionally thought of creating a v1.3 metadata which duplicates the
superblock at both ends of the device. Never quite seemed worth the effort
though.

The write-intent-bitmap would be a lot more expensive to duplicate but as it
is identical on all devices, the gain would be small (though there are cases
where it would be useful).

The bad-block log probably should be duplicated. That wouldn't be too
expensive and might have some real benefits....

>
> The simple way that I see is to fail the member, remove it, clear it (at least
> --zero-superblock and write to the bad sector) and then add it. However this
> will incur a full resync (about 10 hours).
>
> Is there a faster, yet safe way? I was thinking that a clean umount and raid stop
> should allow a create with --assume-clean (which will write to the bad sector and
> "fix" it), but the doco discourages this.

Why do you think this will write the bad sector?
When you --create an array it doesn't write to all the space on the array.
It only writes what it needs to: the superblock, the write-intent-bitmap
and maybe the bad-block-log, but nothing else. And most of that gets written
during normal array activity.
So if a block remains unwritten after stop/start/check, you can be fairly sure
it isn't used at all, so you can ignore it. Or write zeros to it.

>
> Also, it is not impossible to think that the specific bad sector (toward the end
> of the header) is not actually used today, meaning I can live with it as is, or
> write anything to the bad sector as it does not get used. Too involved though.
>
> A bad sector in the data area should be fixed with a standard raid 'check' action.
>
> TIA
>

NeilBrown
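
As a rough illustration of the point above (md only writes specific regions
of the header, so a bad sector elsewhere in the header is probably unused),
here is a minimal Python sketch that classifies a sector number against a
table of metadata regions. The region offsets and sizes below are
hypothetical placeholders, not values from this thread; only the 256k-sector
header size comes from the original report. Real values would have to be read
from the member device, for example from the output of mdadm --examine.

```python
from typing import NamedTuple, Sequence


class Region(NamedTuple):
    name: str
    start: int   # first sector of the region, counted from the device start
    length: int  # number of sectors in the region


def classify_sector(sector: int, regions: Sequence[Region], data_offset: int) -> str:
    """Return the name of the area that contains `sector`.

    Sectors at or beyond `data_offset` belong to the data area; sectors in
    the header that match no listed region are reported as unused.
    """
    if sector < 0:
        raise ValueError("sector number must be non-negative")
    if sector >= data_offset:
        return "data area"
    for region in regions:
        if region.start <= sector < region.start + region.length:
            return region.name
    return "unused header space"


# Hypothetical layout, for illustration only (assumed, not from the thread).
EXAMPLE_REGIONS = [
    Region("superblock", 8, 8),
    Region("write-intent bitmap", 16, 16),
    Region("bad-block log", 72, 8),
]

# The original report describes the header as the first 256k sectors.
EXAMPLE_DATA_OFFSET = 256 * 1024


if __name__ == "__main__":
    for s in (10, 75, 200000, 300000):
        print(s, classify_sector(s, EXAMPLE_REGIONS, EXAMPLE_DATA_OFFSET))
```

A sector reported as "unused header space" is one that can be ignored or
overwritten with zeros; a sector in the data area is a job for a 'check'
pass instead.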