Re: how to handle bad sectors in md control areas?


On Wed, 26 Feb 2014 19:16:30 +1100 Eyal Lebedinsky <eyal@xxxxxxxxxxxxxx>
wrote:

> In another thread I investigated an issue with a pending sector, which now seems to be
> a bad sector inside the md header (the first 256k sectors).
> 
> The question now remaining: what is the correct approach to fixing this problem?

You could "fix" it by simply redefining it not to be a problem.
If you never get an error then is there a problem?

> 
> The more general issue is what to do when any md control area develops an error. does
> all data have redundant copies?

We don't currently have any redundancy within a device.  Of course most
metadata is replicated across all devices so there is redundancy in that
sense.
I have occasionally thought of creating a v1.3 metadata which duplicates the
superblock at both ends of the device.  It never quite seemed worth the effort
though.
The write-intent-bitmap would be a lot more expensive to duplicate but as it
is identical on all devices, the gain would be small (though there are cases
where it would be useful).

The bad-block log probably should be duplicated.  That wouldn't be too
expensive and might have some real benefits....

> 
> The simple way that I see is to fail the member, remove it, clear it (at least
> --zero-superblock and write to the bad sector) and then add it. However this
> will incur a full resync (about 10 hours).
> 
> Is there a faster, yet safe way? I was thinking that a clean umount and raid stop
> should allow a create with --assume-clean (which will write to the bad sector and
> "fix" it), but the doco discourages this.

Why do you think this will write the bad sector?
When you --create an array it doesn't write to all the space on the array.
It only writes what it needs to.  So the superblock, the write-intent-bitmap
and maybe the bad-block-log.  But nothing else.
And most of that gets written during normal array activity.

So if a block remains unwritten after stop/start/check, you can be fairly sure
it isn't used at all, so you can ignore it.  Or write zeros to it.
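
For the "write zeros to it" option, something along these lines might do.
This is only a rough sketch, not a tested procedure: zero_sector() is a
made-up helper name, the 512-byte sector size is an assumption, and the
sector number must be relative to whatever device or file path you pass in.

```python
import os

SECTOR_SIZE = 512  # assumption: logical sector size in bytes


def zero_sector(path, sector, sector_size=SECTOR_SIZE):
    """Overwrite a single sector of `path` with zeros (hypothetical helper).

    `sector` is counted from the start of `path`, so for a partition it is
    relative to the partition, not to the whole disk.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        # Write exactly one sector of zeros at the sector's byte offset.
        written = os.pwrite(fd, bytes(sector_size), sector * sector_size)
        # Push the write out of the page cache so the drive actually sees it.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written
```

Only point this at a sector you are sure is unused, and only while the array
is stopped; it destroys whatever was stored there.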

> 
> Also, it is not impossible to think that the specific bad sector (toward the end
> of the header) is not actually used today, meaning I can live with it as is, or
> write anything to the bad sector as it does not get used. Too involved though.
> 
> A bad sector in the data area should be fixed with a standard raid 'check' action.
> 
> TIA
> 

NeilBrown
