Re: What to do about Offline_Uncorrectable and Pending_Sector in RAID1

Quick first response ...

On 13/11/16 18:46, Bruce Merry wrote:
> Hi
>
> I'm running software RAID1 across two drives in my home machine (LVM
> on LUKS on RAID1). I've just installed smartmontools and run short
> tests, and promptly received emails to tell me that one of the drives
> has 4 offline uncorrectable sectors and 3 current pending sectors.
> I've attached smartctl --xall output for sda (good) and sdb (bad).
>
> These drives are pretty old (over 5 years) so I'm going to replace
> them as soon as I have time (and yes, I have backups), but in the
> meantime I'd like advice on:

What drives are they? I'm guessing they're hunky-dory, but they don't fall foul of timeout mismatch, do they?

https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
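
A minimal sketch of that check, assuming the drive's error recovery limit comes from `smartctl -l scterc /dev/sdX` (reported in tenths of a second) and the kernel's command timeout from `/sys/block/sdX/device/timeout` (in seconds). The sample text, the function names and the 180-second fallback are illustrative assumptions, not values taken from the attached reports.

```python
import re
from typing import Optional

# Illustrative text in the shape `smartctl -l scterc` is assumed to print;
# it is not taken from the drive reports attached to this thread.
SAMPLE_SCTERC = """\
SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)
"""


def parse_read_erc_deciseconds(smartctl_output: str) -> Optional[int]:
    """Return the read ERC limit in tenths of a second, or None when there is
    no numeric 'Read:' line (ERC disabled or not supported)."""
    match = re.search(r"^\s*Read:\s+(\d+)\s+\(", smartctl_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def has_timeout_mismatch(
    erc_deciseconds: Optional[int],
    kernel_timeout_s: int,
    fallback_timeout_s: int = 180,  # assumed "long enough" value for drives without ERC
) -> bool:
    """True when the drive may still be retrying after the kernel has given up."""
    if erc_deciseconds is None:
        # No usable ERC: the drive may retry for a long time, so the kernel
        # timeout has to be raised to at least the fallback value.
        return kernel_timeout_s < fallback_timeout_s
    return erc_deciseconds / 10 >= kernel_timeout_s


if __name__ == "__main__":
    erc = parse_read_erc_deciseconds(SAMPLE_SCTERC)
    print(has_timeout_mismatch(erc, kernel_timeout_s=30))  # prints False
```

If the check reports a mismatch, the wiki page above describes the two usual remedies: shorten the drive's recovery time if it supports that, or lengthen the kernel timeout if it does not.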

> 1. What exactly this means. My understanding is that some data has
> been lost (or may have been lost) on the drive, but the drive still
> has spare sectors to remap things once the failed sectors are written
> to. Is that correct?

It may also mean that at least the four sectors have already been remapped ... I'll let the experts confirm. The three pending errors might be where a read has failed but there hasn't yet been a rewrite, and you won't have noticed because the RAID dealt with it.

> 2. How can I tell which sectors are problematic? If it's in the swap
> partition I'm far less worried. I can see two LBAs for offline
> uncorrectable errors in the --xall output, but that still leaves
> another two at large.

I don't think you need to be worried at all. It's only a few sectors, and there's no sign of any further trouble, is there? As it's RAIDed, when the drive returns an error the RAID code will sort it out for you.
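
For the "which partition is it in" part of the question, one rough approach is to compare each reported LBA against the partition boundaries the kernel exposes in `/sys/block/<disk>/<partition>/start` and `size`. The sketch below assumes the SMART log reports LBAs in 512-byte sectors, the unit sysfs uses, which is worth confirming for the drive in question. The function names and the example layout in the usage note are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

# Partition name -> (start sector, size in sectors), as read from sysfs.
Layout = Dict[str, Tuple[int, int]]


def read_layout(disk: str, sysfs_root: str = "/sys/block") -> Layout:
    """Collect start/size for every partition directory under the disk."""
    layout: Layout = {}
    for entry in sorted((Path(sysfs_root) / disk).iterdir()):
        start_file = entry / "start"
        # Partition directories are named after the disk (sdb1, sdb2, ...).
        if entry.name.startswith(disk) and start_file.is_file():
            start = int(start_file.read_text())
            size = int((entry / "size").read_text())
            layout[entry.name] = (start, size)
    return layout


def find_partition(lba: int, layout: Layout) -> Optional[str]:
    """Return the partition containing the LBA, or None if it lies outside all."""
    for name, (start, size) in layout.items():
        if start <= lba < start + size:
            return name
    return None
```

With a made-up layout such as `{"sdb1": (2048, 1000000), "sdb2": (1002048, 4000000)}`, `find_partition(1002048, layout)` returns `"sdb2"`. On the real machine, `read_layout("sdb")` would supply the layout. This only locates the LBAs the drive has logged; it says nothing about the two sectors that are still unaccounted for.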

> 3. Assuming my understanding is correct, and that the sector falls
> within the RAID1 partition on the drive, is there some way I can
> recover the sectors from the other drive in the RAID1? As a last
> resort I imagine I could wipe the suspect drive and then rebuild it
> from the good one, but I'm hoping there's something less risky I can
> do.

Do a scrub? You've got seven errors in total. Some people will say "panic on the first error"; others will say "so what, the odd error every now and then is nothing to worry about". The point of a scrub is that it background-scans the entire array, and if it can't read something, it recovers the data from the redundancy (the other mirror, for RAID1) and rewrites it.

Just make sure you've not got that timeout problem, or a scrub will make matters a whole lot worse ...
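
As a rough illustration of how a scrub can be started: Linux md exposes a `sync_action` file under `/sys/block/<array>/md/`, and writing `check` to it starts a read-only pass (`repair` additionally rewrites mismatched blocks). The sketch below assumes the array is called `md0`, which this thread does not state, and that it runs as root. The `sysfs_root` parameter is only there so the functions can be tried against a scratch directory first.

```python
from pathlib import Path
from typing import Tuple


def start_scrub(md_name: str, action: str = "check",
                sysfs_root: str = "/sys/block") -> None:
    """Ask md to scrub the array by writing the action to sync_action."""
    if action not in ("check", "repair"):
        raise ValueError("action must be 'check' or 'repair'")
    sync_action = Path(sysfs_root) / md_name / "md" / "sync_action"
    sync_action.write_text(action + "\n")


def scrub_status(md_name: str, sysfs_root: str = "/sys/block") -> Tuple[str, int]:
    """Return (current sync_action, mismatch_cnt) for the array."""
    md_dir = Path(sysfs_root) / md_name / "md"
    action = (md_dir / "sync_action").read_text().strip()
    mismatches = int((md_dir / "mismatch_cnt").read_text())
    return action, mismatches


if __name__ == "__main__":
    # "md0" is a placeholder array name; substitute the real one.
    start_scrub("md0")
    print(scrub_status("md0"))
```

Progress also shows up in `/proc/mdstat`, and `sync_action` reads `idle` again once the pass has finished.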

> Thanks in advance
> Bruce

Cheers,
Wol


