Re: Impending failure?

Hi guys,

> So, in my personal experience with pending sectors, it's worth mentioning the following:
>
> If you do a "check" and you have any pending sectors within the partition that is used for the md device, they should be read and rewritten as needed, causing the count to go down. However, I've noticed that sometimes I have pending sector counts on drives that don't go away after a "check". These would go away, however, if I failed and removed the drive with mdadm and then zero-filled the /entire/ drive, as opposed to just the partition on that disk that is used by the array (commands for both steps are sketched after this quote). The reason for this is that there's a small chunk of unused space right after the partition that never gets read or written, even though I technically partition the entire drive as one large partition (type fd, Linux raid autodetect).
>
> I think what actually happens in this case is that when the system reads data from near the end of the array, the drive itself does read-ahead and caches those sectors. So even though the computer never requested those abandoned sectors, the drive eventually notices that it can't read them and makes a note of the fact. In other words, this is harmless.
>
> You could probably avoid the potential for false positives on pending sectors if you used the entire disk for the array (no partitions), but I'm pretty sure that breaks the raid auto-detection.
>
> Currently, my main array has eight 2TB Hitachi disks in a RAID 6. It is scrubbed once a week, and one disk consistently has 8 pending sectors on it. I'm certain I could make those go away if I wanted, but frankly, it's purely aesthetic as far as I'm concerned. Some of my drives also have non-zero values for SMART attributes 196 (Reallocated_Event_Count) and 5 (Reallocated_Sector_Ct); however, I have no drives with a non-zero Offline_Uncorrectable (the smartctl invocation for reading these is sketched below). I haven't had any problems with the disks or the array (other than a temperature-induced failure ... but that's another story, and I still run the same disks after that event). I used to have lots of issues before I started scrubbing consistently.
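
For context, the "check" above is md's built-in scrub, triggered through sysfs (most distributions wrap it in a cron script). A minimal sketch of both the scrub and the fail/zero-fill/re-add procedure, assuming an array at /dev/md0 whose member is /dev/sdX1 on drive /dev/sdX (hypothetical names; substitute your own):

    # Kick off a scrub: md reads every sector of the array and repairs
    # read errors from redundancy.  Watch progress in /proc/mdstat.
    echo check > /sys/block/md0/md/sync_action
    cat /proc/mdstat

    # To clear pending sectors outside the partition: fail and remove
    # the member, zero the WHOLE drive, repartition, and re-add it.
    # WARNING: dd destroys everything on /dev/sdX, including the
    # partition table, and the array runs degraded until resync ends.
    mdadm /dev/md0 --fail /dev/sdX1
    mdadm /dev/md0 --remove /dev/sdX1
    dd if=/dev/zero of=/dev/sdX bs=1M
    # ... recreate the single type-fd partition (e.g. with fdisk) ...
    mdadm /dev/md0 --add /dev/sdX1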
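
The SMART attributes mentioned above can be read with smartctl; a quick sketch, again with /dev/sdX standing in for a real drive:

    # Dump the vendor attribute table; attributes 5, 196, 197 and the
    # Offline_Uncorrectable line are the ones discussed above.
    smartctl -A /dev/sdX
    # Or just the pending-sector count (attribute 197):
    smartctl -A /dev/sdX | grep -i pending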

I think I understand your explanation. You are basically saying that
if I re-run a "check", there's a chance the pending sectors will
resolve themselves?

Given that I have an existing system, how do I check the integrity of
the partitions? What are the contents of the "check" script to which
you refer?

Is this safe to do remotely?

Is it necessary to set a disk faulty before removing it?

Thanks,
Alex

