Re: How to fix Current_Pending_Sector?

I had a similar issue - there were 5 "Currently unreadable (pending)
sectors" and 1 "Offline uncorrectable sector", then the drive was kicked
out of the raid, but re-adding the drive helped - the bad sectors were
gone. Now there are 2 pending and 1 uncorrectable, so I'm going to fix
those two.
My question is - is there any way to make the resync faster? Say I
enable a write-intent bitmap on the current 0.90-metadata array, fail
the drive, dd over the bad sectors, and re-add the drive - will the
bitmap let md resync only the parts that changed, rather than the whole
drive?
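
Roughly what I have in mind (just a sketch - /dev/md0, /dev/sdX1 and
<LBA> are placeholders for my array, the affected member and the bad
sector reported by SMART):

  mdadm --grow --bitmap=internal /dev/md0     # add a write-intent bitmap
  mdadm /dev/md0 --fail /dev/sdX1
  mdadm /dev/md0 --remove /dev/sdX1
  dd if=/dev/zero of=/dev/sdX bs=512 count=1 seek=<LBA> oflag=direct
  mdadm /dev/md0 --re-add /dev/sdX1           # bitmap should limit the resync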


On Mon, Mar 15, 2010 at 2:20 PM, Iain Rauch
<groups@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> Am 11.03.2010 13:25, schrieb Iain Rauch:
>>>> On Thu, Mar 11, 2010 at 3:51 AM, Iain Rauch
>>>> <groups@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>> Smartd emailed me to say I have "1 Currently unreadable (pending) sectors".
>>>>> This has actually happened on two disks now.
>>>>>
>>>>> I ran a check and then a repair on my array and they both gave mismatch_cnt
>>>>> of 8.
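>>>>>
>>>>> (For reference, roughly how I trigger the check/repair and read the
>>>>> count - md0 standing in for my array:)
>>>>>
>>>>>   echo check  > /sys/block/md0/md/sync_action
>>>>>   cat /sys/block/md0/md/mismatch_cnt
>>>>>   echo repair > /sys/block/md0/md/sync_action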
>>>>>
>>>>> I ran a long SMART self-test on both and they completed without error,
>>>>> with nothing in the error log. Yet 'Current_Pending_Sector' is still 1
>>>>> on both, and one disk also has a 'UDMA_CRC_Error_Count' of 1.
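>>>>>
>>>>> (What I ran, roughly - /dev/sdX standing in for each of the two
>>>>> disks:)
>>>>>
>>>>>   smartctl -t long /dev/sdX
>>>>>   smartctl -l selftest /dev/sdX
>>>>>   smartctl -A /dev/sdX | grep -iE 'Pending|CRC'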
>>>>>
>>>>> I ran 'hdrecover' on both and they are both telling me "Couldn't recover
>>>>> sector 2930277168". It's asking if I want to overwrite it with zeros to fix
>>>>> it, but I would assume this will damage my array?
>>>>>
>>>>> The disk sizes are 1500301910016 bytes and I use 1500250M partition sizes
>>>>> for the array components. Does that sector fall outside my partition, and
>>>>> hence would it be safe to overwrite it with zeros?
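>>>>>
>>>>> My rough arithmetic, assuming 512-byte logical sectors:
>>>>>
>>>>>   1500301910016 / 512 = 2930277168, i.e. valid LBAs are 0..2930277167
>>>>>
>>>>> so sector 2930277168 looks like it sits right at (one past) the end of
>>>>> the disk, let alone the partition - unless I'm misreading it.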
>>>>>
>>>>> Also, why did I have a mismatch_cnt? I haven't run another check since I
>>>>> did
>>>>> the repair, as I wanted to fix the pending sector.
>>>>>
>>>>> BTW, I have a 15 drive RAID6.
>>>>>
>>>>
>>>> If you are running RAID6 and md can read from all but two drives, it
>>>> should still be able to reconstruct the data for the latter two from
>>>> the remaining (presumed good) reads.  Recent kernels will try to
>>>> rewrite failed sectors automatically, and only kick the drive if the
>>>> write fails.
>>>>
>>>> Please provide more information.
>>>>
>>>> Kernel version
>>>> mdadm version
>>>>
>>>> Information about how the source block devices are split up before
>>>> mdadm sees them, and any related messages from the system log.  The
>>>> relevant section should be near the end of a dmesg output when you've
>>>> just completed a check or repair.  Your syslog probably already
>>>> captured the same data and stored it elsewhere.
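>>>>
>>>> (Something along these lines should do - the exact log file name
>>>> varies by distro:)
>>>>
>>>>   dmesg | tail -n 200
>>>>   grep -iE 'md[0-9]|raid|ata[0-9]' /var/log/syslog | tail -n 200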
>>>
>>> I thought doing the repair was supposed to fix the issue, but it didn't seem
>>> to touch it. I wonder if it is outside what md sees, but then how would it
>>> have been noticed as unreadable? And is it coincidence that both drives have
>>> the same unreadable sector?
>>>
>>> root@Edna:/home/iain# uname -a
>>> Linux Edna 2.6.28-16-server #57-Ubuntu SMP Wed Nov 11 10:34:04 UTC 2009
>>> x86_64 GNU/Linux
>>> root@Edna:/home/iain# mdadm -V
>>> mdadm - v2.6.9 - 10th March 2009
>>>
>>> I've pasted the end of the messages log below. There's loads of that
>>> all the way through the repair, so I'm not sure how to filter out the
>>> useful bits.
>>>
>>>
>>> Iain
>>> [...]
>>
>> Hi Iain,
>>
>> the "Current_pending_sectors" is a smart attribute which gets
>> incremented during online (reading and writing sectors) AND offline
>> drive scanning (also called SMART Data Collection), when the drive finds
>> out a sector cannot be correctly read at the first try (offline data
>> collection) or after applying various error-correction techniques.
>> The easiest way to get rid of this problem: dd a sector of zeros onto
>> the broken sector, then fail the drive, re-add it.  Now wait until the
>> resync is done.
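>>
>> (For a 512-byte-sector drive that would be something like the
>> following - sdX, the partition number and the LBA are placeholders for
>> your member disk and the reported bad sector:)
>>
>>   dd if=/dev/zero of=/dev/sdX bs=512 count=1 seek=<bad LBA> oflag=direct
>>   mdadm /dev/mdN --fail /dev/sdX1
>>   mdadm /dev/mdN --remove /dev/sdX1
>>   mdadm /dev/mdN --re-add /dev/sdX1
>>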
>> What I'm not sure about: should one fail and re-add both drives at
>> once?  Doing that would mean giving up the remaining redundancy...
>>
>> Speaking of redundancy: our rule of thumb (at xtivate.de) is "one
>> redundant drive for every four drives" - so only two redundant drives
>> across 15 is kind of pushing your luck...
>>
>> Good luck,
>> Stefan
>
> Well, I failed one of the drives and allowed 'hdrecover' to overwrite the
> unreadable sector, but it still couldn't fix it. Here's its report:
>
> Wiping sector 2930277168...
> Checking sector is now readable...
> I still couldn't read the sector!
> I'm sorry, but even writing to the sector hasn't fixed it - there's nothing
> more I can do!
> Summary:
>  1 bad sectors found
>  of those 0 were recovered
>  and 1 could not be recovered and were destroyed causing data loss
>
> The 'Current_Pending_Sector' count was still 1, so I dd'd zeros over the
> whole drive. I guess I could have done just part of it, but I suppose this
> verified that the whole drive 'works'. It only took ~5 hours. Funnily
> enough, that did bring the Current_Pending_Sector count back to zero.
> There are still no error reports in the SMART data, and
> 'Reallocated_Event_Count' didn't go up - shouldn't that have gone up to
> one?
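>
> (Roughly what I ran on the failed member - sdX is a placeholder:)
>
>   dd if=/dev/zero of=/dev/sdX bs=1M oflag=direct
>   smartctl -A /dev/sdX | grep -iE 'Pending|Realloc'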
>
> I re-partitioned the disk and added it back to the array, and it rebuilt
> fine in ~12 hours.
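>
> (Roughly - copying the partition table over from a good member first;
> sdGOOD and sdX are placeholders:)
>
>   sfdisk -d /dev/sdGOOD | sfdisk /dev/sdX
>   mdadm /dev/md0 --add /dev/sdX1
>   watch cat /proc/mdstat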
>
> Repeated the process with the second drive and everything's back to normal.
>
> The drive that had the 'UDMA_CRC_Error_Count' still says 1, but I don't
> think I need to worry about that?
>
> In direct reply to Stefan:
>
> I think you meant to dd zeros onto the drive /after/ failing it - it
> would have caused corruption otherwise?
>
> I definitely think it made sense to do one at a time.
>
> One parity drive for every four seems a bit extreme, especially when you
> have a backup (which I don't). I'm fairly happy with 15 drives in RAID 6. I
> had 24 drives before, and that did give me a few problems :p Just need to
> keep the drives healthy. (Array scrubs, SMART tests etc).
>
>
> Iain
>
>



-- 
Best regards,
[COOLCOLD-RIPN]
