Re: help with PMP failures

Tejun Heo <htejun@xxxxxxxxx> · Wed, 18 Nov 2009 13:03:00 +0900

Hello,

11/18/2009 02:39 AM, Marc MERLIN wrote:
>> This is the actual failure.  Your 6.02 drive reported a media error
>> which, combined with the controller errata, caused a port-wide failure.
>
> Ah, I see, so that's the one I should focus on.
> If it hadn't had an error, everything else wouldn't have gone down the
> toilet, right?

Yes, that's my guess.

> scsi 6:2:0:0: Direct-Access     ATA      Hitachi HDS72101 GKAO PQ: 0 ANSI: 5
> sd 6:2:0:0: [sdj] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
>
> If it's a media error, shouldn't it show up in the smart counters?

Does the smartctl -a output show any logged errors?
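Below is a rough sketch for pulling the relevant bits out of that output. It assumes smartmontools' usual ATA layout: an attribute table with RAW_VALUE as the 10th column, and an error log section that says either "No Errors Logged" or "ATA Error Count: N". The helper names are made up for illustration, and vendors may interpret these attributes differently.

```python
import re
import subprocess
import sys

# Attributes commonly associated with media errors (assumption: the firmware
# follows common usage; vendors are free to interpret them differently).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def watched_attributes(text):
    """Return {attribute_name: raw_value} for the WATCHED rows of the table."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
        #                  UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = re.match(r"\d+", fields[9])
            if raw:
                found[fields[1]] = int(raw.group())
    return found


def logged_error_count(text):
    """Return the ATA error count: 0 if none are logged, None if not found."""
    if "No Errors Logged" in text:
        return 0
    match = re.search(r"ATA Error Count:\s*(\d+)", text)
    return int(match.group(1)) if match else None


if __name__ == "__main__" and len(sys.argv) > 1:
    # smartctl's exit status is a bit mask that is often non-zero even when
    # the output is usable, so a non-zero status is not treated as fatal.
    out = subprocess.run(["smartctl", "-a", sys.argv[1]],
                         capture_output=True, text=True).stdout
    print("logged errors:", logged_error_count(out))
    for name, value in watched_attributes(out).items():
        print(f"{name}: {value}")
```

Run it as root, e.g. `python3 smartcheck.py /dev/sdj`. A non-zero Current_Pending_Sector or a non-zero error count would point at the media.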

> I can't really move it to another PMP port, but I have indeed had failures
> that required not just a reboot of my server but an actual power cycle
> of the drive.

Yeah, some old drives do that after the commands they are executing are
aborted abruptly.  :-(

> Ok, so this all sounds like it's a bit fragile due to hardware issues :)
> 
> I now have to figure out if /dev/sdj has a bad sector or not.
> 
> The last time this happened, though, I did run
> dd if=/dev/drive of=/dev/null bs=1M
> on all 5 drives, and it ran clean.
> 
> If I had a bad sector, shouldn't it show up in Current_Pending_Sector
> and shouldn't reading the entire drive with dd fail?

I'm not sure which SMART counter would be affected; that depends on the
firmware implementation.  Also, a read error might happen once but not
on the next attempt (if the drive for some reason didn't move the failed
sector elsewhere), or the drive may be continuously developing new bad
sectors.
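As for dd: by default it stops at the first read error (unless conv=noerror is given), so a clean run only means that every read succeeded on that particular pass. Below is a minimal sketch of a scan that keeps going and records where reads fail. It assumes 512-byte logical sectors, as the kernel reports for sdj above, and plain buffered reads. scan() is a hypothetical helper, and the reported offsets are only as precise as the kernel's buffered-read granularity.

```python
import os
import sys

CHUNK = 1 << 20  # 1 MiB per read, like bs=1M in the dd command
SECTOR = 512     # assumed logical sector size (matches the sdj log line)


def scan(path, chunk=CHUNK, sector=SECTOR):
    """Read the whole device and return byte offsets of reads that failed."""
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # device (or file) size in bytes
        offset = 0
        while offset < size:
            length = min(chunk, size - offset)
            try:
                os.pread(fd, length, offset)
            except OSError:
                # The chunk failed: retry it one sector at a time to narrow
                # down which part is unreadable, then keep scanning.
                end = offset + length
                for sub in range(offset, end, sector):
                    try:
                        os.pread(fd, min(sector, end - sub), sub)
                    except OSError:
                        bad.append(sub)
            offset += length
    finally:
        os.close(fd)
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    for off in scan(sys.argv[1]):
        print(f"read error at byte {off} (sector {off // SECTOR})")
```

Because successful reads may be served from the page cache on a second pass, drop it between passes (echo 3 > /proc/sys/vm/drop_caches) if you want to see whether errors come and go.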

Thanks.

-- 
tejun