Hello,

11/18/2009 02:39 AM, Marc MERLIN wrote:
>> This is the actual failure. Your 6.02 drive reported media error
>> which combined with the controller errata caused port wide failure.
>
> Ah, I see, so it should be the one for me to focus on.
> If it hadn't had an error, everything wouldn't have gone down the toilet,
> next, right?

Yes, that's my guess.

> scsi 6:2:0:0: Direct-Access ATA Hitachi HDS72101 GKAO PQ: 0 ANSI: 5
> sd 6:2:0:0: [sdj] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
>
> If it's a media error, shouldn't it show up in the smart counters?

Does the smartctl -a output show any logged errors?

> I can't really move it to another PMP port but I have indeed had failures
> that required not just a reboot of my server but an actual power cycle
> of the drive.

Yeah, some old drives do that after being abruptly aborted while
executing commands. :-(

> Ok, so this all sounds like it's a bit fragile due to hardware issues :)
>
> I now have to figure out if /dev/sdj has a bad sector or not.
>
> Last time I had this happen, though I did run
> dd if=/dev/drive of=/dev/null bs=1M
> for my 5 drives, and it ran clean.
>
> If I had a bad sector, shouldn't it show up in Current_Pending_Sector
> and shouldn't reading the entire drive with dd fail?

I'm not sure which smart counter would be affected. It also depends on
the firmware implementation. A read error might happen once but not on
the next attempt (if the drive for some reason didn't move the failed
sector elsewhere), or maybe the drive is continuously developing bad
sectors.

Thanks.

-- 
tejun
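
For the full-surface read check discussed above, here is a minimal Python sketch of the same idea as the dd run, with one difference: instead of stopping at the first I/O error (dd's default without conv=noerror), it records the failing offset and keeps going. The function name scan_for_read_errors and the 1 MiB chunk size are illustrative choices, not anything from the thread, and it assumes a Linux system where seeking to the end of a block device returns its size. As noted above, a clean pass does not prove the drive is healthy, since a read error may not recur on the next attempt.

```python
import os


def scan_for_read_errors(path, chunk_size=1024 * 1024):
    """Read `path` front to back; return (bytes_read, failing_offsets).

    `path` may be a regular file or a block device such as /dev/sdj
    (reading a device normally requires root). Hypothetical helper,
    written for illustration only.
    """
    bad_offsets = []
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # On Linux, seeking to the end yields the size for both regular
        # files and block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # The kernel reported a read failure somewhere in this
                # chunk: note where it starts and skip the whole chunk.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of data
            total += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return total, bad_offsets
```

Calling scan_for_read_errors("/dev/sdj") would return the number of bytes read successfully and the starting offsets of any chunks that failed. Those offsets are only accurate to the chunk size, so a smaller chunk_size narrows down the failing sector at the cost of speed.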