Re: RAID1 scrub ignoring read errors?

On 4/12/18 3:16 am, Phil Turmel wrote:
> On 12/3/18 12:49 PM, Niklas Hambüchen wrote:
>> On 2018-12-03 18:35, Phil Turmel wrote:
>>> Your drives appear to be fine.  I suspect you have a problem with other
>>> hardware in this box.
>>
>> When I repeat `smartctl -t short` on these disks, it fails at exactly the same sector:
>>
>> Disk 1:
>>    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>>    # 1  Short offline       Completed: read failure       40%     16424         7501728
>>    # 2  Short offline       Completed: read failure       40%     16398         7501728
>>
>> Disk 2:
>>    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>>    # 1  Short offline       Completed: read failure       50%     16424         1758544
>>    # 2  Short offline       Completed: read failure       50%     16398         1758544
>>
>> Doesn't this suggest that this is not unfortunate behaviour of a power supply, but permanent damage to the disks (even if originally caused by a power or power supply problem)?
>
> Those failures should increment your Current_Pending_Sector attribute in
> those drives.  But you say those remain zero.  So I'm stumped.


Yeah, that's weird. A SMART self-test aborts on the first read error, and that error always bumps the Current_Pending_Sector count (well, it has on all my drives anyway).

Try running a read over the whole disk with:
dd if=/dev/sdX of=/dev/null bs=1M conv=noerror

That will read every sector (or every block of 8 logical sectors on a 4K drive) and will keep barging through after read failures. At the end of that you'll likely have a spray of pending sectors showing up in Current_Pending_Sector, which will give you an indication of just how bad things are.
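To compare the counters before and after the dd run, something like the following could pull the raw values out of `smartctl -A`. This is only a sketch: `smart_raw` is a made-up helper name, and it assumes the usual ATA attribute table layout where RAW_VALUE is the 10th column.

```shell
#!/bin/sh
# smart_raw NAME: read `smartctl -A` output on stdin and print the raw value
# (10th column) of the attribute called NAME.
# Hypothetical helper; assumes the standard ATA attribute table layout.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Example (sdX is a placeholder, as in the dd command; needs root):
#   smartctl -A /dev/sdX | smart_raw Current_Pending_Sector
#   smartctl -A /dev/sdX | smart_raw Reallocated_Sector_Ct
```

Run it once before the read and once after; if Current_Pending_Sector is still 0 after dd has hit read errors, that matches the odd behaviour described above.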

I'm with Phil. It sounds like a power issue. If there is a power hiccup while the drive is writing, it'll occasionally write out a corrupt sector. That turns into a URE as the ECC doesn't match, so the drive never gets a clean read. That also means it'll never auto-reallocate on read.

When you then write to that sector, it'll write cleanly, and so it leaves no trace of reallocation in the SMART data (because no reallocation has happened).
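To check whether a particular sector still fails (before or after it has been rewritten) without re-reading the whole disk, the 1 MiB block containing the LBA from the self-test log can be read on its own. A rough sketch, assuming 512-byte logical sectors and GNU dd; the function names are made up for illustration:

```shell
#!/bin/sh
# lba_to_block LBA BS: index of the BS-byte dd block that contains LBA.
# Assumes LBA counts 512-byte logical sectors.
lba_to_block() {
    echo $(( $1 * 512 / $2 ))
}

# read_around_lba DEV LBA: read only the 1 MiB block containing LBA.
# iflag=direct (GNU dd) bypasses the page cache so the drive is asked again.
read_around_lba() {
    dd if="$1" of=/dev/null bs=1M \
       skip="$(lba_to_block "$2" 1048576)" count=1 iflag=direct
}

# Example (sdX is a placeholder; needs root):
#   read_around_lba /dev/sdX 7501728
```

For the LBA reported on Disk 1 (7501728) this reads block 3662. An I/O error from dd means the sector is still unreadable; a clean read after a rewrite, with the reallocation count unchanged, fits the power-glitch explanation.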

In the case of a genuine bad sector, the drive will try to write it, that will fail, and so it'll try to write it elsewhere (reallocate). If the drive is truly dying it'll run out of spare sectors to reallocate to and complain loudly. That's pretty rare however.

I have an old Toshiba laptop drive here (in a drawer) with a reallocation count in the high 4 digits, and it still has plenty of spares left. The impact on system performance was massive, as it was always seeking all over the platter to reach the out-of-sequence reallocated sectors, but it was still working and reliable.

Regards,
Brad