On 7/12/18 3:06 am, Brad Campbell wrote:
On 6/12/18 10:33 pm, Niklas Hambüchen wrote:
On 2018-12-04 01:27, Brad Campbell wrote:
Try running a read on the disk with:
dd if=/dev/sdX of=/dev/null bs=1M conv=noerror
Hey Brad, thanks for your reply!
I first tried reading only around the first problematic sector 1758544.
First the one directly before it:
# dd bs=512 if=/dev/sdb of=/dev/null skip=1758543 count=1
1+0 records in
1+0 records out
512 bytes copied, 0,00713634 s, 71,7 kB/s
Now the problematic sector:
# dd bs=512 if=/dev/sdb of=/dev/null skip=1758544 count=1
dd: error reading '/dev/sdb': Input/output error
0+0 records in
0+0 records out
0 bytes copied, 7,00467 s, 0,0 kB/s
Error after 7 seconds, seems like timeouts are working as expected.
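
To repeat the single-sector reads above over a small window around a suspect sector, a helper along these lines could be used. It is only a sketch: the function names probe_cmd and probe_range are made up for illustration, and the functions only print the dd commands (in the same form as above) rather than running them.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any tool: they only PRINT dd commands.

# Print the dd command that reads exactly one sector.
# Arguments: device, sector number, sector size in bytes (default 512).
probe_cmd() {
    dev=$1; sector=$2; bs=${3:-512}
    printf 'dd bs=%s if=%s of=/dev/null skip=%s count=1\n' "$bs" "$dev" "$sector"
}

# Print one probe command per sector from (center - radius) to (center + radius).
probe_range() {
    dev=$1; center=$2; radius=$3
    s=$((center - radius))
    end=$((center + radius))
    while [ "$s" -le "$end" ]; do
        probe_cmd "$dev" "$s"
        s=$((s + 1))
    done
}
```

For example, `probe_range /dev/sdb 1758544 1` prints the commands for sectors 1758543 to 1758545; piping that output to `sh` as root would actually run them.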
After I did so, I got in smartctl:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
...
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
So that seems to work as expected.
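
To watch that counter before and after a scrub, the raw value can be pulled out of the attribute table with a small filter. This is a sketch that assumes the usual ten-column `smartctl -A` layout shown above; the function name pending_sectors is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# RAW_VALUE column (10th field) of attribute 197, Current_Pending_Sector.
pending_sectors() {
    awk '$1 == 197 { print $10 }'
}
```

Typical use, as root and with the device name being an assumption: `smartctl -A /dev/sdb | pending_sectors`.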
Why did it not increase when the RAID1 scrub had the read failures
though?
That is puzzling, but if I've learned one thing about drives and
SMART, it's that implementations are inconsistent across manufacturers,
drive families and even firmware versions. You just can't rely on it.
Puzzling also as to why md didn't re-write that sector when it found a
read error. I have it do that from time to time on RAID-6.
I am now running the dd you suggested on the whole disk, which will
take a couple hours.
That'll just highlight any other duff sectors that might be after the
one that triggers the SMART test failure.
Recovery:
Also I'd like to ask what my recovery strategy should be.
My current understanding is that some sectors are unreadable on sda
and some unreadable on sdb.
As per explanations so far, these can be fixed by re-writing from the
corresponding other devices.
Now, sda seems to be truly broken, given that the RAID scrub reported
that the write failed.
Yeah, a write error isn't good. I'd be replacing that drive yesterday.
This means that if I replace sda by a new disk first, I will not be
able to recover unreadable sectors on sdb (via copies from sda,
because it'd be gone).
Ideally I would be able to first fix all unreadable sectors on sdb by
copying the relevant sectors from sda.
But I don't know if that's possible, because it seems the scrub stops
at the first write error to sdb.
What should I do?
Personally (and granting that my methods are most likely less than
optimal)?
If you are serious about replacing your drives (or sda at least), I'd
get a third disk, create a new RAID-1 from the new disk with one drive
missing, copy the data from the old RAID to the new RAID and then add
the old sdb to it. I'd be inclined to write zeros to the entire drive
first to force a reallocation on any pending sectors, even though the
RAID rebuild will do most of the disk anyway.
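
Roughly, that sequence could look like the sketch below. Every device and array name in it is an assumption made for illustration (/dev/md0 as the old array, /dev/md1 as the new one, /dev/sdc1 on the new disk, /dev/sdb1 as the old member), and stopping the old array before zeroing is a step added here, not something stated above. The script is a dry run by default and only prints the commands.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 only after checking every device name.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# All names are assumptions for illustration.
OLD_MD=/dev/md0       # existing array
NEW_MD=/dev/md1       # new array
NEW_PART=/dev/sdc1    # partition on the new third disk
OLD_PART=/dev/sdb1    # old member, re-used last

migrate_to_new_raid1() {
    # 1. New two-slot RAID-1 with one slot deliberately left empty.
    run mdadm --create "$NEW_MD" --level=1 --raid-devices=2 "$NEW_PART" missing
    # 2. Make a filesystem on $NEW_MD and copy the data over by hand,
    #    then verify the copy before going any further.
    # 3. Stop the old array so its members are no longer in use.
    run mdadm --stop "$OLD_MD"
    # 4. Zero the old member to force reallocation of pending sectors.
    #    dd ends with "No space left on device" when it reaches the end.
    run dd if=/dev/zero of="$OLD_PART" bs=1M
    # 5. Add the old member to the new array; md then rebuilds onto it.
    run mdadm "$NEW_MD" --add "$OLD_PART"
}
```

Calling `migrate_to_new_raid1` with DRY_RUN unset just lists the four commands, which makes it easy to review them before anything destructive happens.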
Wouldn't a mdadm replace solve this?
https://unix.stackexchange.com/questions/74924/how-to-safely-replace-a-not-yet-failed-disk-in-a-linux-raid5-array
The system will copy all readable blocks from sdd1 to sdc1. If it
comes to an unreadable block, it will reconstruct it from parity. Once
the operation is complete, the former spare (here: sdc1) will become
active, and the failing drive will be marked as failed (F) so you can
remove it.
Which sounds like exactly what you want to do...
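
As a sketch of what that would look like here, assuming the array is /dev/md0, the failing member is /dev/sda1 and the new disk's partition is /dev/sdc1 (all three names are assumptions), and a kernel and mdadm recent enough to support --replace. The helper is a dry run by default and only prints the commands.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper. Arguments: array, failing member, new device.
replace_member() {
    md=$1; failing=$2; spare=$3
    # Add the new device as a spare first.
    run mdadm "$md" --add "$spare"
    # Copy onto the spare while the failing member stays in the array.
    run mdadm "$md" --replace "$failing" --with "$spare"
}
```

`replace_member /dev/md0 /dev/sda1 /dev/sdc1` prints the two commands. Once the copy has finished and the old member shows as failed, it can be taken out with `mdadm /dev/md0 --remove /dev/sda1`. On RAID-1 the fallback for an unreadable block would be the other mirror rather than parity, so a spot that is bad on both sda and sdb would presumably still be lost.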
If you are serious about keeping your redundancy, then two new drives
into a new RAID-1 and copy the data.
Drives are cheap. Backups are cheap. Data recovery is expensive.
Agreed!
Regards,
Adam
--
Adam Goryachev Website Managers www.websitemanagers.com.au