Re: Raid6 recovery

Hi Glenn,

{Convention on kernel.org lists is to interleave replies or bottom post, and to trim non-relevant quoted material. Please do so in the future.}

On 3/21/20 7:54 AM, Glenn Greibesland wrote:
> Yes, I am aware of the problems with WD Green and multiple partitions
> on a single 4TB disk. I am in the middle of getting rid of old disks and
> I have enough new drives to stop having multiple partitions on single
> drives, but not enough power and free SATA ports. It is just a
> temporary solution. That is also a reason why I did not
> include many details in the original post; I knew it would just
> distract from the problem I want to solve right away.
>
> What I need help with now is just getting the array started with the
> 16 out of 18 disks. Then I can continue migrating data and replacing
> old disks as planned.

I've examined the material posted, and the sequence of events described. The --re-add damaged that one drive's role record and there is no programmatic way in mdadm to correct it.

Since you seem comfortable reading source code, you might consider byte editing that drive's superblock to restore it to "active device 10". That is what I would do. With that corrected, --assemble --force should give you a running array.
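
For reference, the role record in question is what --examine reports on that member (the device name below is a placeholder for your own):

    mdadm --examine /dev/sdX1 | grep -E 'Device Role|Events|Array State'

Once that reads "Active device 10" again, the forced assembly is the ordinary

    mdadm --assemble --force /dev/md0 <all 16 members>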

If you'd rather not attempt superblock surgery, you will indeed need to perform a --create --assume-clean, as you proposed in your original email. Since you have already constructed a syntactically valid command for that purpose, with appropriate data offsets, that might be the fastest way to get a running array.
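
Just as a shape check (everything here is a placeholder, not a recommendation; the level, chunk, metadata version, data offset, and above all the device order must come from your own --examine output and the command you already drafted):

    mdadm --create /dev/md0 --assume-clean --verbose \
        --level=6 --raid-devices=18 --metadata=1.2 --chunk=512 \
        --data-offset=<your offset> \
        <all 18 member slots, in "Active device" order, with the word
         "missing" in the two slots you are leaving out>

The single --data-offset above assumes all members share the same offset; if yours differ, keep the per-device handling from the command you already built.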

I would double-check the /dev/ name versus array "active device" number relationship to ensure strict ordering in your --create operation. Incorrect ordering will utterly scramble your content.
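
A quick way to tabulate that relationship (adjust the glob to match your actual member partitions):

    mdadm --examine /dev/sd?1 | grep -E '^/dev|Device Role'

That prints each member's name followed by the "Active device N" role it claims, which you can then lay out in order before running --create.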

> When I built the array in 2012, I used WD Green. They turned out to be
> horrible disks and I have since replaced some of them with WD Red. The
> newest disks I've bought are Ironwolves.

I also noted the drives with Error Recovery Control turned off. That is not an issue while your array has no redundancy, but is catastrophic in any normal array. It is as bad as having a drive that doesn't do ERC at all. Don't do that. Do read the "Timeout Mismatch" documentation that Anthony recommended, if you haven't yet.
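
For what it's worth, the usual drill looks like this (sdX is a placeholder, and the scterc setting does not survive a power cycle, so it has to be reapplied at every boot):

    # See whether the drive supports ERC and what it is currently set to
    smartctl -l scterc /dev/sdX

    # Enable a 7.0 second limit for read and write error recovery
    smartctl -l scterc,70,70 /dev/sdX

    # For drives that cannot do ERC at all, raise the kernel's command
    # timeout well above the drive's worst-case recovery time instead
    echo 180 > /sys/block/sdX/device/timeout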

I also recommend that, once you get to a running array, you prioritize backing up its content: get the critical data copied out ASAP. Your array will be very vulnerable to Unrecoverable Read Errors until you've completed your reconfiguration onto new drives. Do not attempt to scrub the array or read every file right away, as any URE may break the array again.

If UREs do break your array again, you will need to use an error-ignoring copy tool (some flavor of ddrescue) to put the readable data onto a new device, remove the old device from the system, and then --assemble --force with the replacement. Repeat as needed.
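
Roughly, with the old and new device names and the map file path as placeholders:

    # Copy everything readable from the failing member onto its replacement;
    # the map file records the bad areas and lets the copy be resumed
    ddrescue -f /dev/sdOLD /dev/sdNEW /root/sdOLD.map

Then, with the old disk removed from the system, run the forced assembly again, naming the clone in place of the failed member.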

Good luck!

Regards,

Phil


