Re: md RAID5: Disk wrongly marked "spare", need to force re-add it

On 15-04-13 03:34, Ben Bucksch wrote:
Hey Oliver,

first off: thanks for trying to help me.

Oliver Schinagl wrote, On 15.04.2013 00:40:
Firstly, have you written anything to the array while resyncing? If not, chances are your array is still in reasonable shape.

I did write to the array (in fact, I ran bonnie++, which in retrospect was very stupid, and I'm upset I did it, but hindsight is 20/20; I assumed the array was fine at that time). BUT if you look at the "event count" of each drive, sdl, the one marked "spare", has an event count just 2 lower than all the others, so they are very close.

Now check the event count for all your drives and compare. If the 'broken' drive is only a few off (1 or 2, I think I spotted below), try the following.

Exactly.
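
For reference, the event-count comparison discussed here can be scripted. This is a minimal sketch, assuming `mdadm --examine` prints a line of the form "Events : 12345" (the exact layout can differ between metadata versions and mdadm releases). The helper name events_of and the device names in the usage comment are placeholders, not anything from this thread.

```shell
#!/bin/sh
# Print the event counter found in `mdadm --examine` output read on stdin.
# Assumption: the counter appears on a line like "Events : 12345".
# events_of is a hypothetical helper name chosen for this sketch.
events_of() {
    awk '$1 == "Events" && $2 == ":" { print $3; exit }'
}

# Usage sketch (device names are placeholders, substitute your members):
# for d in /dev/sdX1 /dev/sdY1; do
#     printf '%s %s\n' "$d" "$(mdadm --examine "$d" | events_of)"
# done
```

Members whose counters differ by only one or two events are the ones a forced assemble is usually expected to accept.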

The 'spare' drive, I don't know what its status is.

According to SMART, it's just fine. Its event status is very close to the others.

Theoretically, I would assume that the data the resync wrote to the disk is exactly the same as what was there before, so keep that in mind as a last resort.

Yes, that's my plan. My question is: HOW can I tell mdadm to use it?

mdadm --run --force -A /dev/md0 /dev/sd...

I've tried that, and it tells me the array can't be started, because I have RAID 5 with 8 drives (in normal situation), 6 good drives, and 2 spares (1 working fine, 1 with hardware failure). So, after this command, I end up in "inactive" operation mode.

Make sure to list all known 'good' devices (don't list the really broken device). --run --force should make it come up. I recently (see previous thread) had an issue as well, and I found the order of commands mattered. I may have put the wrong ones up here. According to history | grep mdadm, the last command I used, and thus probably the right one, was:

mdadm --assemble --run --force /dev/md0 /dev/sd[1-7]

Make sure to mdadm --stop /dev/md0 before trying to assemble it.
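
The order described above (stop first, then force-assemble from the known-good members only) might be wrapped like this. It is only a sketch: reassemble, run, and DRY_RUN are names invented for this example, and the member list is a placeholder. By default it only prints the commands, so they can be checked before anything touches the array.

```shell
#!/bin/sh
# Stop the array, then force-assemble it from the listed members.
# With DRY_RUN=1 (the default in this sketch) commands are printed,
# not executed. Set DRY_RUN=0 only after checking the printed commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# reassemble <md-device> <member>...
# List only the members you believe are good; leave out the failed disk.
reassemble() {
    md=$1
    shift
    run mdadm --stop "$md"
    run mdadm --assemble --run --force "$md" "$@"
}

# Usage sketch (placeholder device names):
# reassemble /dev/md0 /dev/sdX1 /dev/sdY1
```

Whether this brings up an array in which a needed member is flagged "spare" is the open question in this thread, so treat it as the sequence to try, not a guaranteed fix.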

Now, the broken drive: check your cables!! Then run smartctl on it to give SMART a chance to 'fix' the drive somewhat and to check its status/health. ... If it fails again (at 80% because of hardware failure) you can't re-use the broken disk. It really is broken :p

It failed twice during resync, at around the same point, and smartctl tells me it's broken, so I assume it's gone for good. (The failed drive is also marked as "spare" currently.)

Your very last hope is to not use the broken drive, and 'force' the above using the earlier marked spare.

How? I haven't managed to do that, that's my whole question.

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




