Re: mdadm raid5 single drive fail, single drive out of sync terror

Good morning Jon,

On 11/26/2014 10:08 AM, Jon Robison wrote:
> Hi all!
> 
> I upgraded to mdadm-3.3-7.fc20.x86_64, and my raid5 array (normally
> /dev/sd[b-f]1) would no longer recognize /dev/sdb1. I ran `mdadm
> --detail --scan`, which resulted in a degraded array, then added
> /dev/sdb1, and it started rebuilding happily until 25% or so, when
> another failure seemed to occur.

Well, failures during a raid5 rebuild are common.  In my experience,
including helping on this list, they are most often due to a timeout
mismatch combined with a failure to scrub regularly.
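
You can check for that by comparing each drive's error recovery
control setting against the kernel's per-device command timeout.  A
sketch (untested; the device list is an example, adjust to your
members):

for x in /dev/sd[b-f] ; do
    echo "== $x =="
    smartctl -l scterc $x
    cat /sys/block/${x##*/}/device/timeout
done

Drives that don't support SCTERC need the kernel timeout raised
instead, along the lines of "echo 180 > /sys/block/sdb/device/timeout"
for each member.  A regular scrub is kicked off with
"echo check > /sys/block/md0/md/sync_action".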

> I am convinced the data is fine on /dev/sd[c-f]1, and that somehow I
> just need to inform mdadm about that, but they got out of sync:
> /dev/sde1 thinks the array is AAAAA while the others think it's AAA..
> The drives also seem to think e is bad because f said e was bad, or
> some weird stuff like that, and sde1 is behind by ~50 events or so.
> That error hasn't shown itself recently. I fear sdb is bad and sde is
> going to go soon.

Please show your dmesg from the start of the problem.  Also show
"smartctl -x /dev/sdX" for each of the member devices.  Also show an
excerpt from "ls -l /dev/disk/by-id/" that shows the device vs. serial
number relationship for your drives.
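
Something like this will collect it all in one pass (a sketch; adjust
the device list to your members):

dmesg > dmesg.txt
for x in /dev/sd[b-f] ; do smartctl -x $x > smart.$(basename $x).txt ; done
ls -l /dev/disk/by-id/ | grep -v -- -part

The grep just drops the per-partition symlinks so the whole-disk
device-to-serial mapping is easy to read.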

> Results of `mdadm --examine /dev/sd[b-f]1` are here
> http://dpaste.com/2Z7CPVY

In the future, just put the results in the email itself.  Kernel.org
tolerates relatively large messages.

> I'm scared and alone. Everything is off and sitting as above, though e
> is ~50 events behind and out of sync. New drives are coming Friday,
> and my backup is of course a bit old. I'm petrified to execute `mdadm
> --create --assume-clean --level=5 --raid-devices=5 /dev/md0 /dev/sdf1
> /dev/sdd1 /dev/sdc1 /dev/sde1 missing`,

You should be petrified of any '--create' operation.  What you've shown
above would certainly *not* work, thanks to your data offsets.
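
You can see the hazard in your own --examine output by comparing the
recorded offsets, something like:

mdadm --examine /dev/sd[b-f]1 | egrep 'dev|Data Offset'

A newer mdadm picks a different (typically larger) default data offset
at --create time, so a recreated array would start reading your data
at the wrong place.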

> but that seems to be my next option unless y'all know better. I tried
> `mdadm --assemble -f /dev/md0 /dev/sdf1 /dev/sdd1 /dev/sdc1 /dev/sde1`
> and it said something like "can't start with only 3 devices" (which I
> wouldn't expect, because examine still shows 4, just out of sync, and
> I thought that was -f's express purpose in assemble mode). Anyone have
> any suggestions? Thanks!

Show the contents of /proc/mdstat, then show the results of:

mdadm --stop /dev/md0
mdadm --assemble --force --verbose /dev/md0 /dev/sd[cdef]1
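
If that assembles, verify the data before writing anything, e.g.
(read-only checks; substitute the right fsck for your filesystem):

mdadm --detail /dev/md0
fsck -n /dev/md0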

Phil