Yes, I forgot to mention that I already checked the SMART status on each
drive, which is why I went and checked the SATA cables. My controllers might
be playing tricks on me, as I have noticed devices dropping out of the arrays
before without any reason I could find. I might have to get some different
controllers and maybe bigger/fewer drives soon. Rebuilding a 14-drive array
is slow and dangerous. Is there any rule of thumb about the maximum number of
drives you should have in a raid5/raid6 array?

Thanks for the tip about /dev/sd[abc]1; somehow you forget all of those
things when you are stressed out about losing all your data. :)

On Sun, Sep 8, 2013 at 12:05 AM, Roger Heflin <rogerheflin@xxxxxxxxx> wrote:
> Run smartctl --all /dev/sdX, where you replace X with each of the devices.
>
> Watch for reported uncorrected and other errors coming from various
> disks, especially on the ones that had an issue.
>
> And check /var/log/messages to see what the sequence of events was on
> the failure.
>
> If you have certain controllers (some Marvells, probably some others),
> under certain conditions they will lock up and drop all devices on the
> given card; if a disk behaves badly, some controllers will also have
> issues that can result in other ports on the same device going away.
>
> Also, this works nicely without having to specify all devices
> explicitly: for /dev/sda1 /dev/sdb1 /dev/sdc1, the pattern /dev/sd[abc]1
> will work and is much easier to type, as long as all of the devices
> are on partition 1.
>
> On Sat, Sep 7, 2013 at 6:49 PM, Garðar Arnarsson <gardar@xxxxxxxxxxx> wrote:
>> I seem to have been able to resolve this.
>>
>> I tried force-assembling the array with all the drives except for sda1
>> (the previously problematic device); that way the array got assembled
>> with 12 drives and one spare, enough so I could recover the array.
>>
>> I would still like to know what might have caused these problems in the
>> first place, but I'm glad it seems to be working ok for now.
>>
>> On Sat, Sep 7, 2013 at 10:56 PM, Garðar Arnarsson <gardar@xxxxxxxxxxx> wrote:
>>> I am unable to start my raid6 array.
>>>
>>> At first sda1 went missing, so I went ahead and re-added it and was
>>> able to start the array. Then a few minutes later I got 3 failed drives.
>>>
>>> I shut down my server, checked all the SATA and power cables, and
>>> booted up again.
>>> The array did not start automatically, so I tried to force-assemble it:
>>>
>>> sudo mdadm --assemble --verbose --force /dev/md1 /dev/sda1 /dev/sdb1
>>> /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1
>>> /dev/sdj1 /dev/sdk1 /dev/sdm1 /dev/sdp1 /dev/sdq1
>>> mdadm: looking for devices for /dev/md1
>>> mdadm: /dev/sda1 is identified as a member of /dev/md1, slot 15.
>>> mdadm: /dev/sdb1 is identified as a member of /dev/md1, slot 8.
>>> mdadm: /dev/sdc1 is identified as a member of /dev/md1, slot 1.
>>> mdadm: /dev/sdd1 is identified as a member of /dev/md1, slot 3.
>>> mdadm: /dev/sde1 is identified as a member of /dev/md1, slot 6.
>>> mdadm: /dev/sdf1 is identified as a member of /dev/md1, slot 9.
>>> mdadm: /dev/sdg1 is identified as a member of /dev/md1, slot 10.
>>> mdadm: /dev/sdh1 is identified as a member of /dev/md1, slot 0.
>>> mdadm: /dev/sdi1 is identified as a member of /dev/md1, slot 11.
>>> mdadm: /dev/sdj1 is identified as a member of /dev/md1, slot 7.
>>> mdadm: /dev/sdk1 is identified as a member of /dev/md1, slot 5.
>>> mdadm: /dev/sdm1 is identified as a member of /dev/md1, slot 14.
>>> mdadm: /dev/sdp1 is identified as a member of /dev/md1, slot 2.
>>> mdadm: /dev/sdq1 is identified as a member of /dev/md1, slot 4.
>>> mdadm: ignoring /dev/sdq1 as it reports /dev/sda1 as failed
>>> mdadm: added /dev/sdc1 to /dev/md1 as 1
>>> mdadm: added /dev/sdp1 to /dev/md1 as 2
>>> mdadm: added /dev/sdd1 to /dev/md1 as 3
>>> mdadm: no uptodate device for slot 4 of /dev/md1
>>> mdadm: added /dev/sdk1 to /dev/md1 as 5
>>> mdadm: added /dev/sde1 to /dev/md1 as 6
>>> mdadm: added /dev/sdj1 to /dev/md1 as 7
>>> mdadm: added /dev/sdb1 to /dev/md1 as 8
>>> mdadm: added /dev/sdf1 to /dev/md1 as 9
>>> mdadm: added /dev/sdg1 to /dev/md1 as 10
>>> mdadm: added /dev/sdi1 to /dev/md1 as 11
>>> mdadm: no uptodate device for slot 12 of /dev/md1
>>> mdadm: no uptodate device for slot 13 of /dev/md1
>>> mdadm: added /dev/sdm1 to /dev/md1 as 14
>>> mdadm: added /dev/sda1 to /dev/md1 as 15
>>> mdadm: added /dev/sdh1 to /dev/md1 as 0
>>> mdadm: /dev/md1 assembled from 11 drives and 2 spares - not enough to
>>> start the array.
>>>
>>> Any ideas what could have gone wrong and how I can possibly start the
>>> array again?
>>
>> --
>> Garðar Arnarsson
>> kerfisstjóri Giraffi sf.
>> gardar@xxxxxxxxxxx
>> http://gardar.giraffi.net
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Garðar Arnarsson
kerfisstjóri Giraffi sf.
gardar@xxxxxxxxxxx
http://gardar.giraffi.net
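P.S. To act on Roger's smartctl suggestion across all fourteen members in one
go, something like this rough shell sketch can help. The grep pattern and the
device list are my own assumptions, not anything mdadm or smartctl prescribes;
adjust both to your layout (it needs smartmontools installed):

```shell
#!/bin/sh
# flag_errors: keep only the SMART attribute lines worth worrying about
# (reported uncorrected, reallocated, pending, and interface CRC counters).
flag_errors() {
    grep -Ei 'reported_uncorrect|reallocated_sector|current_pending|udma_crc'
}

# On the live system, loop over the whole disks (device names here are an
# assumption -- substitute your actual members):
#   for dev in /dev/sd[a-k] /dev/sdm /dev/sdp /dev/sdq; do
#       echo "=== $dev ==="
#       smartctl --all "$dev" | flag_errors
#   done

# Canned sample line so the filter can be seen working without real disks:
echo '187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 4094' | flag_errors
```

A nonzero raw value on Reported_Uncorrect or Current_Pending_Sector generally
points at the disk itself; if the errors instead cluster on ports that share
one card, that is more consistent with the controller lockups Roger described.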