Swapped out the new drive for the old. The new drive is still labelled /dev/sdf, hopefully. I decided to check times before proceeding, and to make sure the drives were in the right order; I corrected them to go by `mdadm --examine`'s output as best I could.

Here's the output of `mdadm --examine /dev/sdf`, followed by the result of running the given `./mdadm` command (with the drives re-ordered). The binary compiled from the git sources crashed with a segmentation fault, apparently while trying to report a failure writing the superblock. I've tried the drives (with the proper sizes) in other combinations, according to both what you posted and what `mdadm --examine` says the "Device Role" is. I haven't found a working combination; is it possible my drives got swapped around on reboot? There's a re-run of `mdadm --examine` at the end of my post.
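(For reference, the sort of cross-check I mean is roughly this; just a rough sketch, and it assumes the device names haven't shifted again since the reboot:)

  # print the slot, offset and freshness that each superblock claims
  for d in /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf; do
      echo "== $d"
      mdadm --examine "$d" | grep -E 'Device Role|Data Offset|Update Time|Events'
  done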
root@leyline:~/mdadm# mdadm --examine /dev/sdf
/dev/sdf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9759ad94:75e30b6b:8a726b4d:177a6eda
           Name : leyline:1 (local to host leyline)
  Creation Time : Mon Sep 12 13:19:00 2011
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 7813048320 (7451.10 GiB 8000.56 GB)
  Used Dev Size : 3906524160 (1862.78 GiB 2000.14 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 2edc16c6:cf45ad32:04b026a4:956ce78b

    Update Time : Fri Jun 1 03:11:54 2012
       Checksum : b3e49e59 - correct
         Events : 2127454

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 2
    Array State : AAAA. ('A' == active, '.' == missing)

root@leyline:~/mdadm# ./mdadm -C /dev/md1 -e 1.2 -l 5 -n 5 --assume-clean -c 512 /dev/sdc3:2048s /dev/sdf:2048s /dev/sdb3:2048s /dev/sdd3:2048s /dev/sde3:1024s
mdadm: /dev/sdc3 appears to be part of a raid array: level=raid5 devices=5 ctime=Tue Jun 5 00:10:46 2012
mdadm: /dev/sdf appears to contain an ext2fs file system size=242788K mtime=Fri Oct 7 16:55:40 2011
mdadm: /dev/sdf appears to be part of a raid array: level=raid5 devices=5 ctime=Mon Sep 12 13:19:00 2011
mdadm: /dev/sdb3 appears to be part of a raid array: level=raid5 devices=5 ctime=Tue Jun 5 00:10:46 2012
mdadm: /dev/sdd3 appears to be part of a raid array: level=raid5 devices=5 ctime=Tue Jun 5 00:10:46 2012
Continue creating array? yes
Segmentation fault

Since I couldn't find any fault with running it again (but I am not a smart man, or I would not be in this position), I decided to run valgrind over it:

==3206== Memcheck, a memory error detector
==3206== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==3206== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==3206== Command: ./mdadm -C /dev/md1 -e 1.2 -l 5 -n 5 --assume-clean -c 512 /dev/sdc3:2048s /dev/sdf:2048s /dev/sdb3:2048s /dev/sdd3:2048s /dev/sde3:1024s
==3206==
==3206== Warning: noted but unhandled ioctl 0x1261 with no size/direction hints
==3206==    This could cause spurious value errors to appear.
==3206==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3206== Warning: noted but unhandled ioctl 0x1261 with no size/direction hints
==3206==    This could cause spurious value errors to appear.
==3206==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==3206== Warning: noted but unhandled ioctl 0x1261 with no size/direction hints
==3206==    This could cause spurious value errors to appear.
==3206==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
mdadm: /dev/sdc3 appears to be part of a raid array: level=raid5 devices=5 ctime=Tue Jun 5 00:14:20 2012
mdadm: /dev/sdf appears to contain an ext2fs file system size=242788K mtime=Fri Oct 7 16:55:40 2011
mdadm: /dev/sdf appears to be part of a raid array: level=raid5 devices=5 ctime=Tue Jun 5 00:14:20 2012
mdadm: /dev/sdb3 appears to be part of a raid array: level=raid5 devices=5 ctime=Tue Jun 5 00:14:20 2012
mdadm: /dev/sdd3 appears to be part of a raid array: level=raid5 devices=5 ctime=Tue Jun 5 00:14:20 2012
Continue creating array?
==3206== Invalid read of size 8
==3206==    at 0x43C9B7: write_init_super1 (super1.c:1327)
==3206==    by 0x41F1B9: Create (Create.c:951)
==3206==    by 0x407231: main (mdadm.c:1464)
==3206==  Address 0x8 is not stack'd, malloc'd or (recently) free'd
==3206==
==3206==
==3206== Process terminating with default action of signal 11 (SIGSEGV)
==3206==  Access not within mapped region at address 0x8
==3206==    at 0x43C9B7: write_init_super1 (super1.c:1327)
==3206==    by 0x41F1B9: Create (Create.c:951)
==3206==    by 0x407231: main (mdadm.c:1464)
==3206==  If you believe this happened as a result of a stack
==3206==  overflow in your program's main thread (unlikely but
==3206==  possible), you can try to increase the size of the
==3206==  main thread stack using the --main-stacksize= flag.
==3206==  The main thread stack size used in this run was 8388608.
==3206==
==3206== HEAP SUMMARY:
==3206==     in use at exit: 37,033 bytes in 350 blocks
==3206==   total heap usage: 673 allocs, 323 frees, 4,735,171 bytes allocated
==3206==
==3206== LEAK SUMMARY:
==3206==    definitely lost: 832 bytes in 8 blocks
==3206==    indirectly lost: 18,464 bytes in 4 blocks
==3206==      possibly lost: 0 bytes in 0 blocks
==3206==    still reachable: 17,737 bytes in 338 blocks
==3206==         suppressed: 0 bytes in 0 blocks
==3206== Rerun with --leak-check=full to see details of leaked memory
==3206==
==3206== For counts of detected and suppressed errors, rerun with: -v
==3206== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)
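In case it helps with the crash itself: an invalid read at "Address 0x8" usually means a NULL struct pointer being dereferenced at a member about 8 bytes in. A rough sketch of how I could get more detail at super1.c:1327 (assumes gdb is installed; note this re-runs the create, so it prompts to continue again):

  gdb --args ./mdadm -C /dev/md1 -e 1.2 -l 5 -n 5 --assume-clean -c 512 \
      /dev/sdc3:2048s /dev/sdf:2048s /dev/sdb3:2048s /dev/sdd3:2048s /dev/sde3:1024s
  (gdb) run
  ... (answer the "Continue creating array?" prompt; gdb stops on the SIGSEGV) ...
  (gdb) bt full
  (gdb) frame 0
  (gdb) info locals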
mdadm --examine of all my drives (again):

/dev/sdb3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9ed0b17f:e9a7a813:a9139679:4f8f999b
           Name : leyline:1 (local to host leyline)
  Creation Time : Tue Jun 5 00:14:35 2012
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 3906525098 (1862.78 GiB 2000.14 GB)
     Array Size : 7813046272 (7451.10 GiB 8000.56 GB)
  Used Dev Size : 3906523136 (1862.78 GiB 2000.14 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : feb94069:b2afeb6e:ae6b2af2:f9e3cee4

    Update Time : Tue Jun 5 00:14:35 2012
       Checksum : 1909d79e - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 2
    Array State : AAAAA ('A' == active, '.' == missing)

/dev/sdc3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9ed0b17f:e9a7a813:a9139679:4f8f999b
           Name : leyline:1 (local to host leyline)
  Creation Time : Tue Jun 5 00:14:35 2012
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 3906525098 (1862.78 GiB 2000.14 GB)
     Array Size : 7813046272 (7451.10 GiB 8000.56 GB)
  Used Dev Size : 3906523136 (1862.78 GiB 2000.14 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : ed6116ac:4f91c2dd:4ada53df:0e14fc2a

    Update Time : Tue Jun 5 00:14:35 2012
       Checksum : fe0cffd8 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 0
    Array State : AAAAA ('A' == active, '.' == missing)

/dev/sdd3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9ed0b17f:e9a7a813:a9139679:4f8f999b
           Name : leyline:1 (local to host leyline)
  Creation Time : Tue Jun 5 00:14:35 2012
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 3906523136 (1862.78 GiB 2000.14 GB)
     Array Size : 7813046272 (7451.10 GiB 8000.56 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : d839ca02:1d14cde3:65b54275:8caa0275

    Update Time : Tue Jun 5 00:14:35 2012
       Checksum : 3ac0c483 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 3
    Array State : AAAAA ('A' == active, '.' == missing)

mdadm: No md superblock detected on /dev/sde3.

On Mon, Jun 4, 2012 at 5:57 PM, NeilBrown <neilb@xxxxxxx> wrote:
> On Mon, 04 Jun 2012 20:26:05 +0200 Pierre Beck <mail@xxxxxxxxxxxxxx> wrote:
>
>> I'll try and clear up some confusion (I was in IRC with freeone3000).
>>
>> /dev/sdf is an empty drive, a replacement for a failed drive. The array
>> attempted to assemble, but failed and reported one drive as spare. This
>> is the moment we saved the --examines.
>>
>> In expectation of a lost write due to drive write-cache, we executed
>> --assemble --force, which kicked another drive.
>>
>> @James: remove /dev/sdf for now and replace /dev/sde3, which indeed has
>> a very outdated update time, with the non-present drive. Post an
>> --examine of that drive. It should report update time Jun 1st.
>>
>> We tried to re-create the array with --assume-clean. But mdadm chose a
>> different data offset for the drives. A re-create with proper data
>> offset will be necessary.
>
> OK, try:
>
>  git clone -b data_offset git://neil.brown.name/mdadm
>  cd mdadm
>  make
>
>  ./mdadm -C /dev/md1 -e 1.2 -l 5 -n 5 --assume-clean -c 512 \
>      /dev/sdc3:2048s /dev/sdb3:2048s ??? /dev/sdd3:1024s ???
>
> The number after ':' after a device name is a data offset. 's' means sectors.
> Without 's' it means Kilobytes.
> I don't know what should be at slot 2 or 4 so I put '???'. You should fill it
> in. You should also double check the command and double check the names of
> your devices.
> Don't install this mdadm, and don't use it for anything other than
> re-creating this array.
>
> Good luck.
>
> NeilBrown
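(Just to be sure I'm reading the offset syntax right, a quick sanity check on the units, assuming 512-byte sectors and binary kilobytes:)

  echo $(( 2048 * 512 / 1024 ))   # 1024, so ':2048s' should be the same offset as ':1024'
  echo $(( 1024 * 512 / 1024 ))   # 512,  so ':1024s' should be the same offset as ':512'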
>
>> Greetings,
>>
>> Pierre Beck
>>
>>
>> On 04.06.2012 05:35, NeilBrown wrote:
>> > On Fri, 1 Jun 2012 19:48:41 -0500 freeone3000 <freeone3000@xxxxxxxxx> wrote:
>> >
>> >> Sorry.
>> >>
>> >> /dev/sde fell out of the array, so I replaced the physical drive with
>> >> what is now /dev/sdf. udev may have relabelled the drive - smartctl
>> >> states that the drive that is now /dev/sde works fine.
>> >> /dev/sdf is a new drive. /dev/sdf has a single, whole-disk partition
>> >> with type marked as raid. It is physically larger than the others.
>> >>
>> >> /dev/sdf1 doesn't have a mdadm superblock. /dev/sdf seems to, so I
>> >> gave output of that device instead of /dev/sdf1, despite the
>> >> partition. Whole-drive RAID is fine, if it gets it working.
>> >>
>> >> What I'm attempting to do is rebuild the RAID from the data from the
>> >> other four drives, and bring the RAID back up without losing any of
>> >> the data. /dev/sdb3, /dev/sdc3, /dev/sdd3, and what is now /dev/sde3
>> >> should be used to rebuild the array, with /dev/sdf as a new drive. If
>> >> I can get the array back up with all my data and all five drives in
>> >> use, I'll be very happy.
>> >
>> > You appear to have 3 devices that are happy:
>> >   sdc3 is device 0 data-offset 2048
>> >   sdb3 is device 1 data-offset 2048
>> >   sdd3 is device 3 data-offset 1024
>> >
>> > nothing claims to be device 2 or 4.
>> >
>> > sde3 looks like it was last in the array on 23rd May, a little over
>> > a week before your report. Could that have been when "sde fell out of the
>> > array" ??
>> > Is it possible that you replaced the wrong device?
>> > Or is it possible that the array was degraded when sde "fell out" resulting
>> > in data loss?
>> >
>> > I need more precise history to understand what happened, as I cannot suggest
>> > a fix until I have that understanding.
>> >
>> > When did the array fail?
>> > How certain are you that you replaced the correct device?
>> > Can you examine the drive that you removed and see what it says?
>> > Are you certain that the array wasn't already degraded?
>> >
>> > NeilBrown

--
James Moore