Hi Phil, thanks again for bearing with me.

> >>> Model: ATA ST3000DM001-9YN1 (scsi)
>
> Aside: This model looks familiar. I'm pretty sure these drives are
> desktop models that lack scterc support. Meaning they are *not*
> generally suitable for raid duty. Search the archives for combinations
> of "timeout mismatch", "scterc", "URE", and "scrub" for a full
> explanation. If I've guessed correctly, you *must* use the driver
> timeout work-around before proceeding.

Yes I did, and smartctl showed no significant problems. The 10-year-old
server (a Supermicro enterprise-grade dual Xeon with 8 GB ECC RAM) had
started to cause problems in early January, which is why I wanted to
move the drives to a new server in the first place, to then transfer the
data to a new set of enterprise-grade disks. I had checked the memory
and the disks in a burn-in over several days, including timeout and
power-saving behaviour, before I set up the raid in 2012/2013, and had
no issues then. One of the reasons I tend to use mdadm is that I can
utilize existing hardware to create bridging solutions until money comes
in for better hardware, and moving an mdadm raid has so far never caused
a serious problem.
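For completeness, the checks and the work-around I applied looked
roughly like this (a sketch from memory -- the 7 s ERC limit and the
180 s fallback timeout are the values usually recommended in the list
archives):

  # query SCT ERC support; desktop drives usually report "unsupported"
  smartctl -l scterc /dev/sda

  # if supported: limit error recovery to 7.0 seconds (70 deciseconds)
  smartctl -l scterc,70,70 /dev/sda

  # if not supported: raise the kernel command timeout per member instead
  for d in a b c d e f g h; do
      echo 180 > /sys/block/sd$d/device/timeout
  done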
> > So attached you will find hexdumps of 64k of /dev/sd[a-h]2 at sector 0
> > and 262144, which show the superblock 1.2 on sd[fgh]2, not on sd[a-e]2,
> > but may help to identify data_offset; I suspect it is 2048 on sd[a-e]2
> > and 262144 on sd[fgh]2.

> Jackpot! LVM2 embedded backup data at the correct location for mdadm
> data offset == 262144. And on /dev/sda2, which is the only device that
> should have it (first device in the raid).
>
> From /dev/sda2 @ 262144:
>
> > 00001200 76 67 5f 6e 65 64 69 67 73 30 32 20 5d 0a 69 64 |vg_nedigs02 ].id|
> > 00001210 20 3d 20 22 32 4c 62 48 71 64 2d 72 67 42 9f 6e | = "2LbHqd-rgB.n|
> > 00001220 45 4a 75 31 2d 32 52 36 31 2d 41 35 f5 75 2d 6e |EJu1-2R61-A5.u-n|
> > 00001230 49 58 53 2d 66 79 4f 36 33 73 22 0a 73 65 3a 01 |IXS-fyO63s".se:.|
> > 00001240 6f 20 3d 20 33 36 0a 66 6f 72 6d 61 ca 24 3d 20 |o = 36.forma.$= |
> > 00001250 22 6c 76 6d 32 22 20 23 20 69 6e 66 6f 72 6b ac |"lvm2" # infork.|
> ...
> > 00001a70 20 31 33 37 35 32 38 37 39 37 39 09 23 20 d2 32 | 1375287979.# .2|
> > 00001a80 64 20 4a 75 6c 20 33 31 20 31 38 3a af 37 3a 31 |d Jul 31 18:.7:1|
> > 00001a90 39 20 32 30 31 33 0a 0a 00 00 00 00 00 00 ee 12 |9 2013..........|
>
> Note the creation date/time at the end (with a corrupted byte):
>
> Jul 31 18:?7:19 2013
>
> There are other corrupted bytes scattered around. I'd be worried about
> the RAM in this machine. Since you are using non-enterprise drives, I'm
> going to go out on a limb here and guess that the server doesn't have
> ECC ram...

See above: the old server does have ECC RAM.

> Consider performing an extended memcheck run to see what's going on.
> Maybe move the entire stack of disks to another server.

That's what I did initially; I moved the stack back because it failed,
and will now move it again, into the new server, before proceeding.

> Based on the signature discovered above, we should be able to --create
> --assume-clean with the modern default data offset. We know the
> following device roles:
>
> /dev/sda2 == 0
> /dev/sdf2 == 5
> /dev/sdg2 == 6
> /dev/sdh2 == spare
>
> So /dev/sdh2 should be left out until the array is working.
>
> Please re-execute the "mdadm -E" reports for /dev/sd[fgh]2 and show them
> uncut. (Use the latest mdadm.) That should fill in the likely device
> order of the remaining drives.

[root@livecd mnt]# mdadm -E /dev/sd[fgh]2
/dev/sdf2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
           Name : 1
  Creation Time : Wed Jul 31 18:24:38 2013
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
     Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
  Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : d5a16cb2:ff41b9a5:cbbf12b7:3750026d

    Update Time : Mon Dec 16 01:16:26 2013
       Checksum : ee921c43 - correct
         Events : 327

         Layout : left-symmetric
     Chunk Size : 256K

    Device Role : Active device 5
    Array State : A.AAAAA ('A' == active, '.' == missing)

/dev/sdg2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
           Name : 1
  Creation Time : Wed Jul 31 18:24:38 2013
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
     Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
  Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : a1e1e51b:d8912985:e51207a9:1d718292

    Update Time : Mon Dec 16 01:16:26 2013
       Checksum : 4ef01fe9 - correct
         Events : 327

         Layout : left-symmetric
     Chunk Size : 256K

    Device Role : Active device 6
    Array State : A.AAAAA ('A' == active, '.' == missing)

/dev/sdh2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 32d82f84:fe30ac2e:f589aaef:bdd3e4c7
           Name : 1
  Creation Time : Wed Jul 31 18:24:38 2013
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 5858314240 (2793.46 GiB 2999.46 GB)
     Array Size : 29285793280 (13964.55 GiB 14994.33 GB)
  Used Dev Size : 5857158656 (2792.91 GiB 2998.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 030cb9a7:76a48b3c:b3448369:fcf013e1

    Update Time : Mon Dec 16 01:16:26 2013
       Checksum : a1330e97 - correct
         Events : 327

         Layout : left-symmetric
     Chunk Size : 256K

    Device Role : spare
    Array State : A.AAAAA ('A' == active, '.' == missing)

> Also, it is important that you document which drive serial numbers are
> currently occupying the different device names. An excerpt from "ls -l
> /dev/disk/by-id/" would do.

scsi-SATA_ST3000DM001-9YN_S1F026VJ -> ../../sda
scsi-SATA_ST3000DM001-9YN_W1F0TB3C -> ../../sdb
scsi-SATA_ST3000DM001-9YN_S1F04KAK -> ../../sdc
scsi-SATA_ST3000DM001-9YN_W1F0RWJY -> ../../sdd
scsi-SATA_ST3000DM001-9YN_S1F08N7Q -> ../../sde
scsi-SATA_ST3000DM001-9YN_Z1F1F3TC -> ../../sdf
scsi-SATA_ST3000DM001-9YN_W1F1ZZ9T -> ../../sdg
scsi-SATA_ST3000DM001-9YN_Z1F1X0AC -> ../../sdh

> I have to admit that I'm very concerned about your corrupted LVM
> signature at offset 262144. LVM probably won't recognize your PV once
> the array is assembled correctly, making it difficult to
> non-destructively test the filesystems on your logical volumes. You may
> have to duplicate your disks onto new ones so that an LVM restore can be
> safely attempted.
>
> Do *not* buy desktop drives! You need raid-capable drives like the WD
> Red at the least. ;-)

Already ordered WD Reds; they should be delivered any time now. I guess
I have now reached that level after years of making do with very limited
budgets.

I am a bit more relaxed now because I found that a scheduled transfer of
the data to the university tape robot had completed before Christmas, so
this local archive mirror is (luckily) not critical. I still want to
understand whether all this is just the result of shaky hardware or an
mdadm (mis)use issue. Losing (all superblocks on) five drives in a large
software raid6, instead of just a few bytes, is not something I would
like to repeat any time soon by, e.g., mishandling mdadm.

We then have Wed Jul 31 18:24:38 2013 on sd[f-h]2 for the creation of
the raid6, so Wed Jul 31 18:?7:19 2013 for the creation of the LVM group
could well be right.

So I will move the disks to the new server, make 1:1 copies onto new
drives, and then attempt an assembly using --assume-clean -- in which
order?
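To be sure I get the mechanics right, I assume the sequence would look
roughly like this (only a sketch: the target device name and map file
path are made up, the order of sd[b-e]2 in roles 1-4 is still unknown,
and I am guessing that the modern default data offset of 262144 sectors
could be pinned explicitly as --data-offset=128M):

  # 1:1 copy of each member onto a fresh drive with GNU ddrescue,
  # one map file per disk so an interrupted copy can be resumed
  ddrescue -f /dev/sda /dev/sdi /root/sda.map

  # then, on the copies, recreate the array with the old geometry;
  # sdh2 (the spare) stays out until the array is working
  mdadm --create /dev/md1 --assume-clean \
      --metadata=1.2 --level=6 --raid-devices=7 \
      --chunk=256 --layout=left-symmetric --data-offset=128M \
      /dev/sda2 ROLE1 ROLE2 ROLE3 ROLE4 /dev/sdf2 /dev/sdg2

  # ROLE1..ROLE4 stand for sd[b-e]2 in the order still to be determined;
  # given the "A.AAAAA" array state, ROLE1 may have to be the literal
  # word "missing"

Does that match what you had in mind?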
Thanks so much, I have learned a lot already.

Regards
Julian

Universitätsklinikum Jena - Bachstrasse 18 - D-07743 Jena
The legally required disclosures can be found at
http://www.uniklinikum-jena.de/Pflichtangaben.html