Hi Neil,
Since Simon has responded, let me summarize the assistance I provided per his off-list request:
On 02/14/2011 11:53 PM, NeilBrown wrote:
On Thu, 10 Feb 2011 16:16:44 +0000 Simon McNair<simonmcnair@xxxxxxxxx> wrote:
Hi all
I use a 3ware 9500-12 port sata card (JBOD) which will not work without a
128mb sodimm. The sodimm socket is flakey and the result is that the
machine occasionally crashes. Yesterday I finally gave in and put
together another
machine so that I can rsync between them. When I turned the machine
on today to set up rsync, the RAID array was not gone, but corrupted.
Typical...
Presumably the old machine was called 'ubuntu' and the new machine 'proxmox'
I built the array in Aug 2010 using the following command:
mdadm --create --verbose /dev/md0 --metadata=1.1 --level=5
--raid-devices=10 /dev/sd{b,c,d,e,f,g,h,i,j,k}1 --chunk=64
Using LVM, I did the following:
pvscan
pvcreate -M2 /dev/md0
vgcreate lvm-raid /dev/md0
vgdisplay lvm-raid
vgscan
lvscan
lvcreate -v -l 100%VG -n RAID lvm-raid
lvdisplay /dev/lvm-raid/lvm0
I then formatted using:
mkfs -t ext4 -v -m .1 -b 4096 -E stride=16,stripe-width=144
/dev/lvm-raid/RAID
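For reference, the stride and stripe-width values in that mkfs command follow from the array geometry: stride is the chunk size expressed in filesystem blocks, and stripe-width is stride times the number of data disks (one fewer than the member count for RAID5). A minimal Python sketch of that arithmetic; the function name is hypothetical and only for illustration:

```python
def ext4_raid5_geometry(chunk_kib, block_bytes, raid_devices):
    """Return (stride, stripe_width) in filesystem blocks for a RAID5 array.

    Hypothetical helper, not part of mdadm or e2fsprogs.
    """
    # Stride: how many filesystem blocks fit in one RAID chunk.
    stride = (chunk_kib * 1024) // block_bytes
    # RAID5 spends one device's worth of space per stripe on parity.
    data_disks = raid_devices - 1
    # Stripe width: filesystem blocks in one full stripe of data.
    stripe_width = stride * data_disks
    return stride, stripe_width


# Values from the commands above: --chunk=64, -b 4096, --raid-devices=10
stride, stripe_width = ext4_raid5_geometry(64, 4096, 10)
print(f"stride={stride},stripe-width={stripe_width}")
```

With the values used here this gives the same 16 and 144 that were passed to mkfs.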
This worked perfectly since I created the array. Now mdadm is coming up
with
proxmox:/dev/md# mdadm --assemble --scan --verbose
mdadm: looking for devices for further assembly
mdadm: no recogniseable superblock on /dev/md/ubuntu:0
And it seems that ubuntu:0 has been successfully assembled.
It is missing one device for some reason (sdd1) but RAID can cope with that.
The 3ware card is compromised, with a loose buffer memory DIMM. Some of its ECC errors were caught and reported in dmesg. It's likely, based on the loose memory socket, that many multiple-bit errors got through.
[trim /]
mdadm: no uptodate device for slot 8 of /dev/md/proxmox:0
mdadm: no uptodate device for slot 9 of /dev/md/proxmox:0
mdadm: failed to add /dev/sdd1 to /dev/md/proxmox:0: Invalid argument
mdadm: /dev/md/proxmox:0 assembled from 0 drives - not enough to start
the array.
This looks like it is *after* trying the --create command you give
below. It is best to report things in the order they happen, else you can
confuse people (or get caught out!).
Yes, this was after.
mdadm: looking for devices for further assembly
mdadm: no recogniseable superblock on /dev/sdd
mdadm: No arrays found in config file or automatically
pvscan and vgscan show nothing.
So I tried running mdadm --create --verbose /dev/md0 --metadata=1.1
--level=5 --raid-devices=10 missing /dev/sde1 /dev/sdf1 /dev/sdg1
/dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 --chunk=64
as it seemed that /dev/sdd1 failed to be added to the array. This did
nothing.
It did not do nothing. It wrote a superblock to /dev/sdd1 and complained
that it couldn't write to all the others --- didn't it?
There were multiple attempts to create. One wrote to just sdd1, another succeeded with all but sdd1.
dmesg contains:
md: invalid superblock checksum on sdd1
I guess that is why sdd1 was missing from 'ubuntu:0'. Though as I cannot
tell if this happened before or after any of the various things reported
above, it is hard to be sure.
The real mystery is why 'pvscan' reports nothing.
The original array was created with mdadm v2.6.7, and had a data offset of 264 sectors. After Simon's various attempts to --create using mdadm v3.1.4, he ended up with a data offset of 2048 sectors. The mdadm -E reports he posted to the list still showed the 264 offset, so we didn't realize the offset had changed until somewhat later in our troubleshooting efforts.
In any case, pvscan couldn't see the LVM signature because it wasn't there (at offset 2048).
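To make the offset mismatch concrete, here is a rough Python sketch of where the LVM label would sit on the first member device under each data offset. It assumes 512-byte sectors, that the label is in the second sector of the PV (the LVM2 default; LVM checks the first four), and that the first chunk of array data lives on the first member, as with the default RAID5 layout. The function name is made up for illustration:

```python
SECTOR_BYTES = 512  # assumed sector size


def lvm_label_byte_offset(data_offset_sectors, label_sector=1):
    """Byte position on the first member device where the LVM label should be.

    Hypothetical helper. label_sector=1 assumes the LVM2 default of
    placing the label in the second sector of the physical volume.
    """
    return (data_offset_sectors + label_sector) * SECTOR_BYTES


# Offset written by the original create vs. the later re-create
print(lvm_label_byte_offset(264))
print(lvm_label_byte_offset(2048))
```

An array assembled with the larger offset presents data starting nearly 1 MiB further into each member, so anything reading the start of /dev/md0 never sees the label that was written relative to the 264-sector offset.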
What about
pvscan --verbose
or
blkid -p /dev/md/ubuntu:0
or even
dd if=/dev/md/ubuntu:0 count=8 | od -c
Fortunately, Simon did have a copy of his LVM configuration. With the help of dd, strings, and grep, we did locate his LVM sig at the correct location on sdd1 (for data offset 264). After a number of attempts to bypass LVM and access his single LV with dmsetup (based on his backed up configuration, on the assembled new array less sdd1), I realized that the data offset was wrong on the recreated array, and went looking for the cause. I found your git commit that changed that logic last spring, and recommended that Simon revert to the default package for his ubuntu install, which is v2.6.7.
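The search for the label can be sketched in Python roughly as follows. This is not the exact dd/strings/grep procedure we used, just the same idea: walk the first part of a device (or an image of it) sector by sector and report where the "LABELONE" magic of an LVM2 label appears. The function names, the demo image, and the 512-byte sector size are assumptions for illustration; on a real system the path would be a member device such as /dev/sdd1, opened read-only:

```python
import os
import tempfile

SECTOR_BYTES = 512               # assumed sector size
LVM_LABEL_MAGIC = b"LABELONE"    # ID at the start of an LVM2 label sector


def find_lvm_labels(path, max_sectors=4096):
    """Return the sector numbers that begin with the LVM2 label magic.

    Opens the path read-only and never writes to it.
    """
    hits = []
    with open(path, "rb") as dev:
        for sector in range(max_sectors):
            block = dev.read(SECTOR_BYTES)
            if len(block) < len(LVM_LABEL_MAGIC):
                break  # ran off the end of the device or image
            if block.startswith(LVM_LABEL_MAGIC):
                hits.append(sector)
    return hits


def implied_data_offset(label_sector, label_index=1):
    """Data offset implied by a hit, assuming the label is PV sector 1."""
    return label_sector - label_index


def make_demo_image(data_offset_sectors, total_sectors=300):
    """Build a small zero-filled test image with a fake label (demo only)."""
    fd, path = tempfile.mkstemp(suffix=".img")
    with os.fdopen(fd, "wb") as img:
        img.write(b"\0" * (total_sectors * SECTOR_BYTES))
        img.seek((data_offset_sectors + 1) * SECTOR_BYTES)
        img.write(LVM_LABEL_MAGIC)
    return path


demo = make_demo_image(264)
for sector in find_lvm_labels(demo):
    print("label at sector", sector,
          "-> data offset", implied_data_offset(sector))
os.remove(demo)
```

A hit at sector 265 of the first member is consistent with a 264-sector data offset, which is what pointed us at the offset change.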
Simon has now attempted to recreate the array with v2.6.7, but the controller is throwing too many errors to succeed, and I suggested it was too flakey to trust any further. Based on the existence of the LVM sig on sdd1, I believe Simon's data is (mostly) intact, and only needs a successful create operation with a properly functioning controller. (He might also need to perform an lvm vgcfgrestore, but he has the necessary backup file.)
A new controller is on order.
Phil