latest update. A bit long I'm afraid...
hi, card installed and devices plugged in. Now have
sda sda2 sdb1 sdc1 sde sdf1 sdg1 sdi sdj1 sdk1 sdl1 sdm1
sda1 sdb sdc sdd sdf sdg sdh sdj sdk sdl sdm
proxmox:/home/simon# ./lsdrv.sh
Controller device @ pci0000:00/0000:00:01.0/0000:01:00.0 [mvsas]
SCSI storage controller: Marvell Technology Group Ltd. MV64460/64461/64462 System Controller, Revision B (rev 01)
host4: /dev/sdf ATA Hitachi HDS72101 {SN: GTA000PAGABXRA}
host4: /dev/sdg ATA Hitachi HDS72101 {SN: GTA000PAGAA5DA}
host4: /dev/sdh ATA Hitachi HDS72101 {SN: GTA000PAG9NL9A}
host4: /dev/sdi ATA Hitachi HDS72101 {SN: GTA000PAGA8V4A}
host4: /dev/sdj ATA Hitachi HDS72101 {SN: GTD000PAGMT9GD}
host4: /dev/sdk ATA Hitachi HDS72101 {SN: GTG000PAG18BJC}
host4: /dev/sdl ATA Hitachi HDS72101 {SN: GTG000PAG1DPLC}
host4: /dev/sdm ATA Hitachi HDS72101 {SN: GTA000PAG7WMEA}
Controller device @ pci0000:00/0000:00:1a.7/usb1/1-4/1-4.1/1-4.1.1/1-4.1.1:1.0 [usb-storage]
Bus 001 Device 007: ID 0424:2228 Standard Microsystems Corp. 9-in-2 Card Reader {SN: 08050920003A}
host9: /dev/sdd Generic Flash HS-CF
host9: /dev/sde Generic Flash HS-COMBO
Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.0 [ahci]
SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03)
host7: [Empty]
host8: [Empty]
Controller device @ pci0000:00/0000:00:1c.4/0000:04:00.1 [pata_jmicron]
IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03)
host5: [Empty]
host6: [Empty]
Controller device @ pci0000:00/0000:00:1f.2 [ata_piix]
IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Controller #1
host0: /dev/sda ATA STM3500418AS {SN: 9VM3QJ5C}
host1: /dev/sr0 Optiarc DVD RW AD-5240S
Controller device @ pci0000:00/0000:00:1f.5 [ata_piix]
IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Controller #2
host2: /dev/sdb ATA Hitachi HDS72101 {SN: GTD000PAGMT8DD}
host3: /dev/sdc ATA Hitachi HDS72101 {SN: GTG000PAG04V0C}
proxmox:/home/simon# parted -l
Model: ATA STM3500418AS (scsi)
Disk /dev/sda: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
1 32.8kB 537MB 537MB primary ext3 boot
2 537MB 500GB 500GB primary lvm
Error: The backup GPT table is not at the end of the disk, as it should
be. This might mean that another operating
system believes the disk is smaller. Fix, by moving the backup to the
end (and removing the old backup)?
Fix/Cancel? c
Error: The backup GPT table is not at the end of the disk, as it should
be. This might mean that another operating
system believes the disk is smaller. Fix, by moving the backup to the
end (and removing the old backup)?
Fix/Cancel? c
Error: The backup GPT table is not at the end of the disk, as it should
be. This might mean that another operating
system believes the disk is smaller. Fix, by moving the backup to the
end (and removing the old backup)?
Fix/Cancel? c
Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/pve-data: 380GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Number Start End Size File system Flags
1 0.00B 380GB 380GB ext3
Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/pve-root: 103GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Number Start End Size File system Flags
1 0.00B 103GB 103GB ext3
Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/pve-swap: 11.8GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Number Start End Size File system Flags
1 0.00B 11.8GB 11.8GB linux-swap
Error: The backup GPT table is not at the end of the disk, as it should
be. This might mean that another operating
system believes the disk is smaller. Fix, by moving the backup to the
end (and removing the old backup)?
Fix/Cancel? c
Error: /dev/sdh: unrecognised disk label
Error: /dev/sdi: unrecognised disk label
Error: The backup GPT table is not at the end of the disk, as it should
be. This might mean that another operating
system believes the disk is smaller. Fix, by moving the backup to the
end (and removing the old backup)?
Fix/Cancel? c
Error: The backup GPT table is not at the end of the disk, as it should
be. This might mean that another operating
system believes the disk is smaller. Fix, by moving the backup to the
end (and removing the old backup)?
Fix/Cancel? c
Error: The backup GPT table is not at the end of the disk, as it should
be. This might mean that another operating
system believes the disk is smaller. Fix, by moving the backup to the
end (and removing the old backup)?
Fix/Cancel? c
Error: The backup GPT table is not at the end of the disk, as it should
be. This might mean that another operating
system believes the disk is smaller. Fix, by moving the backup to the
end (and removing the old backup)?
Fix/Cancel? c
Error: /dev/md0: unrecognised disk label
proxmox:/home/simon# mdadm --assemble --scan --verbose
mdadm: looking for devices for /dev/md/0
mdadm: cannot open device /dev/dm-2: Device or resource busy
mdadm: /dev/dm-2 has wrong uuid.
mdadm: cannot open device /dev/dm-1: Device or resource busy
mdadm: /dev/dm-1 has wrong uuid.
mdadm: cannot open device /dev/dm-0: Device or resource busy
mdadm: /dev/dm-0 has wrong uuid.
mdadm: no RAID superblock on /dev/sdf1
mdadm: /dev/sdf1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdf
mdadm: /dev/sdf has wrong uuid.
mdadm: no RAID superblock on /dev/sdm1
mdadm: /dev/sdm1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdm
mdadm: /dev/sdm has wrong uuid.
mdadm: no RAID superblock on /dev/sdl1
mdadm: /dev/sdl1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdl
mdadm: /dev/sdl has wrong uuid.
mdadm: no RAID superblock on /dev/sdk1
mdadm: /dev/sdk1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdk
mdadm: /dev/sdk has wrong uuid.
mdadm: no RAID superblock on /dev/sdj1
mdadm: /dev/sdj1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdj
mdadm: /dev/sdj has wrong uuid.
mdadm: /dev/sdi has wrong uuid.
mdadm: /dev/sdh has wrong uuid.
mdadm: no RAID superblock on /dev/sdg1
mdadm: /dev/sdg1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdg
mdadm: /dev/sdg has wrong uuid.
mdadm: no RAID superblock on /dev/sdc1
mdadm: /dev/sdc1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdc
mdadm: /dev/sdc has wrong uuid.
mdadm: no RAID superblock on /dev/sdb1
mdadm: /dev/sdb1 has wrong uuid.
mdadm: no RAID superblock on /dev/sdb
mdadm: /dev/sdb has wrong uuid.
mdadm: cannot open device /dev/sda2: Device or resource busy
mdadm: /dev/sda2 has wrong uuid.
mdadm: cannot open device /dev/sda1: Device or resource busy
mdadm: /dev/sda1 has wrong uuid.
mdadm: cannot open device /dev/sda: Device or resource busy
mdadm: /dev/sda has wrong uuid.
mdadm: no devices found for /dev/md/0
mdadm: looking for devices for further assembly
mdadm: cannot open device /dev/dm-2: Device or resource busy
mdadm: cannot open device /dev/dm-1: Device or resource busy
mdadm: cannot open device /dev/dm-0: Device or resource busy
mdadm: no recogniseable superblock on /dev/sdf1
mdadm: no recogniseable superblock on /dev/sdf
mdadm: no recogniseable superblock on /dev/sdm1
mdadm: no recogniseable superblock on /dev/sdm
mdadm: no recogniseable superblock on /dev/sdl1
mdadm: no recogniseable superblock on /dev/sdl
mdadm: no recogniseable superblock on /dev/sdk1
mdadm: no recogniseable superblock on /dev/sdk
mdadm: no recogniseable superblock on /dev/sdj1
mdadm: no recogniseable superblock on /dev/sdj
mdadm: /dev/sdi is not built for host proxmox.
mdadm: /dev/sdh is not built for host proxmox.
mdadm: no recogniseable superblock on /dev/sdg1
mdadm: no recogniseable superblock on /dev/sdg
mdadm: no recogniseable superblock on /dev/sdc1
mdadm: no recogniseable superblock on /dev/sdc
mdadm: no recogniseable superblock on /dev/sdb1
mdadm: no recogniseable superblock on /dev/sdb
mdadm: cannot open device /dev/sda2: Device or resource busy
mdadm: cannot open device /dev/sda1: Device or resource busy
mdadm: cannot open device /dev/sda: Device or resource busy
proxmox:/home/simon# apt-show-versions -a mdadm
mdadm 2.6.7.2-3 install ok installed
mdadm 2.6.7.2-3 lenny ftp.uk.debian.org
No stable version
No testing version
mdadm 3.1.4-1+8efb9d1 sid ftp.uk.debian.org
mdadm/lenny uptodate 2.6.7.2-3
anything else you want ?
Simon
On 15/02/2011 14:51, Phil Turmel wrote:
Hi Neil,
Since Simon has responded, let me summarize the assistance I provided per his off-list request:
On 02/14/2011 11:53 PM, NeilBrown wrote:
On Thu, 10 Feb 2011 16:16:44 +0000 Simon McNair <simonmcnair@xxxxxxxxx> wrote:
Hi all
I use a 3ware 9500-12 port SATA card (JBOD) which will not work without a
128MB SODIMM. The SODIMM socket is flakey, and as a result the
machine occasionally crashes. Yesterday I finally gave in and put
together another machine so that I can rsync between them. When I turned
the machine on today to set up rsync, the RAID array was not gone, but
corrupted. Typical...
Presumably the old machine was called 'ubuntu' and the new machine 'proïlox'
I built the array in Aug 2010 using the following command:
mdadm --create --verbose /dev/md0 --metadata=1.1 --level=5
--raid-devices=10 /dev/sd{b,c,d,e,f,g,h,i,j,k}1 --chunk=64
Using LVM, I did the following:
pvscan
pvcreate -M2 /dev/md0
vgcreate lvm-raid /dev/md0
vgdisplay lvm-raid
vgscan
lvscan
lvcreate -v -l 100%VG -n RAID lvm-raid
lvdisplay /dev/lvm-raid/lvm0
I then formatted using:
mkfs -t ext4 -v -m .1 -b 4096 -E stride=16,stripe-width=144
/dev/lvm-raid/RAID
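For reference, the stride and stripe-width values passed to mkfs above follow from the array geometry. A minimal sketch of that arithmetic in Python, assuming the usual ext4 convention (stride = chunk size / filesystem block size, stripe-width = stride times the number of data disks); the function name is only for illustration:

```python
def ext4_raid_geometry(chunk_kib, block_bytes, raid_devices, parity_devices=1):
    """Return (stride, stripe_width), both in filesystem blocks."""
    # Number of filesystem blocks that fit in one RAID chunk.
    stride = (chunk_kib * 1024) // block_bytes
    # RAID5 loses one device's worth of capacity to parity.
    data_disks = raid_devices - parity_devices
    return stride, stride * data_disks


# Values from the --create and mkfs commands above:
# 64 KiB chunk, 4096-byte blocks, 10-device RAID5.
stride, stripe_width = ext4_raid_geometry(chunk_kib=64, block_bytes=4096, raid_devices=10)
print(f"-E stride={stride},stripe-width={stripe_width}")
# prints: -E stride=16,stripe-width=144
```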
This has worked perfectly since I created the array. Now mdadm is coming up
with
proxmox:/dev/md# mdadm --assemble --scan --verbose
mdadm: looking for devices for further assembly
mdadm: no recogniseable superblock on /dev/md/ubuntu:0
And it seems that ubuntu:0 has been successfully assembled.
It is missing one device for some reason (sdd1) but RAID can cope with that.
The 3ware card is compromised, with a loose buffer memory DIMM. Some of its ECC errors were caught and reported in dmesg. It's likely, based on the loose memory socket, that many multiple-bit errors got through.
[trim /]
mdadm: no uptodate device for slot 8 of /dev/md/proïlox:0
mdadm: no uptodate device for slot 9 of /dev/md/proïlox:0
mdadm: failed to add /dev/sdd1 to /dev/md/proïlox:0: Invalid argument
mdadm: /dev/md/proïlox:0 assembled from 0 drives - not enough to start
the array.
This looks like it is *after* trying the --create command you give
below. It is best to report things in the order they happen, else you can
confuse people (or get caught out!).
Yes, this was after.
mdadm: looking for devices for further assembly
mdadm: no recogniseable superblock on /dev/sdd
mdadm: No arrays found in config file or automatically
pvscan and vgscan show nothing.
So I tried running mdadm --create --verbose /dev/md0 --metadata=1.1
--level=5 --raid-devices=10 missing /dev/sde1 /dev/sdf1 /dev/sdg1
/dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 --chunk=64
as it seemed that /dev/sdd1 failed to be added to the array. This did
nothing.
It did not do nothing. It wrote a superblock to /dev/sdd1 and complained
that it couldn't write to all the others --- didn't it?
There were multiple attempts to create. One wrote to just sdd1, another succeeded with all but sdd1.
dmesg contains:
md: invalid superblock checksum on sdd1
I guess that is why sdd1 was missing from 'ubuntu:0'. Though as I cannot
tell if this happened before or after any of the various things reported
above, it is hard to be sure.
The real mystery is why 'pvscan' reports nothing.
The original array was created with mdadm v2.6.7, and had a data offset of 264 sectors. After Simon's various attempts to --create using mdadm v3.1.4, he ended up with a data offset of 2048 sectors. The mdadm -E reports he posted to the list showed the 264 offset. We didn't realize the offset had been updated until somewhat later in our troubleshooting efforts.
In any case, pvscan couldn't see the LVM signature because it wasn't there (at offset 2048).
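To make the offset arithmetic concrete: with 512-byte sectors, a data offset of 264 sectors puts the start of the array data 135168 bytes into each member, while 2048 sectors puts it at 1048576 bytes. The sketch below (Python, illustrative only, not what we actually ran) checks whether the LVM2 label string LABELONE sits at the start of one of the first four sectors after a given data offset, which is where LVM2 looks for it on a PV. It assumes the member holds the first data chunk of the array, as sdd1 did here; the function name and the device path in the usage comment are hypothetical.

```python
SECTOR = 512              # bytes per sector, as reported by parted above
LVM_LABEL = b"LABELONE"   # magic string at the start of an LVM2 label sector


def find_lvm_label(path, data_offset_sectors, scan_sectors=4):
    """Return the byte position of the LVM2 label on `path`, or None.

    Only the first `scan_sectors` sectors after the md data offset are
    checked. The device is opened read-only.
    """
    start = data_offset_sectors * SECTOR
    with open(path, "rb") as dev:
        dev.seek(start)
        buf = dev.read(scan_sectors * SECTOR)
    for i in range(scan_sectors):
        if buf[i * SECTOR:i * SECTOR + len(LVM_LABEL)] == LVM_LABEL:
            return start + i * SECTOR
    return None


# Hypothetical usage (needs read access to the device):
#   for offset in (264, 2048):
#       print(offset, find_lvm_label("/dev/sdd1", offset))
```

A hit at one offset and not the other indicates which data offset the LVM metadata was written with.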
What about
pvscan --verbose
or
blkid -p /dev/md/ubuntu:0
or even
dd if=/dev/md/ubuntu:0 count=8 | od -c
Fortunately, Simon did have a copy of his LVM configuration. With the help of dd, strings, and grep, we did locate his LVM sig at the correct location on sdd1 (for data offset 264). After a number of attempts to bypass LVM and access his single LV with dmsetup (based on his backed up configuration, on the assembled new array less sdd1), I realized that the data offset was wrong on the recreated array, and went looking for the cause. I found your git commit that changed that logic last spring, and recommended that Simon revert to the default package for his ubuntu install, which is v2.6.7.
Simon has now attempted to recreate the array with v2.6.7, but the controller is throwing too many errors to succeed, and I suggested it was too flakey to trust any further. Based on the existence of the LVM sig on sdd1, I believe Simon's data is (mostly) intact, and only needs a successful create operation with a properly functioning controller. (He might also need to perform an lvm vgcfgrestore, but he has the necessary backup file.)
A new controller is on order.
Phil