Re: Upgrade from Ubuntu 10.04 to 12.04 broken raid6.

On 10/1/2012 8:40 AM, Phil Turmel wrote:
> Hi EJ,
>
> On 09/30/2012 07:23 PM, EJ Vincent wrote:
>> On 9/30/2012 4:28 PM, Phil Turmel wrote:
>>> Do you have *any* dmesg output from the old system?  Or dmesg from the
>>> very first boot under 12.04?  That might have enough information to
>>> shorten your search.
>>>
>>> In the future, you should record your setup by saving the output of
>>> "mdadm -D" on each array, "mdadm -E" on each member device, and the
>>> output of "ls -l /dev/disk/by-id/"
>>>
>>> Or try my documentation script "lsdrv". [1]
>>>
>>> HTH,
>>>
>>> Phil
>>>
>>> [1] http://github.com/pturmel/lsdrv
>> Hi Phil,
>>
>> Unfortunately I don't have any dmesg log from the old system or the
>> first boot under 12.04.
>>
>> Getting my system to boot at all under 12.04 was chaotic enough, with
>> the overly-aggressive /usr/share/initramfs-tools/scripts/mdadm-functions
>> ravaging my array and then dropping me to a busybox shell over and over
>> again.  I didn't think to record the very first error.
> I'm not prepared to condemn the 12.04 initramfs--I really don't think it
> is a factor in this crisis.  The critical part is the degraded reboot bug.
>
>> Here's an observation of mine: disks /dev/sdb1, /dev/sdi1, and
>> /dev/sdj1 don't have the Raid level "-unknown-", nor are they labeled
>> as spares.  They are, in fact, labeled clean and appear *different*
>> from the others.
>>
>> Could these disks still contain my metadata from 10.04?  I recall during
>> my installation of 12.04 I had anywhere from 1 to 3 disks unpowered, so
>> that I could drop a SATA CD/DVDRW into the slot.
> Leaving disks unpowered sounds like a key factor in your crisis.  Raid6
> can't operate with more than two missing, and won't assemble if any disk
> disappears between shutdown and the next boot.  (Must be forced.)
>
> So your array would only partially assemble under 12.04 due to
> deliberately missing drives, then you rebooted with a kernel that has a
> problem with that scenario.
>
> The disks very likely do have useful metadata, but no disk has all of
> it.  It might reduce the permutations you need to try.  If you share
> more information about your system layout, some educated first guesses
> might be possible, too.  The output of "mdadm -E" for every drive, and
> lsdrv for an overview.
>
>> I am downloading 10.04.4 LTS and will be ready to use it soon.  I fear
>> having to do permutations-- 9! (factorial) would mean 362,880
>> combinations.  *gasp*
> Phil

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
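Phil's suggestion in the quoted message (save "mdadm -D" for each array, "mdadm -E" for each member device, and "ls -l /dev/disk/by-id/") is easy to automate so the record exists before something breaks. A minimal Python sketch, assuming Python 3.7+ and root privileges; the function names are hypothetical helpers, not part of mdadm or lsdrv:

```python
import subprocess


def snapshot_commands(arrays, members):
    """Commands worth saving: mdadm -D for each array, mdadm -E for each
    member device, and a listing of /dev/disk/by-id/."""
    cmds = [["mdadm", "-D", array] for array in arrays]
    cmds += [["mdadm", "-E", member] for member in members]
    cmds.append(["ls", "-l", "/dev/disk/by-id/"])
    return cmds


def save_snapshot(arrays, members, outfile):
    """Run each command (mdadm needs root) and write its output to outfile."""
    with open(outfile, "w") as out:
        for cmd in snapshot_commands(arrays, members):
            out.write("# " + " ".join(cmd) + "\n")
            # Keep stderr too, so a member without a readable superblock
            # leaves an error message in the record instead of nothing.
            result = subprocess.run(cmd, capture_output=True, text=True)
            out.write(result.stdout + result.stderr + "\n")
```

For this machine it might be called as save_snapshot(["/dev/md6"], ["/dev/sd%s1" % c for c in "abcefghij"], "raid-layout.txt"), where /dev/md6 is only a guess based on the array name ruby:6. The output file is best kept somewhere other than the array it describes.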

Hi Phil,

Here's the information you requested.

The server has 10 disks: a dedicated 500GB disk for the operating system (which Ubuntu 10.04.4 has labeled /dev/sdd) and 9 x 2TB disks (/dev/sd[abcefghij]):

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdd: 500.1 GB, 500107862016 bytes
Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdg: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdh: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdi: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdj: 2000.4 GB, 2000398934016 bytes
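fdisk reports decimal units (1 GB = 10^9 bytes), while lsdrv's "1.82t" and "465.76g" below match binary units (TiB and GiB) for the same byte counts, and mdadm counts 512-byte sectors. A small Python sketch of the conversion; the helper name is illustrative only:

```python
SECTOR_BYTES = 512  # mdadm and the kernel count sizes in 512-byte sectors


def describe(size_bytes):
    """Express a byte count in the units fdisk, lsdrv and mdadm print."""
    return {
        "GB": round(size_bytes / 10**9, 1),   # decimal, as fdisk shows
        "GiB": round(size_bytes / 2**30, 2),  # binary, like lsdrv's "g"
        "TiB": round(size_bytes / 2**40, 2),  # binary, like lsdrv's "t"
        "sectors": size_bytes // SECTOR_BYTES,
    }
```

describe(2000398934016) gives 2000.4 GB, 1863.02 GiB, 1.82 TiB and 3907029168 sectors, which lines up with both the fdisk list above and the lsdrv listing below.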

The devices are spread across the on-board SATA controller (MCP78S GeForce AHCI) and two SiI 3124 PCI-X SATA controllers.

The layout is as follows: 5 disks are attached to the on-board controller, 3 to one SiI 3124 controller, and 2 to the other SiI 3124 controller.

I've loaded your lsdrv script; here are the results:

PCI [pata_amd] 00:06.0 IDE interface: nVidia Corporation MCP78S [GeForce 8200] IDE (rev a1)
scsi 0:x:x:x [Empty]
scsi 1:x:x:x [Empty]

PCI [sata_sil24] 06:04.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)
scsi 2:0:0:0 ATA ST2000DL003-9VT1
sda 1.82t [8:0] Empty/Unknown
 sda1 1.82t [8:1] Empty/Unknown
scsi 5:0:0:0 ATA ST2000DL003-9VT1
sdb 1.82t [8:16] Empty/Unknown
 sdb1 1.82t [8:17] Empty/Unknown
scsi 7:0:0:0 ATA ST2000DL003-9VT1
sdc 1.82t [8:32] Empty/Unknown
 sdc1 1.82t [8:33] Empty/Unknown
scsi 9:x:x:x [Empty]

PCI [ahci] 00:09.0 SATA controller: nVidia Corporation MCP78S [GeForce 8200] AHCI Controller (rev a2)
scsi 3:0:0:0 ATA WDC WD5000AAKS-2
sdd 465.76g [8:48] Empty/Unknown
 sdd1 237.00m [8:49] Empty/Unknown
 Mounted as /dev/sdd1 @ /boot
 sdd2 3.73g [8:50] Empty/Unknown
 sdd3 23.28g [8:51] Empty/Unknown
 Mounted as /dev/disk/by-uuid/65a128d3-3e2e-487a-a36b-11cbe5530429 @ /
 sdd4 438.52g [8:52] Empty/Unknown
scsi 4:0:0:0 ATA ST2000DL003-9VT1
sde 1.82t [8:64] Empty/Unknown
 sde1 1.82t [8:65] Empty/Unknown
scsi 6:0:0:0 ATA ST32000542AS
sdf 1.82t [8:80] Empty/Unknown
 sdf1 1.82t [8:81] Empty/Unknown
scsi 8:0:0:0 ATA ST32000542AS
sdg 1.82t [8:96] Empty/Unknown
 sdg1 1.82t [8:97] Empty/Unknown
scsi 10:0:0:0 ATA ST2000DL003-9VT1
sdh 1.82t [8:112] Empty/Unknown
 sdh1 1.82t [8:113] Empty/Unknown
scsi 11:x:x:x [Empty]

PCI [sata_sil24] 08:04.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)
scsi 12:0:0:0 ATA ST2000DL003-9VT1
sdi 1.82t [8:128] Empty/Unknown
 sdi1 1.82t [8:129] Empty/Unknown
scsi 13:0:0:0 ATA ST2000DL003-9VT1
sdj 1.82t [8:144] Empty/Unknown
 sdj1 1.82t [8:145] Empty/Unknown
scsi 14:x:x:x [Empty]
scsi 15:x:x:x [Empty]
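The 5/3/2 split across controllers can be read back from this listing by grouping each whole disk under the "PCI [...]" line that precedes it. A rough Python sketch that assumes the plain layout shown here (one entry per line, no tree-drawing prefixes); it is a hypothetical text scraper, not an lsdrv feature:

```python
import re
from collections import defaultdict


def disks_by_controller(lsdrv_text):
    """Group whole-disk names (sda, sdb, ...) under the PCI controller line
    that precedes them in lsdrv-style output."""
    groups = defaultdict(list)
    controller = None
    for line in lsdrv_text.splitlines():
        line = line.strip()
        if line.startswith("PCI ["):
            # e.g. "PCI [sata_sil24] 06:04.0 RAID bus controller: ..."
            controller = " ".join(line.split()[1:3])  # driver and PCI address
        else:
            # Whole disk only: "sda 1.82t ..." matches, "sda1 1.82t ..." does not.
            m = re.match(r"(sd[a-z]+)\s", line)
            if m and controller:
                groups[controller].append(m.group(1))
    return dict(groups)
```

Fed the full listing above, it should return sda to sdc under "[sata_sil24] 06:04.0", sdd to sdh under "[ahci] 00:09.0", and sdi and sdj under "[sata_sil24] 08:04.0".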

Here is what mdadm -E looks like for each member of the array, now under Ubuntu 10.04.4:

# mdadm -E /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
           Name : ruby:6  (local to host ruby)
  Creation Time : Mon Apr 11 15:40:25 2011
     Raid Level : -unknown-
   Raid Devices : 0

 Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 6190765b:200ff748:d50a75e3:597405c4

    Update Time : Sun Sep 30 19:13:16 2012
       Checksum : 37454049 - correct
         Events : 1


    Array Slot : 4 (empty, empty, failed, failed, empty, failed, empty, failed, empty, failed, failed, empty, failed... <shortened for readability>)
   Array State :  378 failed

# mdadm -E /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
           Name : ruby:6  (local to host ruby)
  Creation Time : Mon Apr 11 15:40:25 2011
     Raid Level : -unknown-
   Raid Devices : 0

 Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 7d707598:a8881376:531ae0c6:aac82909

    Update Time : Sun Sep 30 19:13:16 2012
       Checksum : c9effdc2 - correct
         Events : 1


    Array Slot : 11 (empty, empty, failed, failed, empty, failed, empty, failed, empty, failed, failed, empty, failed... <shortened for readability>)
   Array State :  378 failed

# mdadm -E /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
           Name : ruby:6  (local to host ruby)
  Creation Time : Mon Apr 11 15:40:25 2011
     Raid Level : raid6
   Raid Devices : 9

 Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
     Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
  Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : a6fd99b2:7bb75287:5d844ec5:822b6d8a

    Update Time : Sun Sep 30 00:34:27 2012
       Checksum : 760485cb - correct
         Events : 2474296

     Chunk Size : 512K

    Array Slot : 7 (0, 1, failed, failed, 2, failed, 4, 5, 6, 7, 8, 3)
   Array State : uuuuuUuuu 3 failed

# mdadm -E /dev/sde1
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
           Name : ruby:6  (local to host ruby)
  Creation Time : Mon Apr 11 15:40:25 2011
     Raid Level : -unknown-
   Raid Devices : 0

 Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 179691a0:fd201c2d:49c73803:409a0a9c

    Update Time : Sun Sep 30 19:13:16 2012
       Checksum : 584e3a3a - correct
         Events : 1


    Array Slot : 8 (empty, empty, failed, failed, empty, failed, empty, failed, empty, failed, failed, empty, failed... <shortened for readability>)
   Array State :  378 failed

# mdadm -E /dev/sdf1
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
           Name : ruby:6  (local to host ruby)
  Creation Time : Mon Apr 11 15:40:25 2011
     Raid Level : -unknown-
   Raid Devices : 0

 Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : f3f72549:8543972f:1f4a655d:fa9416bd

    Update Time : Sun Sep 30 19:13:16 2012
       Checksum : 7e963c27 - correct
         Events : 1


    Array Slot : 1 (empty, empty, failed, failed, empty, failed, empty, failed, empty, failed, failed, empty, failed... <shortened for readability>)
   Array State :  378 failed

# mdadm -E /dev/sdg1
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
           Name : ruby:6  (local to host ruby)
  Creation Time : Mon Apr 11 15:40:25 2011
     Raid Level : -unknown-
   Raid Devices : 0

 Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 9c908e4b:ad7d8af8:ff5d2ab6:50b013e5

    Update Time : Sun Sep 30 19:13:16 2012
       Checksum : cab43e2e - correct
         Events : 1


    Array Slot : 0 (empty, empty, failed, failed, empty, failed, empty, failed, empty, failed, failed, empty, failed... <shortened for readability>)
   Array State :  378 failed

# mdadm -E /dev/sdh1
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
           Name : ruby:6  (local to host ruby)
  Creation Time : Mon Apr 11 15:40:25 2011
     Raid Level : -unknown-
   Raid Devices : 0

 Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 321368f6:9f38bc16:76f787c3:4b3d398d

    Update Time : Sun Sep 30 19:13:16 2012
       Checksum : 4942a22e - correct
         Events : 1


    Array Slot : 6 (empty, empty, failed, failed, empty, failed, empty, failed, empty, failed, failed, empty, failed... <shortened for readability>)
   Array State :  378 failed

# mdadm -E /dev/sdi1
/dev/sdi1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
           Name : ruby:6  (local to host ruby)
  Creation Time : Mon Apr 11 15:40:25 2011
     Raid Level : raid6
   Raid Devices : 9

 Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
     Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
  Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 9d53248b:1db27ffc:a2a511c3:7176a7eb

    Update Time : Sun Sep 30 00:34:27 2012
       Checksum : 22b9429c - correct
         Events : 2474296

     Chunk Size : 512K

    Array Slot : 10 (0, 1, failed, failed, 2, failed, 4, 5, 6, 7, 8, 3)
   Array State : uuuuuuuuU 3 failed

# mdadm -E /dev/sdj1
/dev/sdj1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 321fc20c:997e9a1a:bb67ffde:9de489f5
           Name : ruby:6  (local to host ruby)
  Creation Time : Mon Apr 11 15:40:25 2011
     Raid Level : raid6
   Raid Devices : 9

 Avail Dev Size : 3907026672 (1863.02 GiB 2000.40 GB)
     Array Size : 27349181440 (13041.11 GiB 14002.78 GB)
  Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 880ed7fb:b9c673de:929d14c5:53f9b81d

    Update Time : Sun Sep 30 00:34:27 2012
       Checksum : a9748cf3 - correct
         Events : 2474296

     Chunk Size : 512K

    Array Slot : 9 (0, 1, failed, failed, 2, failed, 4, 5, 6, 7, 8, 3)
   Array State : uuuuuuuUu 3 failed
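The "Array Slot" lines can narrow the search considerably. On the three clean members (sdc1, sdi1, sdj1), the number before the parentheses reads as the device's slot and the parenthesized list as a slot-to-role table: slot 7 gives role 5 for sdc1, slot 10 gives role 8 for sdi1, and slot 9 gives role 7 for sdj1, which agrees with the capital U in each one's Array State line. Those three roles alone cut 9! = 362,880 orderings to 6! = 720. If the slot numbers printed on the six "-unknown-" superblocks also survived from the original array (an assumption the output does not prove), the same table yields a single candidate order. A Python sketch of that reasoning, with the table and slot numbers copied from the output above; all names are illustrative:

```python
from math import factorial

# Slot-to-role table printed in parentheses on the clean members
# (sdc1, sdi1, sdj1); None stands for the "failed" entries.
DEV_ROLES = [0, 1, None, None, 2, None, 4, 5, 6, 7, 8, 3]

# "Array Slot" number printed for each member under 10.04.4.
SLOTS = {"sda1": 4, "sdb1": 11, "sdc1": 7, "sde1": 8, "sdf1": 1,
         "sdg1": 0, "sdh1": 6, "sdi1": 10, "sdj1": 9}

CLEAN = ("sdc1", "sdi1", "sdj1")  # members still showing raid6 / clean


def roles_from_slots(slots, dev_roles):
    """Look up each member's role via its slot; skip slots with no role."""
    roles = {}
    for dev, slot in slots.items():
        role = dev_roles[slot] if 0 <= slot < len(dev_roles) else None
        if role is not None:
            roles[dev] = role
    return roles


def device_order(roles, raid_devices=9):
    """Members listed by role 0..n-1; None where no member claims a role."""
    by_role = {}
    for dev, role in roles.items():
        if role in by_role:  # two members claiming one role means bad data
            raise ValueError(f"role {role} claimed by {by_role[role]} and {dev}")
        by_role[role] = dev
    return [by_role.get(r) for r in range(raid_devices)]


def remaining_orderings(known, raid_devices=9):
    """Orderings left to try when `known` members' roles are fixed."""
    return factorial(raid_devices - known)


all_roles = roles_from_slots(SLOTS, DEV_ROLES)
trusted = {dev: all_roles[dev] for dev in CLEAN}
print(trusted, remaining_orderings(0), remaining_orderings(len(trusted)))
# ASSUMPTION: the "-unknown-" members kept their original slot numbers.
print(device_order(all_roles))
```

Under that assumption the order comes out as sdg1, sdf1, sda1, sdb1, sdh1, sdc1, sde1, sdj1, sdi1 for roles 0 to 8, with no role claimed twice. These are the names as enumerated under 10.04.4; they can change between boots, as the sdb1/sdc1 difference from the 12.04 observation shows, so the mapping is best tied to Device UUIDs.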

I'd be happy to also supply a dump of 'lshw', which I believe is similar to 'lsdrv', if that would be useful to you. The system is back on 10.04.4 LTS and is using mdadm version 2.6.7.1.
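The sizes on the three clean superblocks can be cross-checked against each other: Array Size should equal (Raid Devices - 2) x Used Dev Size, and Used Dev Size appears to be Avail Dev Size rounded down to a whole number of 512K chunks (the figures match, but treat that as an observation rather than a documented mdadm rule). The same arithmetic encodes Phil's point that RAID6 runs with at most two members missing. A Python sketch, assuming mdadm 2.6.7.1 prints these sizes in 512-byte sectors, which agrees with the GiB/GB figures it prints alongside; the helper names are hypothetical:

```python
CHUNK_SECTORS = 512 * 1024 // 512  # 512K chunk expressed in 512-byte sectors


def used_dev_size(avail_sectors, chunk_sectors=CHUNK_SECTORS):
    """Per-member size in use: Avail Dev Size rounded down to whole chunks."""
    return avail_sectors - avail_sectors % chunk_sectors


def raid6_array_size(raid_devices, used_sectors):
    """RAID6 spends two members' worth of space on parity (P and Q)."""
    return (raid_devices - 2) * used_sectors


def raid6_can_run(raid_devices, present):
    """RAID6 can run degraded with at most two members missing."""
    return raid_devices - present <= 2


used = used_dev_size(3907026672)
print(used, raid6_array_size(9, used))
print(raid6_can_run(9, 7), raid6_can_run(9, 6))
```

used_dev_size(3907026672) gives 3907025920 and raid6_array_size(9, 3907025920) gives 27349181440, matching the Used Dev Size and Array Size on sdc1, sdi1 and sdj1. raid6_can_run(9, 6) is False, which is why three unpowered drives out of nine would leave the array unable to start.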

Thanks for your continued input and assistance.  Much appreciated.

-EJ



