Re: Intel IMSM RAID 5 won't start

On 01/09/2016 04:42 AM, Guido D'Arezzo wrote:
> Thanks for your replies.
> I copied the RAID discs to a 4 TB drive with dd and there were no errors.
> Recreating the RAID according to your instructions, Artur, worked
> without a problem, after which the contents of the partitions were
> available.  The larger RAID volume, with a small boot partition and a
> big LVM partition, was mainly OK.  The ext3 and ext4 file-systems in
> the logical volumes were all OK; those which were in use were fixed by
> fsck.  I was unable to repair a btrfs file-system which was in use.
> The smaller RAID volume contained LVs: several had gone and the one
> left had a new name, but as they were all swap space, it doesn't matter
> to me.
> The parity repair had no apparent effect apart from starting a resync.
> 
> Sorry Wols, I don't know where the loopback/overlays thing would have
> fitted in.  Luckily I didn't need to do a (10 hour) restore from the
> disc images.  I'm very grateful that I didn't have to reinstall or
> restore everything.
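
[The loopback/overlay technique mentioned here is the one described on
the Linux RAID wiki: risky mdadm commands are run against copy-on-write
snapshots so the real discs are never written. A rough sketch; the
helper name, paths, overlay size and chunk size are illustrative
assumptions, not values from this thread:]

```shell
# Sketch of the RAID-wiki overlay technique.  Writes go to a sparse
# COW file via dm-snapshot; reads fall through to the real disc.
make_overlay() {
  dev=$1                                   # e.g. /dev/sda
  name=$(basename "$dev")
  truncate -s 4G "/tmp/overlay-$name"      # sparse file to hold writes
  loop=$(losetup -f --show "/tmp/overlay-$name")
  size=$(blockdev --getsz "$dev")          # origin size in 512B sectors
  # dm-snapshot table: origin, COW device, Persistent, 8-sector chunks
  echo "0 $size snapshot $dev $loop P 8" | dmsetup create "$name"
}
# as root: for d in /dev/sd[abcd]; do make_overlay "$d"; done
# then point mdadm at /dev/mapper/sda ... instead of /dev/sda ...
```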
> 
> Regards
> 
> Guido

Hi Guido,

That's great! I'm glad it worked and you didn't need to use the backup.

Best wishes,
Artur

> 
> On Mon, Jan 4, 2016 at 3:14 PM, Artur Paszkiewicz
> <artur.paszkiewicz@xxxxxxxxx> wrote:
>> On 01/03/2016 08:44 PM, Guido D'Arezzo wrote:
>>> Hi
>>>
>>> After 20 months of trouble-free Intel IMSM RAID, I had to do a hard reset
>>> and the array has failed to start.  I don’t know if the failed RAID
>>> was the cause of the problems before the reset.  The system won’t boot
>>> because everything is on the RAID array.  Booting from a live Fedora
>>> USB shows no sign that the discs are broken and I was able to copy 1
>>> GB off each disc with dd.  I hope someone can help me to rescue the
>>> array.
>>>
>>> It is a 4 x 1 TB disc RAID 5 array.  The system was running Archlinux
>>> and I had patched it a day or 2 before for the first time in a few
>>> months, though it had been rebooted more than once afterwards without
>>> incident.
>>>
>>> The Intel oROM says disc 2 is “Offline Member” and 3 is “Failed Disk”.
>>>
>>> -----------------------------------------------------------------------
>>> Intel(R) Rapid Storage Technology - Option ROM - 11.6.0.1702
>>>
>>> RAID Volumes:
>>> ID    Name    Level        Strip    Size    Status     Bootable
>>> 0    md0    RAID5(Parity)    128KB    2.6TB    Failed    No
>>> 1    md1    RAID5(Parity)    128KB    94.5GB    Failed    No
>>>
>>> Physical Devices:
>>> ID    Device    Model        Serial #    Size    Type/Status(Vol ID)
>>> 0    WDC WD10EZEK-00K    WD-WCC1S5684189    931.5GB    Member Disk(0,1)
>>> 1    SAMSUNG HD103UJ        S13PJDWS608384    931.5GB    Member Disk(0,1)
>>> 2    SAMSUNG HD103SJ        S246J9GZC04267    931.5GB    Offline Member
>>> 3    SAMSUNG HD103UJ        S13PJDWS608386    931.5GB    Unknown Disk
>>> 4    WDC WD10EZEK-08M    WD-ACC3F1681668    931.5GB    Non-RAID Disk
>>>
>>> -----------------------------------------------------------------------
>>>
>>> The 2 RAID volumes were both spread across all 4 discs.  This is how
>>> it looks now:
>>>
>>> # mdadm -D /dev/md/imsm0
>>> /dev/md/imsm0:
>>>         Version : imsm
>>>      Raid Level : container
>>>   Total Devices : 1
>>>
>>> Working Devices : 1
>>>
>>>
>>>            UUID : 76cff3f5:1a3a7a83:49fc86a8:84cf6604
>>>   Member Arrays :
>>>
>>>     Number   Major   Minor   RaidDevice
>>>
>>>        0       8       48        -        /dev/sdd
>>> #
>>>
>>> # mdadm -D /dev/md/imsm1
>>> /dev/md/imsm1:
>>>         Version : imsm
>>>      Raid Level : container
>>>   Total Devices : 3
>>>
>>> Working Devices : 3
>>>
>>>
>>>            UUID : e8286680:de9642f4:04200a4a:acbdb566
>>>   Member Arrays :
>>>
>>>     Number   Major   Minor   RaidDevice
>>>
>>>        0       8       16        -        /dev/sdb
>>>        1       8       32        -        /dev/sdc
>>>        2       8        0        -        /dev/sda
>>> #
>>>
>>> # mdadm --detail-platform
>>>  Platform : Intel(R) Matrix Storage Manager
>>>  Version : 11.6.0.1702
>>>  RAID Levels : raid0 raid1 raid10 raid5
>>>  Chunk Sizes : 4k 8k 16k 32k 64k 128k
>>>  2TB volumes : supported
>>>  2TB disks : supported
>>>  Max Disks : 6
>>>  Max Volumes : 2 per array, 4 per controller
>>>  I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)
>>> #
>>>
>>>
>>> # mdadm --examine /dev/sd[abcd]
>>> /dev/sda:
>>>           Magic : Intel Raid ISM Cfg Sig.
>>>         Version : 1.3.00
>>>     Orig Family : d12e9b21
>>>          Family : d12e9b21
>>>      Generation : 00695bbd
>>>      Attributes : All supported
>>>            UUID : e8286680:de9642f4:04200a4a:acbdb566
>>>        Checksum : 8f6fe1cb correct
>>>     MPB Sectors : 2
>>>           Disks : 4
>>>    RAID Devices : 2
>>>
>>>   Disk01 Serial : WD-WCC1S5684189
>>>           State : active
>>>              Id : 00000000
>>>     Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> [md0]:
>>>            UUID : d5bf7ab7:2cda417d:f0c6542f:c77d9289
>>>      RAID Level : 5
>>>         Members : 4
>>>           Slots : [_U_U]
>>>     Failed disk : 2
>>>       This Slot : 1
>>>      Array Size : 5662310400 (2700.00 GiB 2899.10 GB)
>>>    Per Dev Size : 1887436800 (900.00 GiB 966.37 GB)
>>>   Sector Offset : 0
>>>     Num Stripes : 7372800
>>>      Chunk Size : 128 KiB
>>>        Reserved : 0
>>>   Migrate State : idle
>>>       Map State : failed
>>>     Dirty State : clean
>>>
>>> [md1]:
>>>            UUID : 26671da2:0d23f085:3d12dbbe:f63aad5a
>>>      RAID Level : 5
>>>         Members : 4
>>>           Slots : [__UU]
>>>     Failed disk : 0
>>>       This Slot : 2
>>>      Array Size : 198232064 (94.52 GiB 101.49 GB)
>>>    Per Dev Size : 66077952 (31.51 GiB 33.83 GB)
>>>   Sector Offset : 1887440896
>>>     Num Stripes : 258117
>>>      Chunk Size : 128 KiB
>>>        Reserved : 0
>>>   Migrate State : idle
>>>       Map State : failed
>>>     Dirty State : clean
>>>
>>>   Disk00 Serial : PJDWS608386:0:0
>>>           State : active
>>>              Id : ffffffff
>>>     Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>>   Disk02 Serial : 6J9GZC04267:0:0
>>>           State : active failed
>>>              Id : ffffffff
>>>     Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>>   Disk03 Serial : S13PJDWS608384
>>>           State : active
>>>              Id : 00000001
>>>     Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> /dev/sdb:
>>>           Magic : Intel Raid ISM Cfg Sig.
>>>         Version : 1.3.00
>>>     Orig Family : d12e9b21
>>>          Family : d12e9b21
>>>      Generation : 00695bbd
>>>      Attributes : All supported
>>>            UUID : e8286680:de9642f4:04200a4a:acbdb566
>>>        Checksum : 8f6fe1cb correct
>>>     MPB Sectors : 2
>>>           Disks : 4
>>>    RAID Devices : 2
>>>
>>>   Disk03 Serial : S13PJDWS608384
>>>           State : active
>>>              Id : 00000001
>>>     Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> [md0]:
>>>            UUID : d5bf7ab7:2cda417d:f0c6542f:c77d9289
>>>      RAID Level : 5
>>>         Members : 4
>>>           Slots : [_U_U]
>>>     Failed disk : 2
>>>       This Slot : 3
>>>      Array Size : 5662310400 (2700.00 GiB 2899.10 GB)
>>>    Per Dev Size : 1887436800 (900.00 GiB 966.37 GB)
>>>   Sector Offset : 0
>>>     Num Stripes : 7372800
>>>      Chunk Size : 128 KiB
>>>        Reserved : 0
>>>   Migrate State : idle
>>>       Map State : failed
>>>     Dirty State : clean
>>>
>>> [md1]:
>>>            UUID : 26671da2:0d23f085:3d12dbbe:f63aad5a
>>>      RAID Level : 5
>>>         Members : 4
>>>           Slots : [__UU]
>>>     Failed disk : 0
>>>       This Slot : 3
>>>      Array Size : 198232064 (94.52 GiB 101.49 GB)
>>>    Per Dev Size : 66077952 (31.51 GiB 33.83 GB)
>>>   Sector Offset : 1887440896
>>>     Num Stripes : 258117
>>>      Chunk Size : 128 KiB
>>>        Reserved : 0
>>>   Migrate State : idle
>>>       Map State : failed
>>>     Dirty State : clean
>>>
>>>   Disk00 Serial : PJDWS608386:0:0
>>>           State : active
>>>              Id : ffffffff
>>>     Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>>   Disk01 Serial : WD-WCC1S5684189
>>>           State : active
>>>              Id : 00000000
>>>     Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>>   Disk02 Serial : 6J9GZC04267:0:0
>>>           State : active failed
>>>              Id : ffffffff
>>>     Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> /dev/sdc:
>>>           Magic : Intel Raid ISM Cfg Sig.
>>>         Version : 1.3.00
>>>     Orig Family : d12e9b21
>>>          Family : d12e9b21
>>>      Generation : 00695b88
>>>      Attributes : All supported
>>>            UUID : e8286680:de9642f4:04200a4a:acbdb566
>>>        Checksum : a72daa29 correct
>>>     MPB Sectors : 2
>>>           Disks : 4
>>>    RAID Devices : 2
>>>
>>>   Disk02 Serial : S246J9GZC04267
>>>           State : active
>>>              Id : 00000002
>>>     Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> [md0]:
>>>            UUID : d5bf7ab7:2cda417d:f0c6542f:c77d9289
>>>      RAID Level : 5
>>>         Members : 4
>>>           Slots : [UUUU]
>>>     Failed disk : none
>>>       This Slot : 2
>>>      Array Size : 5662310400 (2700.00 GiB 2899.10 GB)
>>>    Per Dev Size : 1887436800 (900.00 GiB 966.37 GB)
>>>   Sector Offset : 0
>>>     Num Stripes : 7372800
>>>      Chunk Size : 128 KiB
>>>        Reserved : 0
>>>   Migrate State : idle
>>>       Map State : normal
>>>     Dirty State : dirty
>>>
>>> [md1]:
>>>            UUID : 26671da2:0d23f085:3d12dbbe:f63aad5a
>>>      RAID Level : 5
>>>         Members : 4
>>>           Slots : [UUUU]
>>>     Failed disk : none
>>>       This Slot : 0
>>>      Array Size : 198232064 (94.52 GiB 101.49 GB)
>>>    Per Dev Size : 66077952 (31.51 GiB 33.83 GB)
>>>   Sector Offset : 1887440896
>>>     Num Stripes : 258117
>>>      Chunk Size : 128 KiB
>>>        Reserved : 0
>>>   Migrate State : idle
>>>       Map State : normal
>>>     Dirty State : clean
>>>
>>>   Disk00 Serial : S13PJDWS608386
>>>           State : active
>>>              Id : 00000003
>>>     Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>>   Disk01 Serial : WD-WCC1S5684189
>>>           State : active
>>>              Id : 00000000
>>>     Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>>   Disk03 Serial : S13PJDWS608384
>>>           State : active
>>>              Id : 00000001
>>>     Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>>
>>> /dev/sdd:
>>>           Magic : Intel Raid ISM Cfg Sig.
>>>         Version : 1.0.00
>>>     Orig Family : c7e42747
>>>          Family : c7e42747
>>>      Generation : 00000000
>>>      Attributes : All supported
>>>            UUID : 76cff3f5:1a3a7a83:49fc86a8:84cf6604
>>>        Checksum : 4f820c2e correct
>>>     MPB Sectors : 1
>>>           Disks : 1
>>>    RAID Devices : 0
>>>
>>>   Disk00 Serial : S13PJDWS608386
>>>           State :
>>>              Id : 00000003
>>>     Usable Size : 1953518862 (931.51 GiB 1000.20 GB)
>>> #
>>
>> Hi Guido,
>>
>> It looks like the metadata on the drives got messed up for some reason.
>> If you believe the drives are good, you can try recreating the arrays
>> with the same layout to write fresh metadata to the drives, without
>> overwriting the actual data. In this case it can be done like this (make
>> a backup of the drives using dd before trying it):
>>
>> # mdadm -Ss
>> # mdadm -C /dev/md/imsm0 -eimsm -n4 /dev/sdd /dev/sda /dev/sdc /dev/sdb -R
>> # mdadm -C /dev/md/md0 -l5 -n4 /dev/sdd /dev/sda /dev/sdc /dev/sdb --size=900G
>> --chunk=128 --assume-clean -R
>> # mdadm -C /dev/md/md1 -l5 -n4 /dev/sdd /dev/sda /dev/sdc /dev/sdb --chunk=128
>> --assume-clean -R
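
[The dd backup advised before the re-create commands might be sketched
as below; the helper name, backup directory and block size are
assumptions:]

```shell
# Whole-disc image backup with dd.  conv=noerror,sync keeps going past
# read errors, padding unreadable blocks with zeros so offsets in the
# image stay aligned with the disc.
backup_disc() {
  src=$1
  dest_dir=$2
  dd if="$src" of="$dest_dir/$(basename "$src").img" \
     bs=1M conv=noerror,sync status=progress
}
# e.g.: for d in /dev/sd[abcd]; do backup_disc "$d" /mnt/backup; done
```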
>>
>> The drives should be listed in the same order as they appear in the
>> output of mdadm -E. Look at the "DiskXX Serial" lines.
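
[Reading that order off the -E output can be scripted; the helper name
here is made up, and it only parses text fed to it on stdin:]

```shell
# Pull the "DiskXX Serial" lines out of `mdadm -E` output and sort them
# by slot number, so the device order for `mdadm -C` can be read off.
list_disk_order() {
  grep -oE 'Disk[0-9]+ Serial : [^ ]+' | sort -u
}
# e.g.: mdadm -E /dev/sda | list_disk_order
```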
>>
>> Then you can run fsck on the filesystems. Finally, repair any mismatched
>> parity blocks:
>>
>> # echo repair > /sys/block/md126/md/sync_action
>> # echo repair > /sys/block/md125/md/sync_action
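
[The same sysfs directory reports when the repair pass has finished and
how many inconsistencies it found. A sketch with made-up helper names;
the md126/md125 device names come from the commands above but would
need checking on the rebuilt system:]

```shell
# Helpers for watching a started repair.  sync_action reads back "idle"
# once the pass completes; mismatch_cnt is the count of sectors the
# check/repair pass found inconsistent.  $1 is a /sys/block/mdXXX path.
repair_done() {
  [ "$(cat "$1/md/sync_action")" = "idle" ]
}
mismatches() {
  cat "$1/md/mismatch_cnt"
}
# e.g.: repair_done /sys/block/md126 && mismatches /sys/block/md126
```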
>>
>> You may have to update places like fstab, bootloader config,
>> /etc/mdadm.conf, because the array UUIDs will change.
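
[The mdadm.conf entries can be regenerated with `mdadm --detail --scan`;
for an IMSM setup the result looks roughly like the fragment below. The
UUIDs are placeholders, not values from this system:]

```
# appended to /etc/mdadm.conf, e.g. via: mdadm --detail --scan >> /etc/mdadm.conf
ARRAY metadata=imsm UUID=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
ARRAY /dev/md/md0 container=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd member=0 UUID=11111111:22222222:33333333:44444444
ARRAY /dev/md/md1 container=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd member=1 UUID=55555555:66666666:77777777:88888888
```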
>>
>> Regards,
>> Artur
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


