On 10/15/19 5:04 PM, Wol's lists wrote:
> On 15/10/2019 23:44, Curtis Vaughan wrote:
>>
>>>>
>>>> Device info:
>>>> ST1000DM003-9YN162, S/N:Z1D17B24, WWN:5-000c50-050e6c90f, FW:CC4C,
>>>> 1.00 TB
>>> Urkk
>>>
>>> Seagate Barracudas are NOT recommended. Can you do a "smartctl -x" and
>>> see if SCT/ERC is supported? I haven't got a datasheet for the 1TB
>>> version, but I've got the 3TB version and it doesn't support it. That
>>> means you WILL suffer from the timeout problem ...
>>>
>>> (Not that that's your problem here, but there's no point tempting fate.
>>> I know Seagate say "suitable for desktop raid", but the experts on this
>>> list wouldn't agree ...)
>>
>> SCT is supported, but SCT/ERC is not. GREAT! Hmm, and the replacement
>> is also a Seagate.
>
> My new drives are Seagate Ironwolf, which are supposedly fine. I still
> haven't managed to boot the system - it's been sat for ages with an
> assembly problem I haven't solved - I hope it's something as simple as
> needing a BIOS update, but I can't do that ...
>
>> However, another of my servers also has Seagates like the one I'm
>> buying, and on it ERC is supported. So maybe I should buy one more
>> such drive and also replace sdb?
>
> Depends. If you run the script on the timeout problem page it "fixes"
> the problem. The only downside is that if you have a disk error,
> you've just set your timeout to three minutes, so the system could
> freeze for near enough that time. Not nice for the user, but at least
> the system will be okay. A proper ERC drive can be set to return with
> an error very quickly - the default is 7 secs.
>>
>> Here are the results of the command on the problem drive:
>>
>> smartctl -x /dev/sda | grep SCT
>> SCT capabilities:            (0x3085) SCT Status supported.
>> 0xe0       GPL,SL  R/W      1  SCT Command/Status
>> 0xe1       GPL,SL  R/W      1  SCT Data Transfer
>> SCT Status Version:                  3
>> SCT Version (vendor specific):       522 (0x020a)
>> SCT Support Level:                   1
>> SCT Data Table command not supported
>> SCT Error Recovery Control command not supported
>>
> Typical Barracuda :-(

I think I got it working; I just want to make sure I did this right.
Using fdisk I recreated the exact same partitions on sda as on sdb,
then ran "mdadm --re-add" to re-join each partition to its RAID volume.
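(For reference, a minimal sketch of the equivalent commands - assuming
/dev/sdb is the surviving member, /dev/sda is the blank replacement, and
the disks use MBR partition tables, which the version-0.90 superblocks
suggest:)

  # Clone the partition table from the healthy drive rather than
  # recreating it by hand in fdisk (MBR only; GPT needs sgdisk).
  sfdisk -d /dev/sdb | sfdisk /dev/sda

  # Re-join each partition to its mirror. If --re-add is refused
  # (e.g. no usable superblock left on the partition), a plain --add
  # forces a full rebuild instead.
  mdadm /dev/md0 --re-add /dev/sda1
  mdadm /dev/md1 --re-add /dev/sda2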
So now here is the output of various commands. Does everything look
right?

cat /proc/mdstat

Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda1[2] sdb1[1]
      7811008 blocks [2/1] [_U]
        resync=DELAYED

md1 : active raid1 sda2[2] sdb2[1]
      968949696 blocks [2/1] [_U]
      [>....................]  recovery =  0.4% (4015552/968949696) finish=184.6min speed=87083K/sec

unused devices: <none>

mdadm --detail /dev/md0

/dev/md0:
        Version : 0.90
  Creation Time : Wed Jul 18 15:00:44 2012
     Raid Level : raid1
     Array Size : 7811008 (7.45 GiB 8.00 GB)
  Used Dev Size : 7811008 (7.45 GiB 8.00 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Oct 16 14:10:46 2019
          State : clean, degraded, resyncing (DELAYED)
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

Consistency Policy : resync

           UUID : 7414ac79:580af0ce:e6bbe02b:915fa44a
         Events : 0.1081

    Number   Major   Minor   RaidDevice State
       2       8        1        0      spare rebuilding   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1

mdadm --detail /dev/md1

/dev/md1:
        Version : 0.90
  Creation Time : Wed Jul 18 15:00:53 2012
     Raid Level : raid1
     Array Size : 968949696 (924.06 GiB 992.20 GB)
  Used Dev Size : 968949696 (924.06 GiB 992.20 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Oct 16 14:12:20 2019
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

Consistency Policy : resync

 Rebuild Status : 1% complete

           UUID : ac37ca92:939d7053:3b802bf3:08298597
         Events : 0.131712

    Number   Major   Minor   RaidDevice State
       2       8        2        0      spare rebuilding   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
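(Following up on the SCT/ERC discussion above: the "timeout problem
page" script amounts to settings along these lines. A sketch only, with
/dev/sdX standing in for whichever member drive is being configured;
neither setting survives a reboot, so they are normally reapplied from a
boot script:)

  # Drive supports SCT ERC: make it give up and report an error after
  # 7 seconds (the value is in tenths of a second), safely inside the
  # kernel's default 30-second command timeout.
  smartctl -l scterc,70,70 /dev/sdX

  # Drive lacks SCT ERC (like this Barracuda): raise the kernel timeout
  # to 180 seconds instead, so the kernel outlasts the drive's internal
  # retries rather than kicking a merely slow disk out of the array.
  echo 180 > /sys/block/sdX/device/timeout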