I don't mean to be rude, but it's been two weeks and my system is still
in this state. Bump, anyone?
A thorough search of the web (before I originally posted this to the
list) turned up nothing conclusive. There seems to be no explanation of
why this occurs, only that it has happened a number of times. Most
reports indicate that completely stopping and reassembling the array
fixes it, but I tried that and the disk still returned to spare. A few
reports describe the same situation as mine, but none of them received
an answer that seems complete.
Eventually the discussion turns to wiping the disks and starting again.
That seems a bit drastic, and I'm concerned that *one* of the disks is
faulty but not being reported as such, so I don't want to pick the
wrong one to wipe the superblock from. mdadm reports no errors, but
SMART indicates there may be a problem with the *active* disk, which is
even more worrying, because until the spare becomes active I can't
remove the active disk to test it properly.
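For reference, I assume an extended SMART self-test can be run against
the active disk in place, without taking it out of the array (the
device name below is just what it is on my system, and the output
obviously varies by drive):

$ sudo smartctl -a /dev/sdf           # full attribute and error-log dump
$ sudo smartctl -t long /dev/sdf      # start an extended offline self-test
$ sudo smartctl -l selftest /dev/sdf  # read the result once the test completes

That would at least let me exercise /dev/sdf without pulling it from
the degraded array.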
Any ideas?
Cheers,
Tudor.
On 03/12/12 11:04, Tudor Holton wrote:
Hello,
I'm having some trouble with an array of mine that has become degraded.
It currently shows this state:
md101 : active raid1 sdf1[0] sdb1[2](S)
      1953511936 blocks [2/1] [U_]
mdadm --detail says:
/dev/md101:
        Version : 0.90
  Creation Time : Thu Jan 13 14:34:27 2011
     Raid Level : raid1
     Array Size : 1953511936 (1863.01 GiB 2000.40 GB)
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 101
    Persistence : Superblock is persistent

    Update Time : Fri Nov 23 03:23:04 2012
          State : clean, degraded
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

           UUID : 43e92a79:90295495:0a76e71e:56c99031 (local to host barney)
         Events : 0.2127

    Number   Major   Minor   RaidDevice State
       0       8       81        0      active sync   /dev/sdf1
       1       0        0        1      removed

       2       8       17        -      spare   /dev/sdb1
If I attempt to force the spare to become active, it begins to recover:
$ sudo mdadm -S /dev/md101
mdadm: stopped /dev/md101
$ sudo mdadm --assemble --force --no-degraded /dev/md101 /dev/sdf1 /dev/sdb1
mdadm: /dev/md101 has been started with 1 drive (out of 2) and 1 spare.
$ cat /proc/mdstat
md101 : active raid1 sdf1[0] sdb1[2]
      1953511936 blocks [2/1] [U_]
      [>....................]  recovery =  0.0% (541440/1953511936) finish=420.8min speed=77348K/sec
The recovery runs for the full estimated time, but the disk then
returns to being a spare.
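I assume the next thing worth checking is the superblock on each
partition, to see whether the event counts or device roles disagree
(I'm not sure what a healthy pair of 0.90 superblocks should look
like):

$ sudo mdadm --examine /dev/sdf1
$ sudo mdadm --examine /dev/sdb1

I can post that output if it would help.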
Neither disk partition reports errors:
$ cat /sys/block/md101/md/dev-sdf1/errors
0
$ cat /sys/block/md101/md/dev-sdb1/errors
0
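I assume some of the sysfs attributes are also relevant here, though
I'm only guessing at which ones matter and how to read them:

$ cat /sys/block/md101/md/array_state
$ cat /sys/block/md101/md/degraded
$ cat /sys/block/md101/md/sync_action
$ cat /sys/block/md101/md/dev-sdb1/state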
Are there any mdadm logs that would show why this is not recovering
properly? How else can I debug this?
Cheers,
Tudor.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html