Why doesn't raid456 block removing member drives?

Hi all,
I tested some array failure scenarios for imsm and native metadata. I
saw that in raid5 we can reach the failed state. For other raid levels
this is not possible - the kernel blocks setting a device as faulty in
the array and returns EBUSY if the next failure would destroy the array.
For example, native raid1:

[root@localhost ~]# mdadm -CR /dev/md/vol -l1 -n2 /dev/sd[b-c]  -z 500M
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/vol started.

[root@localhost ~]# cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4] [raid10] [raid1]
md127 : active raid1 sdc[1] sdb[0]
         512000 blocks super 1.2 [2/2] [UU]

unused devices: <none>

Try to fail it - we are able to fail one device:

[root@localhost ~]# mdadm --set-faulty /dev/md127 /dev/sdc
mdadm: set /dev/sdc faulty in /dev/md127

Try to fail the last device:

[root@localhost ~]# mdadm --set-faulty /dev/md127 /dev/sdb
mdadm: set device faulty failed for /dev/sdb:  Device or resource busy

The function raid1_handle_error() doesn't allow removing it:

    if (test_bit(In_sync, &rdev->flags)
        && (conf->raid_disks - mddev->degraded) == 1) {
        /*
         * Don't fail the drive, act as though we were just a
         * normal single drive.
         * However don't try a recovery from this drive as
         * it is very likely to fail.
         */
        conf->recovery_disabled = mddev->recovery_disabled;
        spin_unlock_irqrestore(&conf->device_lock, flags);
        return;
    }

This check blocks removing the last working device. I would expect
something similar for raid5 - a rough sketch of what I mean follows.
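Something like the following at the top of the raid456 error handler is
what I have in mind. This is only a sketch modelled on the raid1 check
quoted above, not a tested patch; using conf->max_degraded and
conf->recovery_disabled this way, and relying on md core turning an
unset Faulty flag into EBUSY, are my assumptions:

    /* Hypothetical guard inside raid5_error(), assumed to run under
     * conf->device_lock as the rest of the handler does.  Refuse to
     * fail an in-sync member once the array is already at its maximum
     * tolerated degradation; leaving Faulty unset should make md core
     * return EBUSY to mdadm, as it does for raid1. */
    if (test_bit(In_sync, &rdev->flags) &&
        mddev->degraded >= conf->max_degraded) {
        /* Behave like raid1: keep the remaining working set alive and
         * don't attempt recovery from this drive. */
        conf->recovery_disabled = mddev->recovery_disabled;
        spin_unlock_irqrestore(&conf->device_lock, flags);
        return;
    }
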
Scenario for raid5:

[root@localhost ~]# mdadm -CR /dev/md/vol -l5 -n3 /dev/sd[b-d] -z 500M
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/vol started.

[root@localhost ~]# cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4] [raid10] [raid1]
md127 : active raid5 sdd[2] sdc[1] sdb[0]
      1024000 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

Fail one device:

[root@localhost ~]# mdadm --set-faulty /dev/md127 /dev/sdb
mdadm: set /dev/sdb faulty in /dev/md127

[root@localhost ~]# cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4] [raid10] [raid1]
md127 : active raid5 sdd[2] sdc[1] sdb[0](F)
      1024000 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]

unused devices: <none>

Now try to fail the array:

[root@localhost ~]# mdadm --set-faulty /dev/md127 /dev/sdc
mdadm: set /dev/sdc faulty in /dev/md127

[root@localhost ~]# cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4] [raid10] [raid1]
md127 : active raid5 sdd[2] sdc[1](F) sdb[0](F)
      1024000 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/1] [__U]

As you can see, the raid is failed now. I didn't observe any error from
mdadm or the kernel.
I checked the array state in sysfs:

[root@localhost ~]# cat /sys/block/md127/md/array_state
clean

I expect "failed" there. Only in Detail I saw information about failure:

[root@localhost ~]# mdadm -D /dev/md127
/dev/md127:
           Version : 1.2
     Creation Time : Wed Aug  1 13:51:42 2018
        Raid Level : raid5
        Array Size : 1024000 (1000.00 MiB 1048.58 MB)
     Used Dev Size : 512000 (500.00 MiB 524.29 MB)
      Raid Devices : 3
     Total Devices : 3
       Persistence : Superblock is persistent

       Update Time : Wed Aug  1 14:00:54 2018
             State : clean, FAILED
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 2
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : resync

              Name : localhost.localdomain:vol  (local to host localhost.localdomain)
              UUID : 7355721f:6e0851e7:24b30662:e31cbe0c
            Events : 21

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       -       0        0        1      removed
       2       8       48        2      active sync   /dev/sdd
       0       8       16        -      faulty   /dev/sdb
       1       8       32        -      faulty   /dev/sdc

This problem affects every metadata type.
I checked the error handler in the raid456 module. As it is implemented
in the kernel, it doesn't provide any such protection for the array.
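For reference, this is roughly what the error path does today (condensed
from my reading of raid5.c around v4.18, with printing and other details
trimmed, so treat it as approximate) - the device is marked Faulty
unconditionally, with no counterpart to the raid1 guard:

    /* Condensed, approximate view of raid5_error(): the rdev is always
     * taken out of service, regardless of how degraded the array is. */
    spin_lock_irqsave(&conf->device_lock, flags);
    clear_bit(In_sync, &rdev->flags);
    mddev->degraded = raid5_calc_degraded(conf);
    spin_unlock_irqrestore(&conf->device_lock, flags);
    set_bit(MD_RECOVERY_INTR, &mddev->recovery);

    set_bit(Blocked, &rdev->flags);
    set_bit(Faulty, &rdev->flags);
    set_mask_bits(&mddev->sb_flags, 0,
                  BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
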
This check has been missing since the beginning of the raid456 module
implementation, so I suppose it is not a mistake, but I can't find the
use case. Is there any scenario where the current behaviour is necessary?
I will be grateful for any advice.

Thanks,
Mariusz



