Re: [RFC] wrong behavior of re-adding a device

On Wed, Sep 19, 2018 at 5:09 PM Gi-Oh Kim <gi-oh.kim@xxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> I found a weird behavior when re-adding a device.
> I think it is a kernel bug.
> I would appreciate it if somebody could confirm whether it is a bug or a feature.
>
> I tested re-adding a device as follows.
> 1. create md with ram0 and ram1
> 2. add ram2
> 3. grow raid-device number to 3
> 4. remove ram2
> 5. grow raid-device number to 2
> 6. add ram2
> 7. ram0 becomes faulty and ram2 becomes active
> 8. stop md
> 9. assemble md with ram0 and ram1 => fail because ram0 is faulty

Hi,

I checked the kernel function raid1_spare_active() in raid1.c and
found that ram0 is set faulty on purpose.
If ram0 is set faulty so that ram2 can replace it, I think assembling
the array from ram1 and ram2 should succeed.
But "mdadm -A /dev/md111 /dev/ram1 /dev/ram2" starts md111 degraded
([_U]): mdadm reports only one drive (out of 2) plus ram2 as a spare.
I do not understand why it is necessary to set ram0 faulty.

How can I re-add ram2 as a spare device without ram0 being set faulty?
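The a.sh script itself was not posted. As a rough aid for reproducing the report, steps 1 to 9 above might map to mdadm commands like the following bash sketch. All option choices are assumptions: `--run` and `--bitmap=internal` at creation (the bitmap is inferred from /proc/mdstat), `--fail ... --remove` for step 4, and a plain `--add` for step 6. The two 512-byte dd writes visible in the log are left out because their targets are not shown. By default the script only prints the commands.

```bash
#!/bin/bash
# Hypothetical reconstruction of the reproduction steps; the original a.sh
# was not posted, so the exact options used here are assumptions.
# DRY_RUN=1 (default): only print each command.
# DRY_RUN=0: execute them (needs root, the brd module and /dev/ram0..2;
#            destroys the contents of the ram disks).
set -e

DRY_RUN=${DRY_RUN:-1}
MD=/dev/md111

# Print or execute a single command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

reproduce() {
    # 1. create a two-way RAID1 (internal bitmap assumed from /proc/mdstat)
    run mdadm --create "$MD" --run --level=1 --raid-devices=2 \
        --bitmap=internal /dev/ram0 /dev/ram1
    # 2. add ram2 as a spare
    run mdadm "$MD" --add /dev/ram2
    # 3. grow to three mirrors so that ram2 becomes active
    run mdadm --grow "$MD" --raid-devices=3
    # 4. fail ram2 and hot-remove it
    run mdadm "$MD" --fail /dev/ram2 --remove /dev/ram2
    # 5. shrink back to two mirrors
    run mdadm --grow "$MD" --raid-devices=2
    # 6. add ram2 again (mdadm printed "re-added" in the posted log)
    run mdadm "$MD" --add /dev/ram2
    # 7. inspect the state (the posted log showed ram0 as (F), ram2 active)
    run cat /proc/mdstat
    # 8. stop the array
    run mdadm --stop "$MD"
    # 9. assemble from ram0 and ram1
    run mdadm --assemble "$MD" /dev/ram0 /dev/ram1
}

reproduce
```

For a real run, something like `modprobe brd rd_nr=3 rd_size=65536` (a guess matching the roughly 64 MiB array in the log) followed by `sudo DRY_RUN=0 bash repro.sh` should work; on slower devices the recovery after steps 3 and 6 may need to finish before the next step.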


> And the following is the test result.
>
> gohkim@ws00837:~/work$ uname -a
> Linux ws00837 4.13.0-16-generic #19-Ubuntu SMP Wed Oct 11 18:35:14 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> gohkim@ws00837:~/work$ mdadm --version
> mdadm - v4.0 - 2017-01-09
> gohkim@ws00837:~/work$ sudo bash a.sh
> mdadm: array /dev/md111 started.
> 1+0 records in
> 1+0 records out
> 512 bytes copied, 0,000224124 s, 2,3 MB/s
> 1+0 records in
> 1+0 records out
> 512 bytes copied, 0,00013107 s, 3,9 MB/s
> mdadm: added /dev/ram2
> raid_disks for /dev/md111 set to 3
> Personalities : [raid1]
> md111 : active raid1 ram2[2] ram1[1] ram0[0]
>       65408 blocks super 1.2 [3/3] [UUU]
>       bitmap: 1/1 pages [4KB], 65536KB chunk
>
> unused devices: <none>
> mdadm: set /dev/ram2 faulty in /dev/md111
> mdadm: hot removed /dev/ram2 from /dev/md111
> raid_disks for /dev/md111 set to 2
> Personalities : [raid1]
> md111 : active raid1 ram1[1] ram0[0]
>       65408 blocks super 1.2 [2/2] [UU]
>       bitmap: 1/1 pages [4KB], 65536KB chunk
>
> unused devices: <none>
> [793537.684995] md: md111 stopped.
> [793541.015580] md/raid1:md111: not clean -- starting background reconstruction
> [793541.015581] md/raid1:md111: active with 2 out of 2 mirrors
> [793541.015607] md111: detected capacity change from 0 to 66977792
> [793541.015688] md: resync of RAID array md111
> [793541.103960] md: md111: resync done.
> [793541.133570] md: recovery of RAID array md111
> [793541.151415] md: md111: recovery done.
> [793541.153893] md/raid1:md111: Disk failure on ram2, disabling device.
>                 md/raid1:md111: Operation continuing on 2 devices.
> mdadm: re-added /dev/ram2
> gohkim@ws00837:~/work$ cat /proc/mdstat
> Personalities : [raid1]
> md111 : active raid1 ram2[2] ram1[1] ram0[0](F)
>       65408 blocks super 1.2 [2/2] [UU]
>       bitmap: 1/1 pages [4KB], 65536KB chunk
>
> unused devices: <none>
> gohkim@ws00837:~/work$ cat /proc/mdstat
> Personalities : [raid1]
> md111 : active raid1 ram2[2] ram1[1] ram0[0](F)
>       65408 blocks super 1.2 [2/2] [UU]
>       bitmap: 0/1 pages [0KB], 65536KB chunk
>
> unused devices: <none>
> gohkim@ws00837:~/work$ sudo mdadm -S /dev/md111
> mdadm: stopped /dev/md111
> gohkim@ws00837:~/work$ sudo mdadm -A /dev/md111 /dev/ram2 /dev/ram1
> mdadm: ignoring /dev/ram1 as it reports /dev/ram2 as failed
> mdadm: device 4 in /dev/md111 has wrong state in superblock, but /dev/ram2 seems ok
> mdadm: /dev/md111 assembled from 0 drives and 1 spare - not enough to start the array.
> gohkim@ws00837:~/work$ cat /proc/mdstat
> Personalities : [raid1]
> unused devices: <none>
> gohkim@ws00837:~/work$ sudo mdadm -A /dev/md111 /dev/ram1 /dev/ram2
> mdadm: device 4 in /dev/md111 has wrong state in superblock, but /dev/ram2 seems ok
> mdadm: /dev/md111 has been started with 1 drive (out of 2) and 1 spare.
> gohkim@ws00837:~/work$ cat /proc/mdstat
> Personalities : [raid1]
> md111 : active raid1 ram1[1] ram2[2]
>       65408 blocks super 1.2 [2/2] [_U]
>       bitmap: 0/1 pages [0KB], 65536KB chunk
>
> unused devices: <none>
> gohkim@ws00837:~/work$ sudo mdadm -D /dev/md111
> /dev/md111:
>         Version : 1.2
>   Creation Time : Wed Sep 19 16:13:02 2018
>      Raid Level : raid1
>      Array Size : 65408 (63.88 MiB 66.98 MB)
>   Used Dev Size : 65408 (63.88 MiB 66.98 MB)
>    Raid Devices : 2
>   Total Devices : 2
>     Persistence : Superblock is persistent
>
>   Intent Bitmap : Internal
>
>     Update Time : Wed Sep 19 16:21:37 2018
>           State : clean, degraded
>  Active Devices : 2
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 0
>
>            Name : ws00837:111  (local to host ws00837)
>            UUID : 0dc14c32:21382069:c1bcffe5:30720a7f
>          Events : 53
>
>     Number   Major   Minor   RaidDevice State
>        -       0        0        0      removed
>        1       1        1        1      active sync   /dev/ram1
>
>        2       1        2        2      active sync   /dev/ram2
> gohkim@ws00837:~/work$ sudo mdadm -S /dev/md111
> [sudo] password for gohkim:
> mdadm: stopped /dev/md111
> gohkim@ws00837:~/work$ sudo mdadm -A /dev/md111 /dev/ram0 /dev/ram1
> mdadm: /dev/md111 has been started with 1 drive (out of 2).
> gohkim@ws00837:~/work$ sudo mdadm -D /dev/md111
> /dev/md111:
>         Version : 1.2
>   Creation Time : Wed Sep 19 16:13:02 2018
>      Raid Level : raid1
>      Array Size : 65408 (63.88 MiB 66.98 MB)
>   Used Dev Size : 65408 (63.88 MiB 66.98 MB)
>    Raid Devices : 2
>   Total Devices : 1
>     Persistence : Superblock is persistent
>
>   Intent Bitmap : Internal
>
>     Update Time : Wed Sep 19 16:21:37 2018
>           State : clean, degraded
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 0
>   Spare Devices : 0
>
>            Name : ws00837:111  (local to host ws00837)
>            UUID : 0dc14c32:21382069:c1bcffe5:30720a7f
>          Events : 53
>
>     Number   Major   Minor   RaidDevice State
>        -       0        0        0      removed
>        1       1        1        1      active sync   /dev/ram1
>




-- 
GIOH KIM
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel:       +49 176 2697 8962
Fax:      +49 30 577 008 299
Email:    gi-oh.kim@xxxxxxxxxxxxxxxx
URL:      https://www.profitbricks.de

Registered office: Berlin
Register court: Amtsgericht Charlottenburg, HRB 125506 B
Managing directors: Achim Weiss, Matthias Steinberg, Christoph Steffens



