On Wed, Sep 19, 2018 at 5:09 PM Gi-Oh Kim <gi-oh.kim@xxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> I found a weird behavior of re-adding a device.
> I think it is a kernel bug.
> I would appreciate it if somebody can confirm if it is a bug or feature.
>
> I tested re-adding a device as following.
> 1. create md with ram0 and ram1
> 2. add ram2
> 3. grow raid-device number to 3
> 4. remove ram2
> 5. grow raid-device number to 2
> 6. add ram2
> 7. ram0 become faulty and ram2 become active
> 8. stop md
> 9. assemble md with ram0 and ram1 => fail because ram0 is faulty

Hi,

I checked the kernel function raid1_spare_active() in raid1.c and found
that ram0 is set as faulty on purpose.

If ram0 is set as faulty in order to replace it with ram2, I think
assembling the array from ram1 and ram2 should succeed.
But "mdadm -A /dev/md111 /dev/ram1 /dev/ram2" creates md111 with only ram2.

I do not understand why it is necessary to set ram0 faulty.
How can I re-add the ram2 device as a spare without ram0 being set faulty?

> AND following is the test result.
>
> gohkim@ws00837:~/work$ uname -a
> Linux ws00837 4.13.0-16-generic #19-Ubuntu SMP Wed Oct 11 18:35:14 UTC
> 2017 x86_64 x86_64 x86_64 GNU/Linux
> gohkim@ws00837:~/work$ mdadm --version
> mdadm - v4.0 - 2017-01-09
> gohkim@ws00837:~/work$ sudo bash a.sh
> mdadm: array /dev/md111 started.
> 1+0 records in
> 1+0 records out
> 512 bytes copied, 0,000224124 s, 2,3 MB/s
> 1+0 records in
> 1+0 records out
> 512 bytes copied, 0,00013107 s, 3,9 MB/s
> mdadm: added /dev/ram2
> raid_disks for /dev/md111 set to 3
> Personalities : [raid1]
> md111 : active raid1 ram2[2] ram1[1] ram0[0]
> 65408 blocks super 1.2 [3/3] [UUU]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> unused devices: <none>
> mdadm: set /dev/ram2 faulty in /dev/md111
> mdadm: hot removed /dev/ram2 from /dev/md111
> raid_disks for /dev/md111 set to 2
> Personalities : [raid1]
> md111 : active raid1 ram1[1] ram0[0]
> 65408 blocks super 1.2 [2/2] [UU]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> unused devices: <none>
> [793537.684995] md: md111 stopped.
> [793541.015580] md/raid1:md111: not clean -- starting background reconstruction
> [793541.015581] md/raid1:md111: active with 2 out of 2 mirrors
> [793541.015607] md111: detected capacity change from 0 to 66977792
> [793541.015688] md: resync of RAID array md111
> [793541.103960] md: md111: resync done.
> [793541.133570] md: recovery of RAID array md111
> [793541.151415] md: md111: recovery done.
> [793541.153893] md/raid1:md111: Disk failure on ram2, disabling device.
> md/raid1:md111: Operation continuing on 2 devices.
> mdadm: re-added /dev/ram2
> gohkim@ws00837:~/work$ cat /proc/mdstat
> Personalities : [raid1]
> md111 : active raid1 ram2[2] ram1[1] ram0[0](F)
> 65408 blocks super 1.2 [2/2] [UU]
> bitmap: 1/1 pages [4KB], 65536KB chunk
>
> unused devices: <none>
> gohkim@ws00837:~/work$ cat /proc/mdstat
> Personalities : [raid1]
> md111 : active raid1 ram2[2] ram1[1] ram0[0](F)
> 65408 blocks super 1.2 [2/2] [UU]
> bitmap: 0/1 pages [0KB], 65536KB chunk
>
> unused devices: <none>
> gohkim@ws00837:~/work$ sudo mdadm -S /dev/md111
> mdadm: stopped /dev/md111
> gohkim@ws00837:~/work$ sudo mdadm -A /dev/md111 /dev/ram2 /dev/ram1
> mdadm: ignoring /dev/ram1 as it reports /dev/ram2 as failed
> mdadm: device 4 in /dev/md111 has wrong state in superblock, but
> /dev/ram2 seems ok
> mdadm: /dev/md111 assembled from 0 drives and 1 spare - not enough to
> start the array.
> gohkim@ws00837:~/work$ cat /proc/mdstat
> Personalities : [raid1]
> unused devices: <none>
> gohkim@ws00837:~/work$ sudo mdadm -A /dev/md111 /dev/ram1 /dev/ram2
> mdadm: device 4 in /dev/md111 has wrong state in superblock, but
> /dev/ram2 seems ok
> mdadm: /dev/md111 has been started with 1 drive (out of 2) and 1 spare.
> gohkim@ws00837:~/work$ cat /proc/mdstat
> Personalities : [raid1]
> md111 : active raid1 ram1[1] ram2[2]
> 65408 blocks super 1.2 [2/2] [_U]
> bitmap: 0/1 pages [0KB], 65536KB chunk
>
> unused devices: <none>
> gohkim@ws00837:~/work$ sudo mdadm -D /dev/md111
> /dev/md111:
> Version : 1.2
> Creation Time : Wed Sep 19 16:13:02 2018
> Raid Level : raid1
> Array Size : 65408 (63.88 MiB 66.98 MB)
> Used Dev Size : 65408 (63.88 MiB 66.98 MB)
> Raid Devices : 2
> Total Devices : 2
> Persistence : Superblock is persistent
>
> Intent Bitmap : Internal
>
> Update Time : Wed Sep 19 16:21:37 2018
> State : clean, degraded
> Active Devices : 2
> Working Devices : 2
> Failed Devices : 0
> Spare Devices : 0
>
> Name : ws00837:111 (local to host ws00837)
> UUID : 0dc14c32:21382069:c1bcffe5:30720a7f
> Events : 53
>
> Number Major Minor RaidDevice State
> - 0 0 0 removed
> 1 1 1 1 active sync /dev/ram1
>
> 2 1 2 2 active sync /dev/ram2
> gohkim@ws00837:~/work$ sudo mdadm -S /dev/md111
> [sudo] password for gohkim:
> mdadm: stopped /dev/md111
> gohkim@ws00837:~/work$ sudo mdadm -A /dev/md111 /dev/ram0 /dev/ram1
> mdadm: /dev/md111 has been started with 1 drive (out of 2).
> gohkim@ws00837:~/work$ sudo mdadm -D /dev/md111
> /dev/md111:
> Version : 1.2
> Creation Time : Wed Sep 19 16:13:02 2018
> Raid Level : raid1
> Array Size : 65408 (63.88 MiB 66.98 MB)
> Used Dev Size : 65408 (63.88 MiB 66.98 MB)
> Raid Devices : 2
> Total Devices : 1
> Persistence : Superblock is persistent
>
> Intent Bitmap : Internal
>
> Update Time : Wed Sep 19 16:21:37 2018
> State : clean, degraded
> Active Devices : 1
> Working Devices : 1
> Failed Devices : 0
> Spare Devices : 0
>
> Name : ws00837:111 (local to host ws00837)
> UUID : 0dc14c32:21382069:c1bcffe5:30720a7f
> Events : 53
>
> Number Major Minor RaidDevice State
> - 0 0 0 removed
> 1 1 1 1 active sync /dev/ram1
>

--
GIOH KIM
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel: +49 176 2697 8962
Fax: +49 30 577 008 299
Email: gi-oh.kim@xxxxxxxxxxxxxxxx
URL: https://www.profitbricks.de

Registered office: Berlin
Commercial register: Amtsgericht Charlottenburg, HRB 125506 B
Managing directors: Achim Weiss, Matthias Steinberg, Christoph Steffens
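
For anyone who wants to retry the nine quoted steps, they could be scripted roughly as below. This is only a sketch and not the a.sh from the transcript, which was not posted: the exact mdadm options (--run, --bitmap=internal, the --wait call) are assumptions inferred from the output, the two dd writes seen in the log are left out because their arguments are unknown, and the names step, reproduce, MD and RUN are made up for illustration. By default it only prints the commands; RUN=1 executes them and needs root plus the brd ram disks.

```shell
#!/bin/sh
# Hypothetical reconstruction of the quoted reproduction steps (not the original a.sh).
# Dry run by default: commands are only printed. Set RUN=1 to execute them.
MD=${MD:-/dev/md111}

# Print a command and, when RUN=1, execute it.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

reproduce() {
    # 1. create md with ram0 and ram1 (internal bitmap, as shown in /proc/mdstat)
    step mdadm --create "$MD" --run --level=1 --raid-devices=2 \
        --bitmap=internal /dev/ram0 /dev/ram1
    # 2. add ram2
    step mdadm "$MD" --add /dev/ram2
    # 3. grow raid-device number to 3
    step mdadm --grow "$MD" --raid-devices=3
    # Assumption: let recovery finish so ram2 becomes an active member.
    step mdadm --wait "$MD"
    # 4. remove ram2 (it has to be marked faulty before hot removal)
    step mdadm "$MD" --fail /dev/ram2 --remove /dev/ram2
    # 5. grow raid-device number to 2
    step mdadm --grow "$MD" --raid-devices=2
    # 6. add ram2 again (the transcript reports this as "re-added")
    step mdadm "$MD" --add /dev/ram2
    # 7. check whether ram0 is now marked (F)
    step cat /proc/mdstat
    # 8. stop md
    step mdadm --stop "$MD"
    # 9. assemble md with ram0 and ram1
    step mdadm --assemble "$MD" /dev/ram0 /dev/ram1
}

reproduce
```

Running it without RUN set just lists the ten commands, which makes it easy to compare against the transcript before touching a real system.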