[RFC] wrong behavior of re-adding a device

Hi,

I found odd behavior when re-adding a device to a RAID1 array, and I
suspect it is a kernel bug.
I would appreciate it if somebody could confirm whether this is a bug
or intended behavior.

I tested re-adding a device as follows:
1. Create an md array with ram0 and ram1.
2. Add ram2.
3. Grow the number of raid devices to 3.
4. Fail and remove ram2.
5. Reduce the number of raid devices to 2.
6. Add ram2 again.
7. ram0 becomes faulty and ram2 becomes active.
8. Stop the array.
9. Assemble the array with ram0 and ram1 => ram0 is not used because it
   is faulty, so the array starts with ram1 only.

I think ram2 should have become a spare at step 7, not an active device.
I do not understand why ram0 became faulty at step 7.
Is this a feature or a bug?
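
For anyone who wants to check for this state from a script, something
like the following might help. It is only a sketch: faulty_members is a
hypothetical helper name (not part of mdadm), and it assumes the usual
/proc/mdstat layout in which a faulty member carries an "(F)" suffix.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): print the members of one md
# array that /proc/mdstat marks as faulty with the "(F)" suffix.
# Usage: faulty_members md111 < /proc/mdstat
faulty_members() {
    awk -v md="$1" '
        # Array lines look like: md111 : active raid1 ram1[1] ram0[0](F)
        $1 == md && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    name = $i
                    sub(/\[.*$/, "", name)   # "ram0[0](F)" becomes "ram0"
                    print name
                }
            }
        }'
}
```

Running "faulty_members md111 < /proc/mdstat" right after step 6 should
print nothing if the re-add behaved as I expect.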

I can reproduce it on every machine I have (one Ubuntu desktop, three
Debian servers).
So far I have added some printk calls to the kernel. My guess is that
the r1conf->mirrors array is not cleared after step 5 (reducing the
number of raid devices from 3 to 2), so raid1_spare_active() sets the
first device as faulty.

The following is the script I used.

gohkim@ws00837:~/work$ cat a.sh
mdadm --zero-superblock /dev/ram0 /dev/ram1 /dev/ram2
mdadm -C /dev/md111 -e 1.2 -l 1 --bitmap=internal -n 2 /dev/ram0 /dev/ram1
dd if=/dev/md111 of=./mbr bs=512 count=1
dd of=/dev/md111 if=./mbr bs=512 count=1
mdadm --wait /dev/md111

mdadm /dev/md111 --add /dev/ram2
mdadm --wait /dev/md111
mdadm --grow /dev/md111 --raid-devices=3
mdadm --wait /dev/md111
cat /proc/mdstat

mdadm /dev/md111 --fail /dev/ram2
mdadm /dev/md111 --remove /dev/ram2
mdadm --grow /dev/md111 --raid-devices=2
mdadm --wait /dev/md111
cat /proc/mdstat

#dd if=/dev/md111 of=./mbr bs=1M count=1
#dd of=/dev/md111 if=./mbr bs=1M count=1

dmesg | tail

mdadm /dev/md111 --add /dev/ram2

And the following is the test result.

gohkim@ws00837:~/work$ uname -a
Linux ws00837 4.13.0-16-generic #19-Ubuntu SMP Wed Oct 11 18:35:14 UTC
2017 x86_64 x86_64 x86_64 GNU/Linux
gohkim@ws00837:~/work$ mdadm --version
mdadm - v4.0 - 2017-01-09
gohkim@ws00837:~/work$ sudo bash a.sh
mdadm: array /dev/md111 started.
1+0 records in
1+0 records out
512 bytes copied, 0,000224124 s, 2,3 MB/s
1+0 records in
1+0 records out
512 bytes copied, 0,00013107 s, 3,9 MB/s
mdadm: added /dev/ram2
raid_disks for /dev/md111 set to 3
Personalities : [raid1]
md111 : active raid1 ram2[2] ram1[1] ram0[0]
      65408 blocks super 1.2 [3/3] [UUU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
mdadm: set /dev/ram2 faulty in /dev/md111
mdadm: hot removed /dev/ram2 from /dev/md111
raid_disks for /dev/md111 set to 2
Personalities : [raid1]
md111 : active raid1 ram1[1] ram0[0]
      65408 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
[793537.684995] md: md111 stopped.
[793541.015580] md/raid1:md111: not clean -- starting background reconstruction
[793541.015581] md/raid1:md111: active with 2 out of 2 mirrors
[793541.015607] md111: detected capacity change from 0 to 66977792
[793541.015688] md: resync of RAID array md111
[793541.103960] md: md111: resync done.
[793541.133570] md: recovery of RAID array md111
[793541.151415] md: md111: recovery done.
[793541.153893] md/raid1:md111: Disk failure on ram2, disabling device.
                md/raid1:md111: Operation continuing on 2 devices.
mdadm: re-added /dev/ram2
gohkim@ws00837:~/work$ cat /proc/mdstat
Personalities : [raid1]
md111 : active raid1 ram2[2] ram1[1] ram0[0](F)
      65408 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
gohkim@ws00837:~/work$ cat /proc/mdstat
Personalities : [raid1]
md111 : active raid1 ram2[2] ram1[1] ram0[0](F)
      65408 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
gohkim@ws00837:~/work$ sudo mdadm -S /dev/md111
mdadm: stopped /dev/md111
gohkim@ws00837:~/work$ sudo mdadm -A /dev/md111 /dev/ram2 /dev/ram1
mdadm: ignoring /dev/ram1 as it reports /dev/ram2 as failed
mdadm: device 4 in /dev/md111 has wrong state in superblock, but
/dev/ram2 seems ok
mdadm: /dev/md111 assembled from 0 drives and 1 spare - not enough to
start the array.
gohkim@ws00837:~/work$ cat /proc/mdstat
Personalities : [raid1]
unused devices: <none>
gohkim@ws00837:~/work$ sudo mdadm -A /dev/md111 /dev/ram1 /dev/ram2
mdadm: device 4 in /dev/md111 has wrong state in superblock, but
/dev/ram2 seems ok
mdadm: /dev/md111 has been started with 1 drive (out of 2) and 1 spare.
gohkim@ws00837:~/work$ cat /proc/mdstat
Personalities : [raid1]
md111 : active raid1 ram1[1] ram2[2]
      65408 blocks super 1.2 [2/2] [_U]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
gohkim@ws00837:~/work$ sudo mdadm -D /dev/md111
/dev/md111:
        Version : 1.2
  Creation Time : Wed Sep 19 16:13:02 2018
     Raid Level : raid1
     Array Size : 65408 (63.88 MiB 66.98 MB)
  Used Dev Size : 65408 (63.88 MiB 66.98 MB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed Sep 19 16:21:37 2018
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : ws00837:111  (local to host ws00837)
           UUID : 0dc14c32:21382069:c1bcffe5:30720a7f
         Events : 53

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1       1        1        1      active sync   /dev/ram1

       2       1        2        2      active sync   /dev/ram2
gohkim@ws00837:~/work$ sudo mdadm -S /dev/md111
[sudo] password for gohkim:
mdadm: stopped /dev/md111
gohkim@ws00837:~/work$ sudo mdadm -A /dev/md111 /dev/ram0 /dev/ram1
mdadm: /dev/md111 has been started with 1 drive (out of 2).
gohkim@ws00837:~/work$ sudo mdadm -D /dev/md111
/dev/md111:
        Version : 1.2
  Creation Time : Wed Sep 19 16:13:02 2018
     Raid Level : raid1
     Array Size : 65408 (63.88 MiB 66.98 MB)
  Used Dev Size : 65408 (63.88 MiB 66.98 MB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed Sep 19 16:21:37 2018
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : ws00837:111  (local to host ws00837)
           UUID : 0dc14c32:21382069:c1bcffe5:30720a7f
         Events : 53

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1       1        1        1      active sync   /dev/ram1
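
One more oddity in the output above is the "[2/2] [_U]" status line: as
far as I understand, the second number is the count of in-sync devices,
so it should match the number of "U" flags. A rough sketch for spotting
such lines follows. mdstat_mismatch is a hypothetical helper name, and
the interpretation of "[n/m]" is my assumption about the raid1 status
format.

```shell
#!/bin/sh
# Hypothetical helper: report /proc/mdstat status lines where the number
# of "U" flags differs from the in-sync count m in "[n/m]".
# Usage: mdstat_mismatch < /proc/mdstat
mdstat_mismatch() {
    awk '
        match($0, /\[[0-9]+\/[0-9]+\] \[[U_]+\]/) {
            s = substr($0, RSTART, RLENGTH)        # e.g. "[2/2] [_U]"
            split(s, part, " ")
            counts = substr(part[1], 2, length(part[1]) - 2)   # "2/2"
            flags = part[2]
            split(counts, nm, "/")
            up = gsub(/U/, "U", flags)             # gsub returns the match count
            if (up != nm[2] + 0)
                printf "mismatch: %s (%d U flags, %d expected)\n", s, up, nm[2]
        }'
}
```

It prints nothing for a consistent line such as "[2/2] [UU]".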

-- 
GIOH KIM
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel:       +49 176 2697 8962
Fax:      +49 30 577 008 299
Email:    gi-oh.kim@xxxxxxxxxxxxxxxx
URL:      https://www.profitbricks.de

Registered office: Berlin
Commercial register: Amtsgericht Charlottenburg, HRB 125506 B
Managing directors: Achim Weiss, Matthias Steinberg, Christoph Steffens



