Re: [PATCH 0/3] [RESEND]dm-raid1: several fixes about writing on out of sync mirror device

Heinz Mauelshagen <heinzm@xxxxxxxxxx> · Thu, 16 Apr 2015 11:58:06 +0200

Lidong,

tests need to happen under heavy load, i.e. with failures
in worst-case scenarios.

E.g. an fs is mounted and being updated whilst
you're taking mirror legs offline and bringing them back,
causing them to get resynchronized.

Heinz

On 04/16/2015 05:43 AM, Lidong Zhong wrote:
Hi List/Heinz,

These three patches are based on the previous patch series, which was replied to on April 8.
Below are the tests I ran for this feature. My test environment:
linux-klqg:~ # dmsetup ls --tree
vg-lv (253:4)
  ├─vg-lv_mimage_2 (253:3)
  │  └─ (8:48)
  ├─vg-lv_mimage_1 (253:2)
  │  └─ (8:32)
  ├─vg-lv_mimage_0 (253:1)
  │  └─ (8:16)
  └─vg-lv_mlog (253:0)
     └─ (8:64)
linux-klqg:~ # dmsetup status
vg-lv_mimage_2: 0 614400 linear
vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 600/600 1 AAA 3 disk 253:0 A
vg-lv_mimage_1: 0 614400 linear
vg-lv_mimage_0: 0 614400 linear
vg-lv_mlog: 0 8192 linear
linux-klqg:~ # dmsetup table
vg-lv_mimage_2: 0 614400 linear 8:48 2048
vg-lv: 0 614400 mirror disk 2 253:0 1024 3 253:1 0 253:2 0 253:3 0 2 handle_errors keep_log
vg-lv_mimage_1: 0 614400 linear 8:32 2048
vg-lv_mimage_0: 0 614400 linear 8:16 2048
vg-lv_mlog: 0 8192 linear 8:64 2048
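
For anyone scripting against these status lines, here is a minimal sketch of how the mirror line could be split into its parts. It assumes the field order visible in the status output above: leg count, leg devices, synced/total regions, a literal 1, the leg health characters, then the log arguments ending in the log health character. The function name mirror_status_fields is hypothetical and not part of dmsetup.

```shell
# Hypothetical helper, not part of dmsetup.
# Assumed layout of a mirror line from a plain `dmsetup status`:
#   name: start len mirror <#legs> <dev>... <synced>/<total> 1 <leg health> <#log args> <log type> <log dev> <log health>
mirror_status_fields() {
    set -- $1            # word-split the status line into positional parameters
    nlegs=$5             # number of mirror legs
    shift 5              # drop name, start, length, target type, leg count
    shift "$nlegs"       # drop the leg devices
    # Now: $1 = synced/total, $3 = leg health string, $7 = log health character
    echo "legs=$nlegs sync=$1 leg_health=$3 log_health=$7"
}

mirror_status_fields "vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 600/600 1 AAA 3 disk 253:0 A"
```

The sketch expects the "name:" prefix that dmsetup status prints when no device is given.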

1, single data device failure
After making one of the data legs fail, write data to the first three regions.
linux-klqg:~ # echo "a" |dd of=/dev/vg/lv bs=1K count=1 seek=0
0+1 records in
0+1 records out
2 bytes (2 B) copied, 0.0103211 s, 0.2 kB/s
linux-klqg:~ # echo "b" |dd of=/dev/vg/lv bs=1K count=1 seek=512
0+1 records in
0+1 records out
2 bytes (2 B) copied, 0.00428962 s, 0.5 kB/s
linux-klqg:~ # echo "c" |dd of=/dev/vg/lv bs=1K count=1 seek=1024
0+1 records in
0+1 records out
2 bytes (2 B) copied, 0.00282482 s, 0.7 kB/s
linux-klqg:~ # dmsetup status
vg-lv_mimage_2: 0 614400 linear
vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 597/600 1 ADA 3 disk 253:0 A
vg-lv_mimage_1: 0 614400 linear
vg-lv_mimage_0: 0 614400 linear
vg-lv_mlog: 0 8192 linear
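
The 597/600 figure can be checked by hand. The sketch below assumes 512-byte sectors and takes the region size (1024 sectors) and device length (614400 sectors) from the mirror table line shown earlier. The function names are hypothetical.

```shell
# Assumptions: 512-byte sectors; values copied from the mirror table line above.
REGION_SECTORS=1024      # region size in sectors, i.e. 512 KiB
DEVICE_SECTORS=614400    # length of vg-lv in sectors

# Total number of regions tracked by the log (rounded up).
total_regions() {
    echo $(( (DEVICE_SECTORS + REGION_SECTORS - 1) / REGION_SECTORS ))
}

# Zero-based region index hit by `dd bs=1K seek=$1` (1 KiB = 2 sectors).
region_of_seek() {
    echo $(( $1 * 2 / REGION_SECTORS ))
}

total_regions          # prints 600
region_of_seek 0       # prints 0
region_of_seek 512     # prints 1
region_of_seek 1024    # prints 2
```

Three distinct regions are touched, which matches the drop from 600/600 to 597/600.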

Now the failed device comes back. Its major/minor numbers may have changed, so replace the table as needed.
(The devices I tested on are iSCSI devices, and the minor number changed after each attach/detach.)
Then start the recovery:
linux-klqg:~ # dmsetup suspend vg-lv
linux-klqg:~ # dmsetup resume vg-lv
linux-klqg:~ # dmsetup status
vg-lv_mimage_2: 0 614400 linear
vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 600/600 1 AAA 3 disk 253:0 A
vg-lv_mimage_1: 0 614400 linear
vg-lv_mimage_0: 0 614400 linear
vg-lv_mlog: 0 8192 linear

We can see that all the regions are in sync now.
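
The "all regions in sync" check can also be scripted rather than read off by eye. This is a minimal sketch, assuming the synced/total field is the first whitespace-separated word in the status line that contains a slash. The function name mirror_in_sync is hypothetical.

```shell
# Hypothetical helper: succeeds (exit status 0) when synced == total regions.
# $1 = one mirror status line, with or without the "name:" prefix.
mirror_in_sync() {
    for word in $1; do
        case $word in
            */*)
                # First word of the form <synced>/<total>: compare both halves.
                [ "${word%/*}" = "${word#*/}" ]
                return
                ;;
        esac
    done
    return 1             # no synced/total field found
}

# Possible use while waiting for recovery (needs root and a real device):
#   until mirror_in_sync "$(dmsetup status vg-lv)"; do sleep 1; done
```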

2, two or more data device failures
After detaching the first device (mine is /dev/sdb), write data to the first and second regions.
linux-klqg:~ # echo "1111111" | dd of=/dev/vg/lv bs=1K count=1 seek=0
0+1 records in
0+1 records out
8 bytes (8 B) copied, 0.00209451 s, 3.8 kB/s
linux-klqg:~ # echo "222222" | dd of=/dev/vg/lv bs=1K count=1 seek=512
0+1 records in
0+1 records out
7 bytes (7 B) copied, 0.00259999 s, 2.7 kB/s
linux-klqg:~ # dmsetup status
vg-lv_mimage_2: 0 614400 linear
vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 598/600 1 ADA 3 disk 253:0 A
vg-lv_mimage_1: 0 614400 linear
vg-lv_mimage_0: 0 614400 linear
vg-lv_mlog: 0 8192 linear

Now the first and second regions are marked as out of sync. Then detach the second device
(mine is /dev/sdd) and write data to the third and fourth regions.
linux-klqg:~ # echo "333333" | dd of=/dev/vg/lv bs=1K count=1 seek=1024
0+1 records in
0+1 records out
7 bytes (7 B) copied, 0.00178031 s, 3.9 kB/s
linux-klqg:~ # echo "444444" | dd of=/dev/vg/lv bs=1K count=1 seek=1536
0+1 records in
0+1 records out
7 bytes (7 B) copied, 0.00256491 s, 2.7 kB/s
linux-klqg:~ # dmsetup status
vg-lv_mimage_2: 0 614400 linear
vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 596/600 1 DDA 3 disk 253:0 A
vg-lv_mimage_1: 0 614400 linear
vg-lv_mimage_0: 0 614400 linear
vg-lv_mlog: 0 8192 linear
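
To see which legs a health string such as DDA refers to, the characters can be paired with the leg devices in order. This is a rough sketch that treats any character other than A as unhealthy and assumes the same field layout as the status lines above. The function name unhealthy_legs is hypothetical.

```shell
# Hypothetical helper: print the major:minor of every leg whose health
# character is not "A". $1 = one mirror status line including the "name:" prefix.
unhealthy_legs() {
    set -- $1            # word-split the status line
    nlegs=$5
    shift 5              # drop name, start, length, target type, leg count
    devs=""
    i=0
    while [ "$i" -lt "$nlegs" ]; do
        devs="$devs $1"  # collect the leg devices in order
        shift
        i=$((i + 1))
    done
    health=$3            # $1 = synced/total, $2 = 1, $3 = leg health string
    for dev in $devs; do
        case $health in
            A*) ;;                  # this leg is alive
            *)  echo "$dev" ;;      # anything else is treated as unhealthy
        esac
        health=${health#?}          # move on to the next health character
    done
}

unhealthy_legs "vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 596/600 1 DDA 3 disk 253:0 A"
```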

Now 4 regions are marked as out of sync. Then the first failed device comes back, and we try
the recovery.
linux-klqg:~ # dmsetup suspend vg-lv
linux-klqg:~ # dmsetup resume vg-lv
linux-klqg:~ #
linux-klqg:~ #
linux-klqg:~ # dmsetup status
vg-lv_mimage_2: 0 614400 linear
vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 596/600 1 DDA 3 disk 253:0 A
vg-lv_mimage_1: 0 614400 linear
vg-lv_mimage_0: 0 614400 linear
vg-lv_mlog: 0 8192 linear

It shows that 4 regions are still marked as out of sync, because one device is still
missing. We then keep writing, this time to the fifth region.
linux-klqg:~ # echo "5555555" | dd of=/dev/vg/lv bs=1K count=1 seek=2048
0+1 records in
0+1 records out
8 bytes (8 B) copied, 0.00213449 s, 3.7 kB/s

Now the second missing device comes back, and we try the recovery again.
linux-klqg:~ # dmsetup suspend vg-lv
linux-klqg:~ # dmsetup resume  vg-lv
linux-klqg:~ # dmsetup status
vg-lv_mimage_2: 0 614400 linear
vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 600/600 1 AAA 3 disk 253:0 A
vg-lv_mimage_1: 0 614400 linear
vg-lv_mimage_0: 0 614400 linear
vg-lv_mlog: 0 8192 linear

It shows that all the legs are in sync now. We read the data from each leg and got the
same result.

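
The leg comparison can be sketched as a byte-for-byte check with cmp. The function name legs_identical is hypothetical, and the /dev/mapper paths in the usage comment assume the usual device-mapper node naming for the mimage devices listed above.

```shell
# Hypothetical helper: succeeds when every given leg has the same contents
# as the first one. Arguments are paths to the leg devices (or files).
legs_identical() {
    ref=$1
    shift
    for leg in "$@"; do
        cmp -s "$ref" "$leg" || return 1   # first mismatch fails the check
    done
    return 0
}

# Possible use (needs root; paths are an assumption):
#   legs_identical /dev/mapper/vg-lv_mimage_0 \
#                  /dev/mapper/vg-lv_mimage_1 \
#                  /dev/mapper/vg-lv_mimage_2 && echo "legs match"
```

Comparing the legs directly only makes sense while nothing is writing to the mirror.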
3, log device failure
After making the log device fail, we tried to write to this LV.
linux-klqg:~ # echo "test" |dd of=/dev/vg/lv bs=1K count=1 seek=0
0+1 records in
0+1 records out
21 bytes (21 B) copied, 0.00470523 s, 4.5 kB/s
linux-klqg:~ # dmsetup status
vg-lv_mimage_2: 0 614400 linear
vg-lv: 0 614400 mirror 3 253:1 253:2 253:3 600/600 1 AAA 3 disk 253:0 D
vg-lv_mimage_1: 0 614400 linear
vg-lv_mimage_0: 0 614400 linear
vg-lv_mlog: 0 8192 linear
We can see that the log device is marked as failed.
The bio is not written to the data legs, because we cannot read the new data back out of
the legs.

Is this testing sufficient, or are there corner cases that the patches do not cover?
Any advice is appreciated.

Regards,
Lidong

Lidong Zhong (3):
   dm-raid1: fix the parameter passed into the kernel
   dm-raid1: remove the error flags in the mirror set when it's in sync
   dm-raid1: change default mirror when it's not in sync

  drivers/md/dm-raid1.c | 38 +++++++++++++++++++++++++-------------
  1 file changed, 25 insertions(+), 13 deletions(-)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel