Re: RAID1 sometimes have different data on the slave devices

Jinpu Wang <jinpuwang@xxxxxxxxx> · Wed, 15 Aug 2018 09:57:22 +0200



On Tue, Aug 14 2018 at 12:43 PM, Jack Wang <jack.wang.usish@xxxxxxxxx> wrote:
>
> On Tue, Aug 14 2018 at 10:53 AM, NeilBrown <neilb@xxxxxxxx> wrote:
> >
> > On Tue, Aug 14 2018, Jinpu Wang wrote:
> >
> > > On Tue, Aug 14 2018 at 1:31 AM, NeilBrown <neilb@xxxxxxxx> wrote:
> > >>
> > >> On Mon, Aug 13 2018, David C. Rankin wrote:
> > >>
> > >> > On 08/11/2018 02:06 AM, NeilBrown wrote:
> > >> >> It might be expected behaviour with async direct IO.
> > >> >> Two threads writing with O_DIRECT io to the same address could result in
> > >> >> different data on the two devices.  This doesn't seem to me to be a
> > >> >> credible use-case though.  Why would you ever want to do that in
> > >> >> practice?
> > >> >>
> > >> >> NeilBrown
> > >> >
> > >> >   My only thought is that while the credible case may be weak, if it is
> > >> > something that can be protected against with a few conditionals to keep the
> > >> > data on the slaves from diverging -- then it's worth a couple of conditions to
> > >> > stop the nut that knows just enough about dd from confusing things....
> > >>
> > >> Yes, it can be protected against - the code is already written.
> > >> If you have a 2-drive raid1 and want it to be safe against this attack,
> > >> simply:
> > >>
> > >>   mdadm /dev/md127 --grow --level=raid5
> > >>
> > >> This will add the required synchronization between writes so that
> > >> multiple writes to the one block are linearized.  There will be a
> > >> performance impact.
> > >>
> > >> NeilBrown
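
A toy model of the race discussed above, offered only as an illustration and not as the md driver's actual logic: if each mirror leg may complete two in-flight writes to the same block in its own order, the legs can end up with different data, whereas a single agreed order (the linearization Neil describes) cannot diverge. The function names and the "last write wins" device model are assumptions made for this sketch.

```python
from itertools import permutations


def final_contents(order):
    """Return what a device holds after applying writes in this order.

    Assumption for this sketch: the last write to complete wins.
    """
    block = None
    for data in order:
        block = data
    return block


def possible_outcomes(writes):
    """Enumerate (leg0, leg1) results when each mirror leg is free to
    complete the same set of in-flight writes in its own order."""
    outcomes = set()
    for order0 in permutations(writes):
        for order1 in permutations(writes):
            outcomes.add((final_contents(order0), final_contents(order1)))
    return outcomes


def linearized_outcomes(writes):
    """Enumerate results when both legs see one agreed order of writes."""
    return {(final_contents(o), final_contents(o)) for o in permutations(writes)}


if __name__ == "__main__":
    writes = ("A", "B")  # two concurrent O_DIRECT writes to the same block
    for leg0, leg1 in sorted(possible_outcomes(writes)):
        state = "consistent" if leg0 == leg1 else "MISMATCH"
        print(f"leg0={leg0} leg1={leg1} -> {state}")
```

In this model, two of the four unsynchronized outcomes leave the legs holding different data, while every linearized outcome keeps them identical.
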
> > > Thanks for your comments, Neil.
> > > Converting to raid5 with 2 drives will not only cause a performance drop,
> > > but will also disable the redundancy.
> > > It's clearly a no-go.
> >
> > I don't understand why you think it would disable the redundancy, there
> > are still two copies of every block.  Both RAID1 and RAID5 can survive a
> > single device failure.
> I thought RAID5 required at least 3 drives with parity; clearly, I was
> wrong. Sorry.
>
> I'm testing the script with raid5 to see whether it works as expected.
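
On the two-drive RAID5 point quoted above, a minimal sketch of why redundancy is kept, assuming the usual XOR parity scheme: with only one data chunk per stripe, the parity chunk is the XOR of that single chunk, which is the chunk itself, so both devices hold the same bytes. The helper names `xor_parity` and `rebuild_missing` are hypothetical and are not taken from the md code.

```python
from functools import reduce


def xor_parity(chunks):
    """XOR equal-length chunks byte by byte to form the parity chunk."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_missing(surviving_chunks):
    """Recover the one missing chunk of a stripe by XOR-ing all survivors
    (data and parity alike)."""
    return xor_parity(surviving_chunks)


if __name__ == "__main__":
    # Two devices: one data chunk plus one parity chunk per stripe.
    data = b"block-0!"
    parity = xor_parity([data])
    print("2-drive parity equals data:", parity == data)

    # Three devices, for comparison: two data chunks plus parity.
    d0, d1 = b"\x01\x02", b"\x10\x20"
    p = xor_parity([d0, d1])
    print("rebuilt d0 matches:", rebuild_missing([d1, p]) == d0)
```
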
I did test on raid5 with 2 drives and, indeed, no mismatch was found.
Instead, I triggered the hung tasks shown below.
The kernel is the default Debian 9 kernel; I also tried
4.17.0-0.bpo.1-amd64, and it fails the same way.

[64259.850401] md/raid:md127: raid level 5 active with 2 out of 2 devices, algorithm 2
[64259.850402] RAID conf printout:
[64259.850404]  --- level:5 rd:2 wd:2
[64259.850405]  disk 0, o:1, dev:ram0
[64259.850407]  disk 1, o:1, dev:ram1
[64259.850425] md/raid456: discard support disabled due to uncertainty.
[64259.850427] Set raid456.devices_handle_discard_safely=Y to override.
[64259.850470] md127: detected capacity change from 0 to 1121976320
[64259.850513] md: md127 switched to read-write mode.
[64259.850668] md: resync of RAID array md127
[64259.850670] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[64259.850681] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[64259.850713] md: using 128k window, over a total of 1095680k.
[64267.032621] md: md127: resync done.
[64267.036318] RAID conf printout:
[64267.036321]  --- level:5 rd:2 wd:2
[64267.036323]  disk 0, o:1, dev:ram0
[64267.036325]  disk 1, o:1, dev:ram1
[64270.122784] EXT4-fs (md127): mounted filesystem with ordered data mode. Opts: (null)
[64404.464954] INFO: task fio:5136 blocked for more than 120 seconds.
[64404.465035]       Not tainted 4.9.0-7-amd64 #1 Debian 4.9.110-1
[64404.465088] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[64404.465156] fio             D    0  5136   5134 0x00000000
[64404.465163]  ffff88a7e2457800 ffff88a7c860c000 ffff88a8192f5040 ffff88a836718980
[64404.465169]  ffff88a7c5bb8000 ffffad18016c3bd0 ffffffff8780fe79 ffff88a77ca18100
[64404.465174]  0000000000000001 ffff88a836718980 0000000000001000 ffff88a8192f5040
[64404.465180] Call Trace:
[64404.465191]  [<ffffffff8780fe79>] ? __schedule+0x239/0x6f0
[64404.465197]  [<ffffffff87810362>] ? schedule+0x32/0x80
[64404.465202]  [<ffffffff87813319>] ? rwsem_down_write_failed+0x1f9/0x360
[64404.465208]  [<ffffffff8753f033>] ? call_rwsem_down_write_failed+0x13/0x20
[64404.465213]  [<ffffffff878125c9>] ? down_write+0x29/0x40
[64404.465306]  [<ffffffffc068b1e0>] ? ext4_file_write_iter+0x50/0x370 [ext4]
[64404.465311]  [<ffffffff87814ce4>] ? __switch_to_asm+0x34/0x70
[64404.465315]  [<ffffffff87814ce4>] ? __switch_to_asm+0x34/0x70
[64404.465318]  [<ffffffff87814cf0>] ? __switch_to_asm+0x40/0x70
[64404.465322]  [<ffffffff87814ce4>] ? __switch_to_asm+0x34/0x70
[64404.465325]  [<ffffffff87814cf0>] ? __switch_to_asm+0x40/0x70
[64404.465328]  [<ffffffff87814ce4>] ? __switch_to_asm+0x34/0x70
[64404.465334]  [<ffffffff87457f9b>] ? aio_write+0xfb/0x150
[64404.465338]  [<ffffffff87457547>] ? aio_read_events+0x237/0x370
[64404.465343]  [<ffffffff873e597c>] ? kmem_cache_alloc+0x11c/0x530
[64404.465347]  [<ffffffff872e8e97>] ? hrtimer_try_to_cancel+0x27/0x110
[64404.465352]  [<ffffffff87458fc9>] ? do_io_submit+0x2b9/0x620
[64404.465357]  [<ffffffff87203b7d>] ? do_syscall_64+0x8d/0xf0
[64404.465361]  [<ffffffff87814bce>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[64404.465383] INFO: task fio:5137 blocked for more than 120 seconds.
[64404.465440]       Not tainted 4.9.0-7-amd64 #1 Debian 4.9.110-1
[64404.465492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.