Re: RAID1 sometimes has different data on the slave devices

On Wed, Aug 15 2018, Jinpu Wang wrote:

> Jack Wang <jack.wang.usish@xxxxxxxxx> wrote on Tue, Aug 14, 2018 at 12:43 PM:
>>
>> NeilBrown <neilb@xxxxxxxx> wrote on Tue, Aug 14, 2018 at 10:53 AM:
>> >
>> > On Tue, Aug 14 2018, Jinpu Wang wrote:
>> >
>> > > NeilBrown <neilb@xxxxxxxx> wrote on Tue, Aug 14, 2018 at 1:31 AM:
>> > >>
>> > >> On Mon, Aug 13 2018, David C. Rankin wrote:
>> > >>
>> > >> > On 08/11/2018 02:06 AM, NeilBrown wrote:
>> > >> >> It might be expected behaviour with async direct IO.
>> > >> >> Two threads writing with O_DIRECT io to the same address could result in
>> > >> >> different data on the two devices.  This doesn't seem to me to be a
>> > >> >> credible use-case though.  Why would you ever want to do that in
>> > >> >> practice?
>> > >> >>
>> > >> >> NeilBrown
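
For illustration, the kind of racing O_DIRECT writers described above might
look roughly like the following; /dev/md127 and the 4k block size are only
placeholders, not taken from the original report:

  # Two processes issue O_DIRECT writes of different data to the same block
  # at the same time.  md/raid1 forwards each write to both legs
  # independently, so the legs can apply the two writes in opposite orders
  # and end up holding different data for that block.
  dd if=/dev/urandom of=/dev/md127 bs=4k count=1 oflag=direct &
  dd if=/dev/zero    of=/dev/md127 bs=4k count=1 oflag=direct &
  wait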
>> > >> >
>> > >> >   My only thought is that while the credible use-case may be weak, if this
>> > >> > is something that can be protected against with a few conditionals to keep
>> > >> > the data on the slaves from diverging -- then it's worth a couple of conditions
>> > >> > to prevent the nut who knows just enough about dd from confusing things....
>> > >>
>> > >> Yes, it can be protected against - the code is already written.
>> > >> If you have a 2-drive raid1 and want it to be safe against this attack,
>> > >> simply:
>> > >>
>> > >>   mdadm /dev/md127 --grow --level=raid5
>> > >>
>> > >> This will add the required synchronization between writes so that
>> > >> multiple writes to the one block are linearized.  There will be a
>> > >> performance impact.
>> > >>
>> > >> NeilBrown
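
For reference, the effect of the command quoted above can be checked from
/proc/mdstat afterwards, e.g.:

  mdadm /dev/md127 --grow --level=raid5
  cat /proc/mdstat    # the md127 line should now show raid5 with the same two members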
>> > > Thanks for your comments, Neil.
>> > > Converting to raid5 with 2 drives will not only cause a performance
>> > > drop, it will also disable the redundancy.
>> > > It's clearly a no-go.
>> >
>> > I don't understand why you think it would disable the redundancy, there
>> > are still two copies of every block.  Both RAID1 and RAID5 can survive a
>> > single device failure.
>> I thought RAID5 requires at least 3 drives with parity; clearly, I was
>> wrong. Sorry.
>>
>> I'm testing the script with raid5 to see if it works as expected.
> I did test raid5 with 2 drives, and indeed no mismatch was found.
> But instead I triggered the hung task below.
> The kernel is the default Debian 9 kernel; I also tried
> 4.17.0-0.bpo.1-amd64, and it fails the same way.
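
For reference, the mismatch check mentioned here is typically driven through
the md sysfs interface, e.g. (md127 being the array name from this thread):

  echo check > /sys/block/md127/md/sync_action   # start a consistency check
  cat /sys/block/md127/md/mismatch_cnt           # non-zero means inconsistencies were found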
>
> [64259.850401] md/raid:md127: raid level 5 active with 2 out of 2 devices, algorithm 2
> [64259.850402] RAID conf printout:
> [64259.850404]  --- level:5 rd:2 wd:2
> [64259.850405]  disk 0, o:1, dev:ram0
> [64259.850407]  disk 1, o:1, dev:ram1
> [64259.850425] md/raid456: discard support disabled due to uncertainty.
> [64259.850427] Set raid456.devices_handle_discard_safely=Y to override.
> [64259.850470] md127: detected capacity change from 0 to 1121976320
> [64259.850513] md: md127 switched to read-write mode.
> [64259.850668] md: resync of RAID array md127
> [64259.850670] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> [64259.850681] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
> [64259.850713] md: using 128k window, over a total of 1095680k.
> [64267.032621] md: md127: resync done.
> [64267.036318] RAID conf printout:
> [64267.036321]  --- level:5 rd:2 wd:2
> [64267.036323]  disk 0, o:1, dev:ram0
> [64267.036325]  disk 1, o:1, dev:ram1
> [64270.122784] EXT4-fs (md127): mounted filesystem with ordered data mode. Opts: (null)
> [64404.464954] INFO: task fio:5136 blocked for more than 120 seconds.
> [64404.465035]       Not tainted 4.9.0-7-amd64 #1 Debian 4.9.110-1
> [64404.465088] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [64404.465156] fio             D    0  5136   5134 0x00000000
> [64404.465163]  ffff88a7e2457800 ffff88a7c860c000 ffff88a8192f5040 ffff88a836718980
> [64404.465169]  ffff88a7c5bb8000 ffffad18016c3bd0 ffffffff8780fe79 ffff88a77ca18100
> [64404.465174]  0000000000000001 ffff88a836718980 0000000000001000 ffff88a8192f5040
> [64404.465180] Call Trace:
> [64404.465191]  [<ffffffff8780fe79>] ? __schedule+0x239/0x6f0
> [64404.465197]  [<ffffffff87810362>] ? schedule+0x32/0x80
> [64404.465202]  [<ffffffff87813319>] ? rwsem_down_write_failed+0x1f9/0x360
> [64404.465208]  [<ffffffff8753f033>] ? call_rwsem_down_write_failed+0x13/0x20
> [64404.465213]  [<ffffffff878125c9>] ? down_write+0x29/0x40
> [64404.465306]  [<ffffffffc068b1e0>] ? ext4_file_write_iter+0x50/0x370 [ext4]

Looks like an ext4 problem, or possibly an aio problem.
No evidence that it is RAID related.
Presumably some other thread is holding the semaphore.  Finding that
thread might help.
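
One way to hunt for it, assuming sysrq is available, is to dump the stacks of
all blocked tasks and look for whatever else is stuck in the same ext4 write
path:

  echo 1 > /proc/sys/kernel/sysrq    # enable sysrq if it isn't already
  echo w > /proc/sysrq-trigger       # dump stacks of all uninterruptible (D state) tasks
  dmesg | tail -n 200

The kernel stack of one suspect task can also be read directly from
/proc/<pid>/stack.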

NeilBrown


> [64404.465311]  [<ffffffff87814ce4>] ? __switch_to_asm+0x34/0x70
> [64404.465315]  [<ffffffff87814ce4>] ? __switch_to_asm+0x34/0x70
> [64404.465318]  [<ffffffff87814cf0>] ? __switch_to_asm+0x40/0x70
> [64404.465322]  [<ffffffff87814ce4>] ? __switch_to_asm+0x34/0x70
> [64404.465325]  [<ffffffff87814cf0>] ? __switch_to_asm+0x40/0x70
> [64404.465328]  [<ffffffff87814ce4>] ? __switch_to_asm+0x34/0x70
> [64404.465334]  [<ffffffff87457f9b>] ? aio_write+0xfb/0x150
> [64404.465338]  [<ffffffff87457547>] ? aio_read_events+0x237/0x370
> [64404.465343]  [<ffffffff873e597c>] ? kmem_cache_alloc+0x11c/0x530
> [64404.465347]  [<ffffffff872e8e97>] ? hrtimer_try_to_cancel+0x27/0x110
> [64404.465352]  [<ffffffff87458fc9>] ? do_io_submit+0x2b9/0x620
> [64404.465357]  [<ffffffff87203b7d>] ? do_syscall_64+0x8d/0xf0
> [64404.465361]  [<ffffffff87814bce>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
> [64404.465383] INFO: task fio:5137 blocked for more than 120 seconds.
> [64404.465440]       Not tainted 4.9.0-7-amd64 #1 Debian 4.9.110-1
> [64404.465492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.


