On Thu, Aug 16, 2018 at 7:42 AM NeilBrown <neilb@xxxxxxxx> wrote:
>
> On Wed, Aug 15 2018, Jinpu Wang wrote:
>
> > On Tue, Aug 14, 2018 at 12:43 PM Jack Wang <jack.wang.usish@xxxxxxxxx> wrote:
> >>
> >> On Tue, Aug 14, 2018 at 10:53 AM NeilBrown <neilb@xxxxxxxx> wrote:
> >> >
> >> > On Tue, Aug 14 2018, Jinpu Wang wrote:
> >> >
> >> > > On Tue, Aug 14, 2018 at 1:31 AM NeilBrown <neilb@xxxxxxxx> wrote:
> >> > >>
> >> > >> On Mon, Aug 13 2018, David C. Rankin wrote:
> >> > >>
> >> > >> > On 08/11/2018 02:06 AM, NeilBrown wrote:
> >> > >> >> It might be expected behaviour with async direct IO.
> >> > >> >> Two threads writing with O_DIRECT io to the same address could result in
> >> > >> >> different data on the two devices. This doesn't seem to me to be a
> >> > >> >> credible use-case though. Why would you ever want to do that in
> >> > >> >> practice?
> >> > >> >>
> >> > >> >> NeilBrown
> >> > >> >
> >> > >> > My only thought is that while the credible case may be weak, if it is
> >> > >> > something that can be protected against with a few conditionals to keep
> >> > >> > the data on the slaves from diverging -- then it's worth a couple of
> >> > >> > conditions to prevent the nut that knows just enough about dd from
> >> > >> > confusing things....
> >> > >>
> >> > >> Yes, it can be protected against - the code is already written.
> >> > >> If you have a 2-drive raid1 and want it to be safe against this attack,
> >> > >> simply:
> >> > >>
> >> > >>   mdadm /dev/md127 --grow --level=raid5
> >> > >>
> >> > >> This will add the required synchronization between writes so that
> >> > >> multiple writes to the one block are linearized. There will be a
> >> > >> performance impact.
> >> > >>
> >> > >> NeilBrown
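
Just to make the test concrete: the divergence Neil describes above is what
md's "check" action counts. A minimal sketch of the verification step,
against the /dev/md127 array from his example (this is the standard md
sysfs interface; the device name is a placeholder):

    # ask md to scrub the array and compare the copies on each leg
    echo check > /sys/block/md127/md/sync_action

    # wait for the scrub to finish; sync_action reads "check" while it runs
    while [ "$(cat /sys/block/md127/md/sync_action)" = "check" ]; do
            sleep 1
    done

    # non-zero here means the copies disagree somewhere on the array
    cat /sys/block/md127/md/mismatch_cnt
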
> >> > > Thanks for your comments, Neil.
> >> > > Converting to raid5 with 2 drives will not only cause a performance
> >> > > drop, it will also disable the redundancy.
> >> > > It's clearly a no-go.
> >> >
> >> > I don't understand why you think it would disable the redundancy; there
> >> > are still two copies of every block. Both RAID1 and RAID5 can survive a
> >> > single device failure.
> >> I thought RAID5 requires at least 3 drives with parity; clearly, I was
> >> wrong. Sorry.
> >>
> >> I'm testing the script with raid5 to see if it works as expected.
> > I did test on raid5 with 2 drives, and indeed no mismatch was found.
> > But instead I triggered the hung task below.
> > The kernel is the default Debian 9 one; I also tried 4.17.0-0.bpo.1-amd64,
> > and it fails the same way.
> >
> > [64259.850401] md/raid:md127: raid level 5 active with 2 out of 2 devices, algorithm 2
> > [64259.850402] RAID conf printout:
> > [64259.850404]  --- level:5 rd:2 wd:2
> > [64259.850405]  disk 0, o:1, dev:ram0
> > [64259.850407]  disk 1, o:1, dev:ram1
> > [64259.850425] md/raid456: discard support disabled due to uncertainty.
> > [64259.850427] Set raid456.devices_handle_discard_safely=Y to override.
> > [64259.850470] md127: detected capacity change from 0 to 1121976320
> > [64259.850513] md: md127 switched to read-write mode.
> > [64259.850668] md: resync of RAID array md127
> > [64259.850670] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> > [64259.850681] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
> > [64259.850713] md: using 128k window, over a total of 1095680k.
> > [64267.032621] md: md127: resync done.
> > [64267.036318] RAID conf printout:
> > [64267.036321]  --- level:5 rd:2 wd:2
> > [64267.036323]  disk 0, o:1, dev:ram0
> > [64267.036325]  disk 1, o:1, dev:ram1
> > [64270.122784] EXT4-fs (md127): mounted filesystem with ordered data mode. Opts: (null)
> > [64404.464954] INFO: task fio:5136 blocked for more than 120 seconds.
> > [64404.465035]       Not tainted 4.9.0-7-amd64 #1 Debian 4.9.110-1
> > [64404.465088] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [64404.465156] fio             D    0  5136   5134 0x00000000
> > [64404.465163]  ffff88a7e2457800 ffff88a7c860c000 ffff88a8192f5040 ffff88a836718980
> > [64404.465169]  ffff88a7c5bb8000 ffffad18016c3bd0 ffffffff8780fe79 ffff88a77ca18100
> > [64404.465174]  0000000000000001 ffff88a836718980 0000000000001000 ffff88a8192f5040
> > [64404.465180] Call Trace:
> > [64404.465191]  [<ffffffff8780fe79>] ? __schedule+0x239/0x6f0
> > [64404.465197]  [<ffffffff87810362>] ? schedule+0x32/0x80
> > [64404.465202]  [<ffffffff87813319>] ? rwsem_down_write_failed+0x1f9/0x360
> > [64404.465208]  [<ffffffff8753f033>] ? call_rwsem_down_write_failed+0x13/0x20
> > [64404.465213]  [<ffffffff878125c9>] ? down_write+0x29/0x40
> > [64404.465306]  [<ffffffffc068b1e0>] ? ext4_file_write_iter+0x50/0x370 [ext4]
>
> Looks like an ext4 problem, or possibly an aio problem.
> No evidence that it is RAID related.
> Presumably some other thread is holding the semaphore. Finding that
> thread might help.
>
> NeilBrown

Sorry for the late reply.

You're right, it is most likely an ext4 problem. Running the test directly
on the raid5 device, without the filesystem, shows no problem.

Thanks,
Jack
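
P.S. For anyone who wants to reproduce the raid1 mismatch without the full
script: the race Neil described boils down to two concurrent O_DIRECT
writers rewriting the same block with different data. A minimal sketch of
that, using dd (device path, pattern files, and loop counts are
placeholders, not the exact script used here):

    # two 4k buffers filled with different bytes
    printf 'A%.0s' $(seq 4096) > /tmp/pat_a
    printf 'B%.0s' $(seq 4096) > /tmp/pat_b

    # race O_DIRECT rewrites of block 0 from two concurrent processes;
    # raid1 issues each write once per leg, so when two writes overlap
    # in flight the legs can end up holding different patterns
    for i in $(seq 10000); do
            dd if=/tmp/pat_a of=/dev/md127 oflag=direct bs=4k count=1 conv=notrunc 2>/dev/null
    done &
    for i in $(seq 10000); do
            dd if=/tmp/pat_b of=/dev/md127 oflag=direct bs=4k count=1 conv=notrunc 2>/dev/null
    done &
    wait

Running the check sequence from earlier in the thread afterwards would be
expected to match what we saw: a non-zero mismatch_cnt on raid1, and zero
after the grow to raid5.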