Re: status of bugzilla #99171 - mdraid broken for O_DIRECT

Thank you for clearing things up.

> Which means that the test case is actually invalid; you would either
> need to drop O_DIRECT or modify the buffer after write() to arrive
> with a valid example.

OK, but what about running virtual machines in O_DIRECT mode on top of
mdraid, then?

https://forum.proxmox.com/threads/zfs-on-debian-or-mdadm-softraid-stability-and-reliability-of-zfs.116871/post-505697

I have not seen any report of a broken/inconsistent mdraid caused by
virtual machines, so is this just a "theoretical" issue?

I'm curious why we can use ZFS software RAID with virtual machines but
not md software RAID. Shouldn't ZFS have the same problem
(https://www.phoronix.com/news/OpenZFS-Direct-IO), at least from now on?

regards
Roland


On 10.10.24 at 08:53, Hannes Reinecke wrote:
On 10/9/24 23:38, Reindl Harald wrote:

On 09.10.24 at 22:08, Roland wrote:
as proxmox hypervisor does not offer mdadm software raid at installation
time because of this bug ticket

"MD RAID or DRBD can be broken from userspace when using O_DIRECT"
https://bugzilla.kernel.org/show_bug.cgi?id=99171

ps:
also see "qemu cache=none should not be used with mdadm"
https://bugzilla.proxmox.com/show_bug.cgi?id=5235
that all sounds like terrible nonsense

if "Yes. O_DIRECT is really fundamentally broken. There's just no way
to fix it sanely. Except by teaching people not to use it, and making
the normal paths fast enough" it has to go away

it's not acceptable that userspace can break the integrity of the
underlying RAID - period

Take a deep breath, everyone.
Nothing has happened, nothing has been broken.
All systems continue to operate as normal.

If you look closely at the mentioned bug, you'll find that it does
modify the buffer at random times, in particular while it's being
written to disk.
Now, the boilerplate text for O_DIRECT says: the application is in
control of the data, and the data will be written without any caching.
Applying that to our test case, it means that the application _can_ modify
the data, even if it's in the process of being written to disk (zero
copy and all that).
We do guarantee that data is consistent once I/O is completed (here:
once 'write' returns), but we do not (and, in fact, cannot) guarantee
that data is consistent while write() is running.

Which means that the test case is actually invalid; you would either
need to drop O_DIRECT or modify the buffer after write() to arrive with
a valid example.
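
To make the distinction concrete, here is a toy model in Python. It is
not kernel code and does not touch md at all; it only assumes, for
illustration, that a zero-copy mirrored write lets each mirror leg read
the caller's buffer on its own, at a slightly different time. The names
mirrored_write and scribble are made up for this sketch.

```python
def mirrored_write(buffer, mutate_in_flight=None):
    """Toy stand-in for a zero-copy write to a two-leg mirror.

    Assumption for illustration: each leg reads the caller's buffer
    directly, so the two reads can see different contents if the buffer
    changes in between.
    """
    leg_a = bytes(buffer)            # first leg picks up the data
    if mutate_in_flight is not None:
        mutate_in_flight(buffer)     # application touches the buffer mid-write
    leg_b = bytes(buffer)            # second leg picks up the data later
    return leg_a, leg_b


def scribble(buf):
    """Hypothetical application code that changes the first byte."""
    buf[0] = 0xFF


# Invalid pattern: the buffer is modified while the write is in flight.
buf = bytearray(8)
leg_a, leg_b = mirrored_write(buf, mutate_in_flight=scribble)
racy_mismatch = leg_a != leg_b

# Valid pattern: the buffer is only modified after the write has returned.
buf = bytearray(8)
leg_a, leg_b = mirrored_write(buf)
scribble(buf)
clean_mismatch = leg_a != leg_b

print("mirror legs differ (modified during write):", racy_mismatch)
print("mirror legs differ (modified after write): ", clean_mismatch)
```

In this model only the first pattern leaves the two legs with different
contents, which is the situation the test case in the bug provokes on
purpose.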

That doesn't mean that I don't agree with the comments about O_DIRECT.

Cheers,

Hannes




