thank you for clearing things up.
> Which means that the test case is actually invalid; you would either
> need to drop O_DIRECT or modify the buffer after write() to arrive at
> a valid example.
ok, but what about running virtual machines in O_DIRECT mode on top of
mdraid, then?
https://forum.proxmox.com/threads/zfs-on-debian-or-mdadm-softraid-stability-and-reliability-of-zfs.116871/post-505697
i have not seen any report of broken/inconsistent mdraid caused by
virtual machines, so is this just a "theoretical" issue?
i'm curious why we can use zfs software raid with virtual machines but
not md software raid. shouldn't zfs have the same problem, at least from
now on ( https://www.phoronix.com/news/OpenZFS-Direct-IO )?
regards
Roland
On 10.10.24 at 08:53, Hannes Reinecke wrote:
On 10/9/24 23:38, Reindl Harald wrote:
On 09.10.24 at 22:08, Roland wrote:
as proxmox hypervisor does not offer mdadm software raid at installation
time because of this bug ticket
"MD RAID or DRBD can be broken from userspace when using O_DIRECT"
https://bugzilla.kernel.org/show_bug.cgi?id=99171
ps:
also see "qemu cache=none should not be used with mdadm"
https://bugzilla.proxmox.com/show_bug.cgi?id=5235
that all sounds like terrible nonsense
if "Yes. O_DIRECT is really fundamentally broken. There's just no way
to fix it sanely. Except by teaching people not to use it, and making
the normal paths fast enough" it has to go away
it's not acceptable that userspace can break the integrity of the
underlying RAID - period
Take a deep breath, everyone.
Nothing has happened, nothing has been broken.
All systems continue to operate as normal.
If you look closely at the mentioned bug, you'll find that it does
modify the buffer at random times, in particular while it's being
written to disk.
Now, the boilerplate text for O_DIRECT says: the application is in
control of the data, and the data will be written without any caching.
Applying that to our test case, it means that the application _can_ modify
the data, even while it's in the process of being written to disk (zero
copy and all that).
We do guarantee that data is consistent once I/O is completed (here:
once 'write' returns), but we do not (and, in fact, cannot) guarantee
that data is consistent while write() is running.
Which means that the test case is actually invalid; you would either
need to drop O_DIRECT or modify the buffer after write() to arrive at
a valid example.
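To make the argument above concrete, here is a toy model in Python. It is only a sketch under assumptions: no real md or O_DIRECT I/O takes place, the two mirror legs are plain byte strings, and the names mirrored_write, mutate_during_write and scribble are made up for illustration. It assumes a zero-copy path in which each leg reads the caller's buffer on its own, at a slightly different moment.

```python
# Toy model only: plain Python objects stand in for the zero-copy O_DIRECT
# path and the two RAID1 legs. No real md or O_DIRECT I/O happens here.

def mirrored_write(buf, mutate_during_write=None):
    """Simulate a zero-copy write to a two-way mirror.

    Each leg reads the caller's buffer on its own, at a different moment.
    `mutate_during_write` is a hypothetical hook that models the
    application touching the buffer while write() is still running.
    """
    leg_a = bytes(buf)                 # first leg reads the buffer
    if mutate_during_write is not None:
        mutate_during_write(buf)       # application modifies it mid-write
    leg_b = bytes(buf)                 # second leg reads it a bit later
    return leg_a, leg_b


def scribble(buf):
    """Change the first byte of the buffer in place."""
    buf[0] = 0xFF


# Invalid pattern (what the test case does): modify while write() runs.
buf = bytearray(b"\x00" * 8)
bad_a, bad_b = mirrored_write(buf, mutate_during_write=scribble)
print("modified during write, legs equal:", bad_a == bad_b)   # False

# Valid pattern: modify the buffer only after write() has returned.
buf = bytearray(b"\x00" * 8)
good_a, good_b = mirrored_write(buf)
scribble(buf)
print("modified after write, legs equal:", good_a == good_b)  # True
```

In this model the legs only diverge when the buffer changes between the two reads, which matches the point that consistency is guaranteed once write() returns but not while it is running.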
That doesn't mean that I don't agree with the comments about O_DIRECT.
Cheers,
Hannes