On 08/01/2025 01:26, Darrick J. Wong wrote:
I (vaguely) agree with that.
And only if the file mapping is in the correct state, and the
program is willing to *maintain* them in the correct state to get the
better performance.
I kinda agree with that, but the maintaining part is a bit hard as a
general rule of thumb, as file mappings can change behind the
application's back. So building interfaces around the concept that there
are entirely stable mappings seems like a bad idea.
I tend to agree.
As long as it's a general rule that file mappings can change even after
whatever prep work an application tries to do, we're never going to have
an easy time enabling any of these fancy direct-to-storage tricks like
cpu loads and stores to pmem, or this block-untorn writes stuff.
I don't want xfs to grow code to write zeroes to
mapped blocks just so it can then write-untorn to the same blocks.
Agreed.
Any other ideas on how to achieve this then?
There was the proposal to create a single bio covering mixed mappings,
but then we had the issue that all the mappings cannot be atomically
converted. I am not sure if this is really such an issue. I know that
RWF_ATOMIC means all or nothing, but partially converted extents (from
an atomic write) are a sort of grey area, as the original unmapped
extents had nothing in the first place.
So if we want to allow large writes over mixed extents, how should we
handle this?
Note that some time ago we also discussed that we don't want to have a
single bio covering mixed extents as we cannot atomically convert all
unwritten extents to mapped.
From https://lore.kernel.org/linux-xfs/Z3wbqlfoZjisbe1x@xxxxxxxxxxxxx/ :
"I think we should wire it up as a new FALLOC_FL_WRITE_ZEROES mode,
document very vigorously that it exists to facilitate pure overwrites
(specifically that it returns EOPNOTSUPP for always-cow files), and not
add more ioctls."
If we added this new fallocate mode to set up written mappings, would it
be enough to write in the programming manuals that applications should
use it to prepare a file for block-untorn writes?
Sure, that API extension could be useful in the case that we conclude
that we don't permit atomic writes over mixed mappings.
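
To make the "prepare a file for block-untorn writes" idea concrete, here is a rough userspace sketch. It assumes the proposed FALLOC_FL_WRITE_ZEROES mode exists; the numeric value below is only a placeholder used when the headers do not define the flag, and prep_range_for_untorn_writes() is a hypothetical helper name. When the fallocate call fails for any reason, the sketch falls back to writing zeroes by hand, which is a choice made for illustration and not something proposed in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* ASSUMPTION: placeholder value for the proposed mode, used only if the
 * installed headers do not already provide FALLOC_FL_WRITE_ZEROES. */
#ifndef FALLOC_FL_WRITE_ZEROES
#define FALLOC_FL_WRITE_ZEROES 0x80
#endif

/*
 * Hypothetical helper: try to turn [off, off + len) into written mappings.
 * WARNING: existing data in the range is replaced with zeroes.
 * Returns 0 on success or a negative errno.
 */
int prep_range_for_untorn_writes(int fd, off_t off, off_t len)
{
	if (fd < 0 || off < 0 || len <= 0)
		return -EINVAL;

	/* Preferred path: let the filesystem set up written mappings. */
	if (fallocate(fd, FALLOC_FL_WRITE_ZEROES, off, len) == 0)
		return 0;

	/*
	 * Fallback: write zeroes from userspace. This also hides the
	 * EOPNOTSUPP that the proposal would return for always-cow files,
	 * so a real program may prefer to report that error instead.
	 */
	enum { CHUNK = 64 * 1024 };
	char *zeroes = calloc(1, CHUNK);
	if (!zeroes)
		return -ENOMEM;

	off_t done = 0;
	int ret = 0;
	while (done < len) {
		size_t want = (len - done) < CHUNK ? (size_t)(len - done) : CHUNK;
		ssize_t n = pwrite(fd, zeroes, want, off + done);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			ret = -errno;
			break;
		}
		if (n == 0) {
			ret = -EIO;
			break;
		}
		done += n;
	}
	free(zeroes);

	if (ret == 0) {
		assert(done == len);
		/* Push the zeroes out so the blocks are really written. */
		if (fsync(fd) < 0)
			ret = -errno;
	}
	return ret;
}
```

Even after this succeeds, nothing stops a later reflink, hole punch, or similar operation from changing the mappings again, which is the concern raised at the top of this mail.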
Perhaps we should
change the errno code to EMEDIUMTYPE for the mixed mappings case.
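
As an illustration of what a distinct errno would buy userspace, the sketch below issues an RWF_ATOMIC write with pwritev2() and maps the result to a next step. The EMEDIUMTYPE case is only the suggestion made above and not current kernel behaviour as far as this sketch assumes; the enum and both function names are hypothetical, and the fallback RWF_ATOMIC definition is only for builds against older headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Fallback for older userspace headers that predate RWF_ATOMIC. */
#ifndef RWF_ATOMIC
#define RWF_ATOMIC 0x00000040
#endif

/* Hypothetical: what the caller should do after an untorn write attempt. */
enum untorn_next_step {
	UNTORN_DONE,           /* write completed untorn */
	UNTORN_PREP_AND_RETRY, /* mixed mappings: prepare the range, retry */
	UNTORN_FALL_BACK,      /* atomic writes unavailable on this file */
	UNTORN_FAIL,           /* anything else, e.g. bad size or alignment */
};

/* Map an errno value (0 for success) to the next step. */
enum untorn_next_step untorn_classify(int err)
{
	switch (err) {
	case 0:
		return UNTORN_DONE;
	case EMEDIUMTYPE:
		/* ASSUMPTION: the errno suggested in this thread for the
		 * mixed mappings case. */
		return UNTORN_PREP_AND_RETRY;
	case EOPNOTSUPP:
		return UNTORN_FALL_BACK;
	default:
		return UNTORN_FAIL;
	}
}

/* Attempt one untorn write of len bytes at off and classify the result. */
enum untorn_next_step untorn_write(int fd, const void *buf, size_t len,
				   off_t off)
{
	assert(buf != NULL && len > 0);

	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t n = pwritev2(fd, &iov, 1, off, RWF_ATOMIC);

	if (n < 0)
		return untorn_classify(errno);
	/* RWF_ATOMIC is all or nothing, so a short write is unexpected. */
	return (size_t)n == len ? UNTORN_DONE : UNTORN_FAIL;
}
```

With a dedicated errno, the "prepare and retry" path can be told apart from a plain size or alignment mistake, which is hard to do when both come back as the same error.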
Alternately, maybe we /should/ let programs open a lease-fd on a file
range, do their untorn writes through the lease fd, and if another
thread does something to break the lease, then the lease fd returns EIO
until you close it.
So does this mean that applications own specific ranges in files for exclusive
atomic writes? Wouldn't that break what we already support today?
Cheers,
John