On 28/11/2023 13:56, Christoph Hellwig wrote:
On Tue, Nov 28, 2023 at 08:56:37AM +0000, John Garry wrote:
Are you suggesting some sort of hybrid between the atomic write series you
had a few years ago and this solution?
Very roughly, yes.
To me that would be continuing with the following:
- per-IO RWF_ATOMIC (and not the O_ATOMIC semantics, where nothing is written
until some data sync)
Yes.
- writes must be a power-of-two in length and at a naturally-aligned offset
Where offset is offset in the file?
ok, fine, it would not be required for XFS with CoW. Some concerns still:
a. device atomic write boundary, if any
b. other FSes which do not have CoW support. ext4 is already being used
for "atomic writes" in the field - see dubious Amazon torn-write prevention.
About b., we could add the pow-of-2 and file offset alignment
requirement for other FSes, but then need to add some method to
advertise that restriction.
It would not require it. You
probably want to do it for optimal performance, but requiring it
feels rather limited.
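As an aside, the power-of-two and natural-alignment rule discussed above,
together with concern a. about a device atomic write boundary, can be
sketched in C roughly as follows. This is only an illustration under
assumptions: the function names are hypothetical and not taken from any
posted series, offsets and lengths are in bytes, and the boundary check
assumes pos has already been mapped into the same address space as the
boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if len is a non-zero power of two. */
static bool is_pow2(uint64_t len)
{
	return len != 0 && (len & (len - 1)) == 0;
}

/*
 * Hypothetical helper: true if a write of len bytes at offset pos has a
 * power-of-two length and starts at a multiple of that length.
 */
static bool atomic_write_naturally_aligned(uint64_t pos, uint64_t len)
{
	return is_pow2(len) && (pos & (len - 1)) == 0;
}

/*
 * Hypothetical helper: true if [pos, pos + len) straddles a multiple of
 * boundary. A boundary of 0 is taken to mean "no boundary" (assumption).
 */
static bool crosses_atomic_boundary(uint64_t pos, uint64_t len,
				    uint64_t boundary)
{
	if (boundary == 0 || len == 0)
		return false;
	return (pos / boundary) != ((pos + len - 1) / boundary);
}
```

For example, an 8 KiB write at offset 4 KiB fails the natural-alignment
test, while a 4 KiB write at offset 8 KiB passes.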
- relying on atomic write HW support always
And I think that's where we have different opinions.
I'm just trying to understand your idea and that is not necessarily my
final opinion.
I think the hw
offload is a nice optimization and we should use it wherever we can.
Sure, but to me it is a concern that we have 2x paths to make robust:
a. offload via hw, which may involve CoW
b. no HW support, i.e. CoW always
And for no HW support, if we don't follow the O_ATOMIC model of
committing nothing until a SYNC is issued, then we would allocate, write,
and later free a new extent for each write, right?
But building the entire userspace API around it feels like a mistake.
ok, but FWIW it works for the use cases which we know.
BTW, we also have rtvol support which does not use forcealign as it already
can guarantee alignment, but still does rely on the same principle of
requiring alignment - would you want CoW support there also?
Upstream doesn't have out of place write support for the RT subvolume
yet. But Darrick has a series for it and we're actively working on
upstreaming it.
Yeah, I thought that I heard this.
Thanks,
John