Re: [PATCH 17/21] fs: xfs: iomap atomic write support

John Garry <john.g.garry@xxxxxxxxxx> · Mon, 4 Dec 2023 15:19:15 +0000

On 04/12/2023 13:45, Christoph Hellwig wrote:
On Tue, Nov 28, 2023 at 05:42:10PM +0000, John Garry wrote:
ok, fine, it would not be required for XFS with CoW. Some concerns still:
a. device atomic write boundary, if any
b. other FSes which do not have CoW support. ext4 is already being used for
"atomic writes" in the field - see dubious amazon torn-write prevention.
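
To make concern a. concrete, here is a minimal sketch of a boundary check,
assuming the device reports a power-of-2 atomic write boundary in bytes.
The helper name and parameters are hypothetical, not an existing kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): return true if the byte range
 * [pos, pos + len) stays within one atomic write boundary window.
 * A boundary of 0 is taken to mean "device reports no boundary".
 * Assumes boundary is a power of 2 when non-zero.
 */
static bool atomic_write_within_boundary(uint64_t pos, uint64_t len,
					 uint64_t boundary)
{
	if (len == 0)
		return false;
	if (boundary == 0)
		return true;

	/* The first and last byte must fall in the same boundary window. */
	return (pos & ~(boundary - 1)) == ((pos + len - 1) & ~(boundary - 1));
}
```

A write that straddles two windows would have to be rejected or handled
by some other means, since the device gives no atomicity guarantee across
the boundary.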

What is the 'dubious amazon torn-write prevention'?

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/storage-twp.html

AFAICS, this is without any kernel changes, so there is no guarantee
against unwanted splitting or merging of bios.

Anyway, there will still be !CoW FSes which people want to support.

About b., we could add the pow-of-2 and file offset alignment requirement
for other FSes, but then need to add some method to advertise that
restriction.
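
As a rough illustration of the restriction in b., a sketch of the
pow-of-2 and file offset alignment check could look like the following.
The function name and the unit_min/unit_max limits are assumptions made
for the example, not something this series defines in this form.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check (not a kernel API): an atomic write is accepted
 * only if its length is a power of 2 within [unit_min, unit_max] and
 * its file offset is naturally aligned to that length.
 */
static bool atomic_write_aligned(uint64_t pos, uint64_t len,
				 uint64_t unit_min, uint64_t unit_max)
{
	/* Length must be a non-zero power of 2. */
	if (len == 0 || (len & (len - 1)))
		return false;

	/* Length must be within the advertised limits. */
	if (len < unit_min || len > unit_max)
		return false;

	/* File offset must be a multiple of the length. */
	if (pos & (len - 1))
		return false;

	return true;
}
```

Whatever advertises the restriction to userspace would then need to
report at least the two limits used here.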

We really need a better way to communicate I/O limitations anyway.
Something like XFS_IOC_DIOINFO on steroids.

Sure, but to me it is a concern that we have 2x paths to make robust:
a. offload via HW, which may involve CoW
b. no HW support, i.e. CoW always

Relying just on the hardware seems very limited, especially as there is
plenty of hardware that won't guarantee anything larger than 4k, and
plenty of NVMe hardware that has some other small limit like 32k
because it doesn't support multiple atomicity modes.

So what would you propose as the next step? Would it be to first achieve
atomic write support for XFS with HW support + CoW to ensure contiguous
extents (and without XFS forcealign)?

And for no HW support, if we don't follow the O_ATOMIC model of committing
nothing until a SYNC is issued, we would allocate, write, and later free a
new extent for each write, right?

Yes. Then again, if you do data journalling you do that anyway, and as
one little project I'm doing right now shows, data journalling is
often the fastest thing we can do for very small writes.

Ignoring FSes, then how is this supposed to work for block devices? We 
just always need HW support, right?

Thanks,
John