Re: [PATCH v2 00/16] block atomic writes

John Garry <john.g.garry@xxxxxxxxxx> · Wed, 13 Dec 2023 16:27:35 +0000

On 13/12/2023 15:44, Christoph Hellwig wrote:
On Wed, Dec 13, 2023 at 09:32:06AM +0000, John Garry wrote:
- How to make API extensible for when we have no HW support? In that case,
    we would prob not have to follow rule of power-of-2 length et al.
    As a possible solution, maybe we can say that atomic writes are
    supported for the file via statx, but not set unit_min and max values,
    and this means that writes need to be just FS block aligned there.
I don't think the power of two length is much of a problem to be
honest, and if we every want to lift it we can still do that easily
by adding a new flag or limit.
ok, but it would be nice to have some idea on what that flag or limit
change would be.
That would require a concrete use case.  The simples thing for a file
system that can or does log I/O it would simply be a flag waving all
the alignment and size requirements.

ok...

I suspect we need an on-disk flag that forces allocations to be
aligned to the atomic write limit, in some ways similar how the
XFS rt flag works.  You'd need to set it on an empty file, and all
allocations after that are guaranteed to be properly aligned.
Hmmm... so how is this different to the XFS forcealign feature?
Maybe not much.  But that's not what it is about - we need a common
API for this and not some XFS internal flag.  So if this is something
we could support in ext4 as well that would be a good step.  And for
btrfs you'd probably want to support something like it in nocow mode
if people care enough, or always support atomics and write out of
place.

The forcealign feature was a bit of case of knowing specifically what to 
do for XFS to enable atomic writes.

But some common method to configure any FS file for atomic writes (if 
supported) would be preferable, right?

For XFS, I thought that your idea was to always CoW new extents for
misaligned extents or writes which spanned multiple extents.
Well, that is useful for two things:

  - atomic writes on hardware that does not support it
  - atomic writes for bufferd I/O
  - supporting other sizes / alignments than the strict power of
    two above.

Sure

Right, so we should limit atomic write queue limits to max_hw_sectors. But
people can still tweak max_sectors, and I am inclined to say that
atomic_write_unit_max et al should be (dynamically) limited to max_sectors
also.
Allowing people to tweak it seems to be asking for trouble.

I tend to agree....

Let's just limit at max_hw_sectors.

have that silly limit.  For NVMe that would require SGL support
(and some driver changes I've been wanting to make for long where
we always use SGLs for transfers larger than a single PRP if supported)
If we could avoid dealing with a virt boundary, then that would be nice.

Are there any patches yet for the change to always use SGLs for transfers
larger than a single PRP?
No.

As below, if we are going to enforce alignment/size of iovecs elsewhere, 
then I don't need to worry about virt boundary.

On a related topic, I am not sure about how - or if we even should -
enforce iovec PAGE-alignment or length; rather, the user could just be
advised that iovecs must be PAGE-aligned and min PAGE length to achieve
atomic_write_unit_max.
Anything that just advices the user an it not clear cut and results in
an error is data loss waiting to happen.  Even more so if it differs
from device to device.

ok, fine. I suppose that we better enforce it then.

Thanks,
John