Re: [PATCH 10/21] block: Add fops atomic write support

Bart Van Assche <bvanassche@xxxxxxx> · Mon, 2 Oct 2023 12:12:49 -0700

On 10/2/23 03:10, John Garry wrote:
On 29/09/2023 18:51, Bart Van Assche wrote:
On 9/29/23 03:27, John Garry wrote:
 > +    if (pos % atomic_write_unit_min_bytes)
 > +        return false;

See later rules.

Is atomic_write_unit_min_bytes always equal to the logical block size?
If so, can the above test be left out?

 > +    if (iov_iter_count(iter) % atomic_write_unit_min_bytes)
 > +        return false;

For SCSI, there is an atomic write granularity, which dictates 
atomic_write_unit_min_bytes. So here we need to ensure that the length 
is a multiple of this value.

Are there any SCSI devices that we care about that report an ATOMIC 
TRANSFER LENGTH GRANULARITY that is larger than a single logical block?
I'm wondering whether we really have to support such devices.

 > +    if (!is_power_of_2(iov_iter_count(iter)))
 > +        return false;

This rule comes from FS block alignment and NVMe atomic boundary.

FSes (XFS) have discontiguous extents. We need to ensure that an atomic 
write does not cross discontiguous extents. To do this we ensure extent 
length and alignment and limit atomic_write_unit_max_bytes to that.

For NVMe, an atomic write boundary is a boundary in LBA space which an 
atomic write should not cross. We limit atomic_write_unit_max_bytes such 
that it is evenly divisible into this atomic write boundary.

To ensure that the write does not cross these alignment boundaries we 
say that it must be naturally aligned and a power-of-2 in length.

We may be able to relax this rule but I am not sure it buys us anything 
- typically we want to be writing a 64KB block aligned to 64KB, for 
example.

It seems to me that the requirement is_power_of_2(iov_iter_count(iter))
is necessary for some filesystems but not for all filesystems. 
Restrictions that are specific to a single filesystem (XFS) should not 
occur in code that is intended to be used by all filesystems 
(blkdev_atomic_write_valid()).

Thanks,

Bart.