Re: [PATCH v4 00/14] forcealign for xfs

John Garry <john.g.garry@xxxxxxxxxx> · Thu, 14 Nov 2024 16:22:01 +0000

On 14/11/2024 12:48, Long Li wrote:
On Wed, Sep 18, 2024 at 11:12:47AM +0100, John Garry wrote:
On 17/09/2024 23:27, Dave Chinner wrote:
# xfs_bmap -vvp  mnt/file
mnt/file:
EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
    0: [0..15]:         384..399          0 (384..399)          16 010000
    1: [16..31]:        400..415          0 (400..415)          16 000000
    2: [32..127]:       416..511          0 (416..511)          96 010000
    3: [128..255]:      256..383          0 (256..383)         128 000000
FLAG Values:
     0010000 Unwritten preallocated extent

Here we have unaligned extents wrt extsize.
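
As a rough illustration of the observation above, here is a minimal sketch that checks each extent record in the listing against the extent size. It assumes a 64K extsize (128 sectors, since xfs_bmap reports 512-byte units); the extsize is not shown in the output itself. The helper name is_aligned is made up for this example, and only the file-offset start and length are checked.

```python
SECTOR = 512
# Assumption: extsize is 64K, i.e. 128 sectors of 512 bytes.
EXTSIZE_SECTORS = 64 * 1024 // SECTOR

# (file_start, file_end, block_start, block_end), inclusive, in 512-byte
# sectors, copied from the xfs_bmap listing.
extents = [
    (0, 15, 384, 399),
    (16, 31, 400, 415),
    (32, 127, 416, 511),
    (128, 255, 256, 383),
]

def is_aligned(file_start, file_end, unit=EXTSIZE_SECTORS):
    """True if the extent starts on a unit boundary and spans whole units."""
    length = file_end - file_start + 1
    return file_start % unit == 0 and length % unit == 0

results = [is_aligned(fs, fe) for fs, fe, _, _ in extents]
for (fs, fe, _, _), ok in zip(extents, results):
    print(f"[{fs}..{fe}]: {'aligned' if ok else 'unaligned'}")
```

Under that assumption only the last record is aligned. The first three records together cover [0..127] and map to the contiguous blocks 384..511, so the allocation itself looks aligned; it is the written/unwritten split that produces the unaligned records.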

The sub-alloc unit zeroing would solve that - is that what you would still
advocate (to solve that issue)?
Yes, I thought that was already implemented for force-align with the
DIO code via the extsize zero-around changes in the iomap code. Why
isn't that zero-around code ensuring the correct extent layout here?
I just have not included the extsize zero-around changes here. They were
just grouped with the atomic writes support, as they were added specifically
for the atomic writes support. Indeed - to me at least - it is strange that
the DIO code changes are required for XFS forcealign implementation. And,
even if we use extsize zero-around changes for DIO path, what about buffered
IO?

I've been reviewing and testing the XFS atomic write patch series. Since
there haven't been any new responses to the previous discussions on this
issue, I'd like to inquire about the buffered IO problem with force-aligned
files, which is a scenario we might encounter.

Consider a case where the file supports force-alignment with a 64K extent size,
and the system page size is 4K. Take the following commands as an example:

xfs_io  -c "pwrite 64k 64k" mnt/file
xfs_io  -c "pwrite 8k 8k" mnt/file

If unaligned unwritten extents are not permitted, we need to zero out the
sub-allocation-unit ranges [0, 8K) and [16K, 64K) around the second write to
prevent stale data.
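
The arithmetic for the second write (8K at offset 8K) can be sketched as below. This is only an illustration, not the iomap or XFS code: the function name zero_around is hypothetical, ranges are end-exclusive byte offsets, and it assumes the whole surrounding allocation unit is unwritten, so it does not skip parts that already hold data.

```python
def zero_around(offset, length, extsize):
    """Return end-exclusive byte ranges to zero so that the write at
    [offset, offset + length) covers whole extsize-aligned units.

    Assumption: everything in the touched units outside the write is
    unwritten and therefore needs zeroing.
    """
    end = offset + length
    unit_start = (offset // extsize) * extsize   # round start down
    unit_end = -(-end // extsize) * extsize      # round end up
    ranges = []
    if unit_start < offset:
        ranges.append((unit_start, offset))      # head before the write
    if end < unit_end:
        ranges.append((end, unit_end))           # tail after the write
    return ranges

K = 1024
print(zero_around(8 * K, 8 * K, 64 * K))    # second pwrite: 8K at 8K
print(zero_around(64 * K, 64 * K, 64 * K))  # first pwrite: 64K at 64K
```

The first call yields the two ranges mentioned above, 0 to 8K and 16K to 64K; the second yields nothing, because that write already covers a full aligned unit.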

How does this prevent stale data? Just zeroing will ensure aligned 
extents. Unless iomap is provided a mapping for the fully aligned extent.

While this can be handled relatively easily in direct I/O scenarios,
it presents significant challenges in buffered I/O operations. The main
difficulty arises because the extent size (64K) is larger than the page
size (4K), and our current code base has substantial limitations in handling
such cases.

What is the limitation exactly?

Any thoughts on this?

TBH, the buffered IO case has not been given much consideration.

The sub-extent zeroing was intended for atomic writes larger than one FSB
(filesystem block), and we only care about DIO there.

Thanks,
John