Re: [PATCH v4 00/14] forcealign for xfs

John Garry <john.g.garry@xxxxxxxxxx> · Mon, 16 Sep 2024 10:44:38 +0100

* I guess that you had not been following the recent discussion on this
topic in the latest xfs atomic writes series @ https://urldefense.com/v3/__https://lore.kernel.org/linux-xfs/20240817094800.776408-1-john.g.garry@xxxxxxxxxx/__;!!ACWV5N9M2RV99hQ!JIzCbkyp3JuPyzBx1n80WAdog5rLxMRB65FYrs1sFf3ei-wOdqrU_DZBE5zwrJXhrj949HSE0TwOEV0ciu8$
and also mentioned earlier in
https://urldefense.com/v3/__https://lore.kernel.org/linux-xfs/20240726171358.GA27612@xxxxxx/__;!!ACWV5N9M2RV99hQ!JIzCbkyp3JuPyzBx1n80WAdog5rLxMRB65FYrs1sFf3ei-wOdqrU_DZBE5zwrJXhrj949HSE0TwOiiEnYSk$

There I dropped the sub-alloc unit zeroing. The concept to iter for a single
bio seems sane, but as Darrick mentioned, we have issue of non-atomically
committing all the extent conversions.

Yes, I understand these problems exist.  My entire point is that the
forced alignment implemention should never allow such unaligned
extent patterns to be created in the first place. If we avoid
creating such situations in the first place, then we never have to
care about about unaligned unwritten extent conversion breaking
atomic IO.

OK, but what about this situation with non-EOF unaligned extents:

# xfs_io -c "lsattr -v" mnt/file
[extsize, has-xattr, force-align] mnt/file
# xfs_io -c "extsize" mnt/file
[65536] mnt/file
#
# xfs_io  -d -c "pwrite 64k 64k" mnt/file
# xfs_io  -d -c "pwrite 8k 8k" mnt/file
# xfs_bmap -vvp  mnt/file
mnt/file:
EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
  0: [0..15]:         384..399          0 (384..399)          16 010000
  1: [16..31]:        400..415          0 (400..415)          16 000000
  2: [32..127]:       416..511          0 (416..511)          96 010000
  3: [128..255]:      256..383          0 (256..383)         128 000000
FLAG Values:
   0010000 Unwritten preallocated extent

Here we have unaligned extents wrt extsize.

The sub-alloc unit zeroing would solve that - is that what you would 
still advocate (to solve that issue)?


FWIW, I also understand things are different if we are doing 128kB
atomic writes on 16kB force aligned files. However, in this
situation we are treating the 128kB atomic IO as eight individual
16kB atomic IOs that are physically contiguous.

Yes, if 16kB force aligned, userspace can only issue 16KB atomic writes.

Hence in this
situation it doesn't matter if we have a mix of 16kB aligned
written/unwritten/hole extents as each 16kB chunks is independent of
the others.

Sure


What matters is that each indivudal 16kB chunk shows either the old
data or the new data - we are not guaranteeing that the entire 128kB
write is atomic. Hence in this situation we can both submit and
process each 16kB shunk as independent IOs with independent IO
compeltion transactions. All that matters is that we don't signal
completion to userspace until all the IO is complete, and we already
do that for fragmented DIO writes...

Again, this is different to the traditional RT file behaviour - it
can use unwritten extents for sub-alloc-unit alignment unmaps
because the RT device can align file offset to any physical offset,
and issue unaligned sector sized IO without any restrictions. Forced
alignment does not have this freedom, and when we extend forced
alignment to RT files, it will not have the freedom to use
unwritten extents for sub-alloc-unit unmapping, either.

So how do you think that we should actually implement
xfs_itruncate_extents_flags() properly for forcealign? Would it simply be
like:

--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1050,7 +1050,7 @@ xfs_itruncate_extents_flags(
                 WARN_ON_ONCE(first_unmap_block > XFS_MAX_FILEOFF);
                 return 0;
         }
+	if (xfs_inode_has_forcealign(ip))
+	       first_unmap_block = xfs_inode_roundup_alloc_unit(ip,
first_unmap_block);
         error = xfs_bunmapi_range(&tp, ip, flags, first_unmap_block,

Yes, it would be something like that, except it would have to be
done before first_unmap_block is verified.


ok, and are you still of the opinion that this does not apply to rtvol?

Thanks,
John