Re: [RFC 0/5] ext4: Implement support for extsize hints

On 13/09/2024 11:54, Ritesh Harjani (IBM) wrote:
John Garry <john.g.garry@xxxxxxxxxx> writes:

On 11/09/2024 10:01, Ojaswin Mujoo wrote:
This patchset implements extsize hint feature for ext4. Posting this RFC to get
some early review comments on the design and implementation bits. This feature
is similar to what we have in XFS too with some differences.

extsize on ext4 is a hint to mballoc (multi-block allocator) and extent
handling layer to do aligned allocations. We use allocation criteria 0
(CR_POWER2_ALIGNED) for doing aligned power-of-2 allocations. With extsize hint
we try to align the logical start (m_lblk) and length (m_len) of the allocation
to be extsize aligned. The CR_POWER2_ALIGNED criterion in mballoc automatically makes
sure that we get the aligned physical start (m_pblk) as well. So in this way
extsize can make sure that lblk, len and pblk all are aligned for the allocated
extent w.r.t extsize.
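
As a rough illustration of the logical alignment described above, here is a minimal sketch that rounds a requested range out to extsize boundaries. It is not the code from the series: the struct and function names are hypothetical, extsize is expressed in filesystem blocks, and the round-start-down/round-end-up policy is an assumption about what "extsize aligned" means for m_lblk and m_len.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for a logical block range (not an ext4 type). */
struct blk_range {
	uint64_t lblk;	/* logical start block */
	uint64_t len;	/* length in blocks */
};

static int is_power_of_2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Round [lblk, lblk + len) outwards so that both the start and the
 * length become multiples of extsize_blks (which must be a power of 2).
 */
static struct blk_range extsize_align(uint64_t lblk, uint64_t len,
				      uint64_t extsize_blks)
{
	assert(is_power_of_2(extsize_blks));
	assert(len > 0);

	uint64_t mask = extsize_blks - 1;
	uint64_t start = lblk & ~mask;			/* round start down */
	uint64_t end = (lblk + len + mask) & ~mask;	/* round end up */
	struct blk_range r = { start, end - start };

	return r;
}
```

With extsize of 4 blocks, a request for blocks 6..9 would grow to blocks 4..11 under this policy; physical alignment is left to the allocator and is not modelled here.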

Note that extsize feature is just a hinting mechanism to ext4 multi-block
allocator. That means that if we are unable to get an aligned allocation for
some reason, then we drop this flag and continue with unaligned allocation to
serve the request. However, when we add atomic/untorn write support, we
will enforce the aligned allocation and can return -ENOSPC if the aligned
allocation was not successful.
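
The difference between "hint" and "enforced" behaviour described above could be modelled roughly as follows. This is only a sketch of the policy, not mballoc code: the callback type, the function names and the enforce_aligned parameter are all made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocator callback: 0 on success, negative errno on failure. */
typedef int (*alloc_fn)(bool aligned);

/*
 * Try an aligned allocation first. With a plain hint we fall back to an
 * unaligned allocation; when alignment is enforced (the future atomic
 * write case) the failure is returned to the caller instead.
 */
static int alloc_with_extsize(alloc_fn alloc, bool enforce_aligned)
{
	assert(alloc != NULL);

	int err = alloc(true);
	if (err == 0 || enforce_aligned)
		return err;

	return alloc(false);	/* hint only: drop the alignment request */
}

/* Example callback: pretend only unaligned space is left. */
static int only_unaligned_fits(bool aligned)
{
	return aligned ? -ENOSPC : 0;
}
```

Calling alloc_with_extsize(only_unaligned_fits, false) succeeds via the fallback, while passing true surfaces -ENOSPC.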

A few questions/confirmations:
- You have no intention of adding an equivalent of forcealign, right?

extsize is just a hinting mechanism, and only for the __allocation__
path. But for atomic writes we do require some form of forcealign (like
we have in XFS). So we could either call this the atomic write feature
directly, or we may as well call it the forcealign feature and make
atomic writes depend upon it, as XFS does.

I still haven't understood if there is/will be a user specifically for
forcealign other than atomic writes.
Since you asked, I am more curious to know if there is some more context
to your question?

As Darrick mentioned at the following, forcealign could be used for DAX:
https://lore.kernel.org/linux-xfs/170404855884.1770028.10371509002317647981.stgit@frogsfrogsfrogs/



- Would you also plan on using FS_IOC_FS(GET/SET)XATTR interface for
enabling atomic writes on a per-inode basis?

Yes, that interface should indeed be kept same for EXT4 too.


- Can extsize be set at mkfs time?

Good point. For now in this series, extsize can only be set using the
same ioctl on a per inode basis.

IIUC, XFS supports doing both, right? We can do this on a per-inode basis
via the ioctl, and it also supports setting this at mkfs.xfs time.

Right

(maybe xfsprogs only allows setting this at mkfs time for rtvolumes for now)

extsize hint can already be set at mkfs time for both rtvol and !rtvol today.


So if this is set at mkfs.xfs time, then by default all inodes will
have this extsize attribute value set, right?

Right

But there is still the option to set this later with xfs_io -c "extsize" per-inode.
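
For reference, the per-inode path discussed above goes through the FS_IOC_FSGETXATTR / FS_IOC_FSSETXATTR ioctls and struct fsxattr from <linux/fs.h>. A minimal sketch of a caller is below. The helper names are hypothetical, and setting FS_XFLAG_EXTSIZE alongside fsx_extsize mirrors what xfs_io does for regular files on XFS; whether the ext4 series expects exactly the same flag handling is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* struct fsxattr, FS_IOC_FS{GET,SET}XATTR, FS_XFLAG_EXTSIZE */

/* Read the extent size hint (in bytes). Returns 0 or a negative errno. */
static int get_extsize_hint(int fd, uint32_t *extsize_bytes)
{
	struct fsxattr fsx;

	assert(extsize_bytes != NULL);
	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		return -errno;
	*extsize_bytes = fsx.fsx_extsize;
	return 0;
}

/* Set the extent size hint (in bytes). Returns 0 or a negative errno. */
static int set_extsize_hint(int fd, uint32_t extsize_bytes)
{
	struct fsxattr fsx;

	/* Read-modify-write so the other attributes are preserved. */
	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		return -errno;
	fsx.fsx_extsize = extsize_bytes;
	fsx.fsx_xflags |= FS_XFLAG_EXTSIZE;	/* assumption: same flag as on XFS */
	if (ioctl(fd, FS_IOC_FSSETXATTR, &fsx) < 0)
		return -errno;
	return 0;
}
```

A filesystem that does not implement the interface will fail the ioctl, so callers should treat a negative return as "hint not applied" rather than fatal.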


BTW, this brings me to another question that I had asked here too [1].
1. For XFS, atomic writes can only be enabled with a fresh mkfs.xfs -d
atomic-writes=1 right?

Correct

Setting atomic-writes=1 enables the feature in the SB

2. For atomic writes to be enabled, we need all 3 features to be
enabled during mkfs.xfs time itself right?

Right, that is how it is currently done. But you could easily set extsize=4K at mkfs time so that not all inodes have a 16KB extsize, as in the example below. In this case, certain atomic write inodes could have their extsize increased to 16KB.

i.e.
"mkfs.xfs -i forcealign=1 -d extsize=16384 -d atomic-writes=1"

[1]: https://lore.kernel.org/linux-xfs/20240817094800.776408-1-john.g.garry@xxxxxxxxxx/


- Is there any userspace support for this series available?

Makes sense to maybe provide a userspace support link too.
For now, a quick hack would be to just allow setting the extsize hint for
other filesystems as well in xfs_io.

diff --git a/io/open.c b/io/open.c
index 15850b55..6407b7e8 100644
--- a/io/open.c
+++ b/io/open.c
@@ -980,7 +980,7 @@ open_init(void)
         extsize_cmd.args = _("[-D | -R] [extsize]");
         extsize_cmd.argmin = 0;
         extsize_cmd.argmax = -1;
-       extsize_cmd.flags = CMD_NOMAP_OK;
+       extsize_cmd.flags = CMD_NOMAP_OK | CMD_FOREIGN_OK;
         extsize_cmd.oneline =
                 _("get/set preferred extent size (in bytes) for the open file");
         extsize_cmd.help = extsize_help;

e.g.:
/dev/loop6 on /mnt1/test type ext4 (rw,relatime)

root@qemu:~/xt/xfsprogs-dev# ./io/xfs_io -fc "extsize" /mnt1/test/f1
[0] /mnt1/test/f1
root@qemu:~/xt/xfsprogs-dev# ./io/xfs_io -c "extsize 16384" /mnt1/test/f1
root@qemu:~/xt/xfsprogs-dev# ./io/xfs_io -c "extsize" /mnt1/test/f1
[16384] /mnt1/test/f1

ok




- how would/could extsize interact with bigalloc?


As of now it is kept disabled with bigalloc.

+	if (sbi->s_cluster_ratio > 1) {
+		msg = "Can't use extsize hint with bigalloc";
+		err = -EINVAL;
+		goto error;
+	}



Comparison with XFS extsize feature -
=====================================
1. extsize in XFS is a hint for aligning only the logical start and the length
     of the allocation, whereas extsize on ext4 makes sure the physical start of the
     extent gets aligned as well.

note that forcealign with extsize aligns AG block also

Can you expand on that a bit? You mean that at mkfs.xfs time we ensure
agblock boundaries are extsize aligned?

Yes, see align_ag_geometry() at https://github.com/johnpgarry/xfsprogs-dev/commits/atomic-writes/



only for atomic writes do we enforce the AG block is aligned to physical
block


Could you expand on that a bit please? You mean that at mkfs.xfs
time, for atomic writes, we ensure AG block start boundaries are extsize aligned?

We do this for forcealign with the extsize value supplied at mkfs time.

There are 2x things to consider about this:
- mkfs-specified extsize need not necessarily be a power-of-2
- even if this mkfs-specified extsize is a power-of-2, attempting to increase extsize for an inode enabled for atomic writes may be restricted, as the new extsize may not align with the AG size.

For example, if extsize was 64KB and the AG size was 16400 FSB (1025 * 64KB), then we cannot enable an inode for atomic writes with extsize = 128KB, as the disk block would not be aligned with the AG block.
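
The arithmetic behind the 16400 FSB example above can be checked with a small helper. This is a sketch only: the function name is hypothetical, and a 4KB filesystem block is assumed, since that is what makes 16400 FSB equal to 1025 * 64KB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if an AG of agblocks filesystem blocks is a whole multiple
 * of the extsize, so that extsize-aligned AG blocks stay aligned on disk
 * from one AG to the next.
 */
static bool ag_size_fits_extsize(uint64_t agblocks, uint64_t extsize_bytes,
				 uint64_t fsb_bytes)
{
	assert(fsb_bytes != 0);
	assert(extsize_bytes >= fsb_bytes && extsize_bytes % fsb_bytes == 0);

	uint64_t extsize_fsb = extsize_bytes / fsb_bytes;

	return agblocks % extsize_fsb == 0;
}
```

With 4KB blocks, 64KB is 16 FSB and 16400 % 16 == 0, so the mkfs-time extsize fits. 128KB is 32 FSB and 16400 % 32 == 16, so the larger extsize does not.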




2. eof allocation on XFS trims the blocks allocated beyond eof with extsize
     hint. That means on XFS for eof allocations (with extsize hint) only logical
     start gets aligned. However extsize hint in ext4 for eof allocation is not
     supported in this version of the series.

3. XFS allows extsize to be set on file with no extents but delayed data.
     However, ext4 doesn't allow that, for simplicity. The user is expected to set
     it on a file before changing its i_size.

4. XFS allows non-power-of-2 values for extsize but ext4 does not, since we
     primarily would like to support atomic writes with extsize.

5. In ext4 we chose to store the extsize value in SYSTEM_XATTR rather than an
     inode field as it was simple and most flexible, since there might be more
     features like atomic/untorn writes coming in future.

6. In buffered-io path XFS switches to non-delalloc allocations for extsize hint.
     The same has been kept for EXT4 as well.
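
Putting the ext4-side restrictions from this thread together (no bigalloc, power-of-2 only as in item 4, and set before i_size changes as in item 3), a validation step might look roughly like the sketch below. The struct and function are hypothetical. Only the bigalloc -EINVAL comes from the series; the other error codes, and treating 0 as "no hint", are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bundle of the state the checks need (not an ext4 struct). */
struct extsize_req {
	uint32_t extsize_blks;	/* requested hint, in filesystem blocks */
	uint32_t cluster_ratio;	/* > 1 means bigalloc is in use */
	uint64_t i_size;	/* current file size in bytes */
};

static int extsize_check(const struct extsize_req *req)
{
	uint32_t n;

	assert(req != NULL);
	n = req->extsize_blks;

	if (req->cluster_ratio > 1)
		return -EINVAL;	/* from the series: no extsize with bigalloc */
	if (n == 0)
		return 0;	/* assumption: 0 means "no hint" */
	if (n & (n - 1))
		return -EINVAL;	/* item 4: power-of-2 only (errno assumed) */
	if (req->i_size != 0)
		return -EINVAL;	/* item 3: set before i_size changes (errno assumed) */
	return 0;
}
```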

Some TODOs:
===========
1. EOF allocations support can be added and can be kept similar to XFS

Note that EOF alignment for forcealign may change - it needs to be
discussed further.

Sure, thanks for pointing that out.
I guess you are referring to mainly the truncate related EOF alignment change
required with forcealign for XFS.


Thanks,
John



