Re: [PATCH v3 08/21] xfs: Introduce FORCEALIGN inode flag

John Garry <john.g.garry@xxxxxxxxxx> · Thu, 2 May 2024 08:56:49 +0100

On 02/05/2024 01:50, Dave Chinner wrote:
For example, consider xfs_file_dio_write(), where we check for an unaligned
write based on forcealign extent mask. It's much simpler to rely on a
power-of-2 size. And same for iomap extent zeroing.
But it's not more complex - we already do this non-power-of-2
alignment stuff for all the realtime code, so it's just a matter
of not blindly using bit masking in alignment checks.
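
To illustrate the point about bit masking, here is a minimal userspace sketch, not the XFS code. The helper names is_aligned_pow2() and is_aligned_any() are hypothetical. It shows that a mask test is only valid for power-of-2 alignments, while a modulo test works for any non-zero alignment:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mask-based check. Only valid when align is a power of 2, which the
 * assert enforces. With align = 20 the mask would be 19, and offset 32
 * would wrongly pass (32 & 19 == 0) even though 32 % 20 == 12.
 */
static inline bool is_aligned_pow2(uint64_t off, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (off & (align - 1)) == 0;
}

/* Modulo-based check. Valid for any non-zero alignment. */
static inline bool is_aligned_any(uint64_t off, uint64_t align)
{
	assert(align != 0);
	return off % align == 0;
}
```

The modulo form costs a division, but it gives the same answer as the mask form for every power-of-2 alignment, so it can stand in wherever the alignment is not known to be a power of 2.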

So then it can be asked, for what reason do we want to support unorthodox,
non-power-of-2 sizes? Who would want this?
I'm constantly surprised by the way people use stuff like this.
Filesystem and storage alignment constraints are not arbitrarily
limited to power-of-2 sizes.

For example, code implementation is simple in RAID setups when you
use power-of-2 chunk sizes and stripe widths. But not all storage
hardware fits power-of-2 configs like 4+1, 4+2, 8+1, 8+2, etc. This
is pretty common - 2.5" 2U drive trays have 24 drive bays. If you
want to give up 33% of the storage capacity just to use power-of-2
stripe widths then you would use 4x4+2 RAID6 luns. However, most
people don't want to waste that much money on redundancy. They are
much more likely to use 2x10+2 RAID6 luns or 1x21+2 with a hot spare
to maximise the data storage capacity.
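
The arithmetic behind the 24-bay example can be sketched as below. The struct and helper names are hypothetical, and the sketch assumes every LUN is RAID6 (two parity drives per LUN):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of how a drive tray is carved into RAID6 LUNs. */
struct tray_layout {
	unsigned int luns;		/* number of RAID6 LUNs */
	unsigned int data_per_lun;	/* data drives per LUN */
	unsigned int spares;		/* hot spares */
};

/* Each RAID6 LUN carries two parity drives on top of its data drives. */
static inline unsigned int total_drives(struct tray_layout t)
{
	return t.luns * (t.data_per_lun + 2) + t.spares;
}

static inline unsigned int data_drives(struct tray_layout t)
{
	return t.luns * t.data_per_lun;
}

/* Percentage of drives not holding data (parity plus spares), rounded down. */
static inline unsigned int overhead_percent(struct tray_layout t)
{
	unsigned int total = total_drives(t);

	assert(total != 0);
	return 100 * (total - data_drives(t)) / total;
}

static inline bool is_pow2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```

For the three layouts mentioned, 4x4+2 gives 16 data drives out of 24, 2x10+2 gives 20, and 1x21+2 with one spare gives 21. Only the first has a power-of-2 number of data drives per LUN.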

Thanks for sharing this info

If someone wants to force-align allocation to stripe widths on such
a RAID array config rather than trying to rely on the best effort
swalloc mount option, then they need non-power-of-2
alignments to be supported.

It's pretty much a no-brainer - the alignment code already handles
non-power-of-2 alignments, and it's not very much additional code to
ensure we can handle any alignment the user specified.

ok, fine

As for AG size, again I think that it is required to be aligned to the
forcealign extsize. As I remember, when converting from an FSB to a DB, if
the AG itself is not aligned to the forcealign extsize, then the DB will not
be aligned to the forcealign extsize. More below...

+	/* Requires agsize be a multiple of extsize */
+	if (mp->m_sb.sb_agblocks % extsize)
+		return __this_address;
+
+	/* Requires stripe unit+width (if set) be a multiple of extsize */
+	if ((mp->m_dalign && (mp->m_dalign % extsize)) ||
+	    (mp->m_swidth && (mp->m_swidth % extsize)))
+		return __this_address;
Again, this is an atomic write constraint, isn't it?
So why do we want forcealign? Is it only to align extent FSBs?
Yes. Forced alignment is essentially just extent size guarantees.

This is part of what is needed for atomic writes, but atomic writes
also require specific physical storage alignment between the
filesystem and the device. The filesystem setup has to correctly
align AGs to the physical storage, and stuff like RAID
configurations need to be specifically compatible with the atomic
write capabilities of the underlying hardware.

None of these hardware and storage stack alignment constraints have
any relevance to the filesystem forced alignment functionality. They
are completely independent. All the forced alignment does is
guarantee that allocation is aligned according to the extent size hint
on the inode or it fails with ENOSPC.
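
The "aligned or ENOSPC" behaviour described above could be modelled roughly as follows. This is a toy userspace sketch under stated assumptions, not the XFS allocator: struct free_range, roundup_any() and alloc_forcealigned() are hypothetical names, free space is a plain array of AG-relative block ranges, and exactly one extsize-sized extent is requested.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical free-space record, in blocks relative to the AG start. */
struct free_range {
	uint64_t start;
	uint64_t len;
};

/* Round x up to the next multiple of align. Works for any non-zero align. */
static inline uint64_t roundup_any(uint64_t x, uint64_t align)
{
	uint64_t rem;

	assert(align != 0);
	rem = x % align;
	return rem ? x + (align - rem) : x;
}

/*
 * Find an extsize-aligned, extsize-long extent in the free ranges.
 * Returns 0 and stores the start block in *out on success. There is no
 * unaligned fallback: if no range can hold an aligned extent, the
 * result is -ENOSPC.
 */
static inline int alloc_forcealigned(const struct free_range *fr, int nr,
				     uint64_t extsize, uint64_t *out)
{
	for (int i = 0; i < nr; i++) {
		uint64_t start = roundup_any(fr[i].start, extsize);
		uint64_t end = fr[i].start + fr[i].len;

		if (start < end && end - start >= extsize) {
			*out = start;
			return 0;
		}
	}
	return -ENOSPC;
}
```

Note that nothing in this model refers to the device geometry. The only inputs are the extent size and offsets within the AG.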

Fine, so only for atomic writes we just need to ensure FSBs are aligned 
to DBs.

And so it is the responsibility of mkfs to ensure AG size aligns to any 
forcealign extsize specified and also disk atomic write geometry.

For atomic write only, it is the responsibility of the kernel to check 
the forcealign extsize is compatible with any stripe alignment and AG size.

Can you please separate these and put all the force align user API
validation checks in the one function?

ok, fine. But it would be good to have clarification on the function of
forcealign, above, i.e. does it always align extents to disk blocks?
No, it doesn't. XFS has never done this - physical extent alignment
is always done relative to the start of the AG, not the underlying
disk geometry.

IOWs, forced alignment is not aligning to disk blocks at all - it
is aligning extents logically to file offset and physically to the
offset from the start of the allocation group.  Hence there are no
real constraints on forced alignment - we can do any sort of
alignment as long as it is smaller than half the max size of a physical
extent.

For allocation to then be aligned to physical storage, we need mkfs
to physically align the start of each AG to the geometry of the
underlying storage. We already do this for filesystems with a stripe
unit defined, hence stripe aligned allocation is physically aligned
to the underlying storage.

Sure

However, if mkfs doesn't get the physical layout of AGs right, there
is nothing the mounted filesystem can do to guarantee extent
allocation is aligned to physical disk blocks regardless of whether
forced alignment is enabled or not...
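
The distinction between AG-relative alignment and alignment on the device can be sketched as below. The helper names are hypothetical, and the sketch simplifies by expressing the AG start and the extent start in the same block units:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What forced alignment guarantees: the AG-relative start is aligned. */
static inline bool ag_relative_aligned(uint64_t agbno, uint64_t extsize)
{
	assert(extsize != 0);
	return agbno % extsize == 0;
}

/*
 * What the device sees: the AG start offset plus the AG-relative block.
 * This is aligned only if mkfs also placed the AG start on an aligned
 * boundary. The mounted filesystem cannot fix a misplaced AG start.
 */
static inline bool device_aligned(uint64_t ag_start, uint64_t agbno,
				  uint64_t extsize)
{
	assert(extsize != 0);
	return (ag_start + agbno) % extsize == 0;
}
```

With extsize 16 and an extent at AG block 48, an AG starting at block 1024 yields a device-aligned extent, while an AG starting at block 1030 does not, even though the AG-relative alignment is identical in both cases.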

ok, understood.

Thanks,
John