On Thu, Sep 05, 2024 at 09:26:25AM +0530, Ritesh Harjani wrote: > Dave Chinner <david@xxxxxxxxxxxxx> writes: > > On Wed, Sep 04, 2024 at 11:44:29PM +0530, Ritesh Harjani wrote: > >> 3. It is the FORCEALIGN feature which _mandates_ both allocation > >> (by using extsize hint) and de-allocation to happen _only_ in > >> extsize chunks. > >> > >> i.e. forcealign mandates - > >> - the logical and physical start offset should be aligned as > >> per args->alignment > >> - extent length be aligned as per args->prod/mod. > >> If above two cannot be satisfied then return -ENOSPC. > > > > Yes. > > > >> > >> - Does the unmapping of extents also only happens in extsize > >> chunks (with forcealign)? > > > > Yes, via use of xfs_inode_alloc_unitsize() in the high level code > > aligning the fsbno ranges to be unmapped. > > > > Remember, force align requires both logical file offset and > > physical block number to be correctly aligned, > > This is where I would like to double confirm it again. Even the > extsize hint feature (w/o FORCEALIGN) will try to allocate aligned > physical start and logical start file offset and length right? No. > (Or does extsize hint only restricts alignment to logical start file > offset + length and not the physical start?) Neither. extsize hint by itself (i.e. existing behaviour) has no alignment effect at all. All it affects is -size- of the extent. i.e. once the extent start is chosen, extent size hints will trim the length of the extent to a multiple of the extent size hint. Alignment is not considered at all. > Also it looks like there is no difference with ATOMIC_WRITE AND > FORCEALIGN feature with XFS, correct? (except that ATOMIC_WRITE is > adding additional natural alignment restrictions on pos and len). Atomic write requires additional hardware support, and it restricts the valid sizes of extent size hints that can be set. Only atomic writes can be done on files marked as configured for atomic writes; force alignment can be done on any file... > So why maintain 2 separate on disk inode flags for FORCEALIGN AND > ATOMIC_WRITE? the atomic write flag indicates that a file has been set up correctly for atomic writes to be able to issues reliably. force alignment doesn't guarantee that - it's just a mechanism that tells the allocator to behave a specific way. > - Do you foresee FORCEALIGN to be also used at other places w/o > ATOMIC_WRITE where feature differentiation between the two on an > inode is required? The already exist. For example, reliably allocating huge page mappings on DAX filesystems requires 2MB forced alignment. > - Does the same reasoning will hold for XFS_SB_FEAT_RO_COMPAT_FORCEALIGN > & XFS_SB_FEAT_RO_COMPAT_ATOMICWRITES too? Same as above. > - But why ro_compact for ATOMICWRITES? There aren't any on disk metadata > changes within XFS filesystem to support atomic writes, right? Because if you downgrade the kernel to something that doesn't support atomic writes, then non-atomic sized/aligned data can be written to the file and/or torn writes can occur. Worse, extent size hints that don't match the underlying hardware support could be set up for inodes, and when the kernel is upgraded again then atomic writes will fail on inodes that have atomic write flags set on them.... > Is it something to just prevent users from destroying their own data > by not allowing a rw mount from an older kernel where users could do > unaligned writes to files marked for atomic writes? > Or is there any other reasoning to prevent XFS filesystem from becoming > inconsistent if an older kernel does a rw mount here. The older kernel does not know what the unknown inode flag means (i.e. atomic writes) and so, by definition, we cannot allow it to modify metadata or file data because it may not modify it in the correct way for that flag being set on the inode. Kernels that don't understand feature flags need to treat the filesystem as read-only, no matter how trivial the feature addition might seem. > > so unmap alignment > > has to be set up correctly at file offset level before we even know > > what extents underly the file range we need to unmap.... > > > >> If the start or end of the extent which needs unmapping is > >> unaligned then we convert that extent to unwritten and skip, > >> is it? (__xfs_bunmapi()) > > > > The high level code should be aligning the start and end of the > > file range to be removed via xfs_inode_alloc_unitsize(). Hence > > the low level __xfs_bunmapi() code shouldn't ever be encountering > > unaligned unmaps on force-aligned inodes. > > > > Yes, but isn't this code snippet trying to handle a case when it finds an > unaligned extent during unmap? It does exactly that. > And what we are essentially trying to > do here is leave the unwritten extent as is and if the found extent is > written then convert to unwritten and skip it (goto nodelete). This > means with forcealign if we encounter an unaligned extent then the file > will have this space reserved as is with extent marked unwritten. > > Is this understanding correct? Yes, except for the fact that this code is not triggered by force alignment. This code is preexisting realtime file functionality. It is used when the rtextent size is larger than a single filesytem block. In these configurations, we do allocation in rtextsize units, but we track written/unwritten extents in the BMBT on filesystem block granularity. Hence we can have unaligned written/unwritten extent boundaries, and if we aren't unmapping a whole rtextent then we simply mark the unused portion of it as unwritten in the BMBT to indicate it contains zeroes. IOWs, existing RT files have different IO alignment and written/unwritten extent conversion behaviour to the new forced alignment feature. The implementation code is shared in many places, but the higher level forced alignment functionality ensures there are never unaligned unwritten extents created or unaligned unmappings asked for. Hence this code does not trigger for the forced alignment cases. i.e. we have multiple seperate allocation alignment behaviours that share an implementation, but they do not all behave exactly the same way or provide the same alignment guarantees.... -Dave. -- Dave Chinner david@xxxxxxxxxxxxx