On Mon, Mar 19, 2018 at 10:45 AM, Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote: > On Thu, Mar 15, 2018 at 08:52:45AM -0700, Dan Williams wrote: >> When xfs is operating as the back-end of a pNFS block server, it prevents >> collisions between local and remote operations by requiring a lease to >> be held for remotely accessed blocks. Local filesystem operations break >> those leases before writing or mutating the extent map of the file. >> >> A similar mechanism is needed to prevent operations on pinned dax >> mappings, like device-DMA, from colliding with extent unmap operations. >> >> BREAK_WRITE and BREAK_TRUNCATE are introduced as two distinct levels of >> layout breaking. >> >> Layouts are broken in the BREAK_WRITE case to ensure that layout-holders >> do not collide with local writes. Additionally, layouts are broken in >> the BREAK_TRUNCATE case to make sure the layout-holder has a consistent >> view of the file's extent map. While BREAK_WRITE breaks can be satisfied >> be recalling FL_LAYOUT leases, BREAK_TRUNCATE breaks additionally >> require waiting for busy dax-pages to go idle. >> >> Cc: "Darrick J. Wong" <darrick.wong@xxxxxxxxxx> >> Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx> >> Reported-by: Dave Chinner <david@xxxxxxxxxxxxx> >> Reported-by: Christoph Hellwig <hch@xxxxxx> >> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> >> --- >> fs/xfs/xfs_file.c | 23 +++++++++++++++++------ >> fs/xfs/xfs_inode.h | 18 ++++++++++++++++-- >> fs/xfs/xfs_ioctl.c | 2 +- >> fs/xfs/xfs_iops.c | 5 +++-- >> 4 files changed, 37 insertions(+), 11 deletions(-) >> >> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c >> index 5742d395a4e4..399c5221f101 100644 >> --- a/fs/xfs/xfs_file.c >> +++ b/fs/xfs/xfs_file.c >> @@ -352,7 +352,7 @@ xfs_file_aio_write_checks( >> >> xfs_ilock(ip, XFS_MMAPLOCK_EXCL); >> *iolock |= XFS_MMAPLOCK_EXCL; >> - error = xfs_break_layouts(inode, iolock); >> + error = xfs_break_layouts(inode, iolock, BREAK_WRITE); >> if (error) { >> *iolock &= ~XFS_MMAPLOCK_EXCL; >> xfs_iunlock(ip, XFS_MMAPLOCK_EXCL); >> @@ -762,7 +762,8 @@ xfs_file_write_iter( >> int >> xfs_break_layouts( >> struct inode *inode, >> - uint *iolock) >> + uint *iolock, >> + enum layout_break_reason reason) >> { >> struct xfs_inode *ip = XFS_I(inode); >> int ret; >> @@ -770,9 +771,19 @@ xfs_break_layouts( >> ASSERT(xfs_isilocked(ip, XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL >> | XFS_MMAPLOCK_EXCL)); >> >> - ret = xfs_break_leased_layouts(inode, iolock); >> - if (ret > 0) >> - ret = 0; >> + switch (reason) { >> + case BREAK_TRUNCATE: >> + /* fall through */ >> + case BREAK_WRITE: >> + ret = xfs_break_leased_layouts(inode, iolock); >> + if (ret > 0) >> + ret = 0; >> + break; >> + default: >> + ret = -EINVAL; >> + break; >> + } >> + >> return ret; >> } >> >> @@ -802,7 +813,7 @@ xfs_file_fallocate( >> return -EOPNOTSUPP; >> >> xfs_ilock(ip, iolock); >> - error = xfs_break_layouts(inode, &iolock); >> + error = xfs_break_layouts(inode, &iolock, BREAK_TRUNCATE); >> if (error) >> goto out_unlock; >> >> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h >> index 74c63f3a720f..1a66c7afcf45 100644 >> --- a/fs/xfs/xfs_inode.h >> +++ b/fs/xfs/xfs_inode.h >> @@ -379,6 +379,20 @@ static inline void xfs_ifunlock(struct xfs_inode *ip) >> >> XFS_ILOCK_SHIFT) >> >> /* >> + * Layouts are broken in the BREAK_WRITE case to ensure that >> + * layout-holders do not collide with local writes. Additionally, >> + * layouts are broken in the BREAK_TRUNCATE case to make sure the >> + * layout-holder has a consistent view of the file's extent map. While >> + * BREAK_WRITE breaks can be satisfied be recalling FL_LAYOUT leases, >> + * BREAK_TRUNCATE breaks additionally require waiting for busy dax-pages >> + * to go idle. >> + */ >> +enum layout_break_reason { >> + BREAK_WRITE, > > I wonder if this belongs in a system header file since the same general > principle is going to apply to ext4 and others? If this really is a > special xfs thing, then please use the "XFS_" prefix. > >> + BREAK_TRUNCATE, > > The "truncate" part of the name misled me for a bit. That operation is > really about "make sure there are no busy dax pages because we're about > to change some file pmem^Wblock mappings", right? Truncation, hole > punching, reflinking, and zero_range (because it punches the range and > reallocates unwritten extents) all have to wait for busy dax pages to > become free before they can begin unmapping blocks. So this isn't just > about truncation specifically; can I suggest "BREAK_UNMAPI" ? I like that name better. It isn't clear that this needs to go to a system header because I expect ext4 will only support BREAK_UNMAPI for dax and not care about BREAK_WRITE since that is an XFS-only / pNFS-only distinction in current code. However, I'll let Jan chime in if this guess is incorrect and ext4 plans to add pNFS block-server support. > (I also haven't figured out whether the same requirements apply to > things that *add* extent maps to the file, but I have to run to a > meeting in a few so wanted to get this email sent.) Add should not be a problem because DMA only ever starts after extents are added, i.e. there's no risk of DMA starting to invalid address because fs/dax.c arranges for extents to be allocated at fault time. -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html