On Fri, Feb 23, 2018 at 11:08:19AM +1100, Dave Chinner wrote: > There's also a bug in the code where we take the ILOCK exclusive for > direct IO writes only to then have to demote it before calling > xfs_iomap_write_direct() if allocation is required. This was a > regression introduced with the iomap direct IO path rework... I actually have a fix for that and a few related bits pending in the O_ATOMIC tree, but that still has a few other items to fix.. The relevant commit is attached below for reference. > +xfs_ilock_for_iomap( > + struct xfs_inode *ip, > + unsigned flags, > + unsigned *lockmode) > { > + unsigned mode = XFS_ILOCK_SHARED; > + > + /* Modifications to reflink files require exclusive access */ > + if (xfs_is_reflink_inode(ip) && (flags & (IOMAP_WRITE | IOMAP_ZERO))) { > + /* > + * A reflinked inode will result in CoW alloc. > + * FIXME: It could still overwrite on unshared extents > + * and not need allocation in the direct IO path. > + */ > + if ((flags & IOMAP_NOWAIT) && (flags & IOMAP_DIRECT)) > + return -EAGAIN; The IOMAP_DIRECT check here doesn't really make sense - currently we will only get IOMAP_NOWAIT if IOMAP_DIRECT also is set, but if at some point that is not true there is no point in depending on IOMAP_DIRECT either. > + mode = XFS_ILOCK_EXCL; > + } > + > + /* Non-direct IO modifications require exclusive access */ > + if (!(flags & IOMAP_DIRECT) && (flags & IOMAP_WRITE)) > + mode = XFS_ILOCK_EXCL; There is no point in taking the lock exclusively in the main xfs_file_iomap_begin for !IOMAP_DIRECT at all. We only need it for the reflink case that is handled above, or if we call into xfs_file_iomap_begin_delay, which does its own locking. > + if (!xfs_ilock_nowait(ip, mode)) { > + if (flags & IOMAP_NOWAIT) > + return -EAGAIN; > + xfs_ilock(ip, mode); > + } This pattern has caused some nasty performance regressions, which is why we removed it again elsewhere. See the "xfs: fix AIM7 regression" commit (942491c9e6d631c012f3c4ea8e7777b0b02edeab). > + /* Non-modifying mapping requested, so we are done */ > + if (!(flags & (IOMAP_WRITE | IOMAP_ZERO))) > + goto iomap_found; This should probably separate from any locking changes. I thought about just splitting the pure read case from the direct / extsz allocate case but haven't looked into it yet. In fact the read only case could probably share code with xfs_xattr_iomap_begin. Note that we also have another bug in this area, in that we allocate blocks for holes or unwritten extents in reflink files, see the other attached patch.
>From 584c49543b2376cc46a6d4a640fd7123df987607 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig <hch@xxxxxx> Date: Mon, 12 Feb 2018 10:19:41 -0800 Subject: xfs: don't allocate COW blocks for zeroing holes or unwritten extents The iomap zeroing interface is smart enough to skip zeroing holes or unwritten extents. Don't subvert this logic for reflink files. Signed-off-by: Christoph Hellwig <hch@xxxxxx> --- fs/xfs/xfs_iomap.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 66e1edbfb2b2..4e771e0f1170 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -955,6 +955,13 @@ static inline bool imap_needs_alloc(struct inode *inode, (IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN); } +static inline bool needs_cow_for_zeroing(struct xfs_bmbt_irec *imap, int nimaps) +{ + return nimaps && + imap->br_startblock != HOLESTARTBLOCK && + imap->br_state != XFS_EXT_UNWRITTEN; +} + static inline bool need_excl_ilock(struct xfs_inode *ip, unsigned flags) { /* @@ -1024,7 +1031,9 @@ xfs_file_iomap_begin( goto out_unlock; } - if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) { + if (xfs_is_reflink_inode(ip) && + ((flags & IOMAP_WRITE) || + ((flags & IOMAP_ZERO) && needs_cow_for_zeroing(&imap, nimaps)))) { if (flags & IOMAP_DIRECT) { /* * A reflinked inode will result in CoW alloc. -- 2.14.2
>From 24220b8b4a18ff60e509ab640fbe2a48b6a4fa51 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig <hch@xxxxxx> Date: Mon, 12 Feb 2018 10:24:10 -0800 Subject: xfs: don't start out with the exclusive ilock for direct I/O There is no reason to take the ilock exclusively at the start of xfs_file_iomap_begin for direct I/O, given that it will be demoted just before calling xfs_iomap_write_direct anyway. Signed-off-by: Christoph Hellwig <hch@xxxxxx> --- fs/xfs/xfs_iomap.c | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 4e771e0f1170..bc9ff5d8efea 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -962,19 +962,6 @@ static inline bool needs_cow_for_zeroing(struct xfs_bmbt_irec *imap, int nimaps) imap->br_state != XFS_EXT_UNWRITTEN; } -static inline bool need_excl_ilock(struct xfs_inode *ip, unsigned flags) -{ - /* - * COW writes will allocate delalloc space, so we need to make sure - * to take the lock exclusively here. - */ - if (xfs_is_reflink_inode(ip) && (flags & (IOMAP_WRITE | IOMAP_ZERO))) - return true; - if ((flags & IOMAP_DIRECT) && (flags & IOMAP_WRITE)) - return true; - return false; -} - static int xfs_file_iomap_begin( struct inode *inode, @@ -1000,7 +987,11 @@ xfs_file_iomap_begin( return xfs_file_iomap_begin_delay(inode, offset, length, iomap); } - if (need_excl_ilock(ip, flags)) { + /* + * COW writes may allocate delalloc space or convert unwritten COW + * extents, so we need to make sure to take the lock exclusively here. + */ + if (xfs_is_reflink_inode(ip) && (flags & (IOMAP_WRITE | IOMAP_ZERO))) { lockmode = XFS_ILOCK_EXCL; xfs_ilock(ip, XFS_ILOCK_EXCL); } else { -- 2.14.2