On Fri, Oct 07, 2016 at 02:05:07PM -0400, Brian Foster wrote: > On Thu, Sep 29, 2016 at 08:10:39PM -0700, Darrick J. Wong wrote: > > Unshare all shared extents if the user calls fallocate with the new > > unshare mode flag set, so that we can guarantee that a subsequent > > write will not ENOSPC. > > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > [hch: pass inode instead of file to xfs_reflink_dirty_range, > > use iomap infrastructure for copy up] > > Signed-off-by: Christoph Hellwig <hch@xxxxxx> > > --- > > fs/xfs/xfs_file.c | 10 ++ > > fs/xfs/xfs_reflink.c | 237 ++++++++++++++++++++++++++++++++++++++++++++++++++ > > fs/xfs/xfs_reflink.h | 2 > > 3 files changed, 247 insertions(+), 2 deletions(-) > > > > > ... > > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c > > index 77ac810..065e836 100644 > > --- a/fs/xfs/xfs_reflink.c > > +++ b/fs/xfs/xfs_reflink.c > > @@ -1472,3 +1472,240 @@ xfs_reflink_remap_range( > > trace_xfs_reflink_remap_range_error(dest, error, _RET_IP_); > > return error; > > } > ... > > +/* Iterate the extents; if there are no reflinked blocks, clear the flag. */ > > +STATIC int > > +xfs_reflink_try_clear_inode_flag( > > + struct xfs_inode *ip, > > + xfs_off_t old_isize) > > +{ > > + struct xfs_mount *mp = ip->i_mount; > > + struct xfs_trans *tp; > > + xfs_fileoff_t fbno; > > + xfs_filblks_t end; > > + xfs_agnumber_t agno; > > + xfs_agblock_t agbno; > > + xfs_extlen_t aglen; > > + xfs_agblock_t rbno; > > + xfs_extlen_t rlen; > > + struct xfs_bmbt_irec map[2]; > > + int nmaps; > > + int error = 0; > > + > > + /* Start a rolling transaction to remove the mappings */ > > + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0, 0, 0, &tp); > > + if (error) > > + return error; > > + > > + xfs_ilock(ip, XFS_ILOCK_EXCL); > > + xfs_trans_ijoin(tp, ip, 0); > > + > > + if (old_isize != i_size_read(VFS_I(ip))) > > + goto cancel; > > + if (!(ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK)) > > + goto cancel; > > + > > The code that has been merged is now different from this code :/, but > just a heads up that the code in the tree looks like it has another one > of those potentially blind transaction commit sequences between > xfs_reflink_try_clear_inode_flag() and xfs_reflink_clear_inode_flag(). _reflink_unshare jumps out if it's not a reflink inode before calling _reflink_try_clear_inode_flag -> _reflink_clear_inode_flag. We do not call _reflink_clear_inode_flag with a non-reflink inode. As for blindly committing a transaction with no dirty data, that's fine, _trans_commit checks for that case and simply frees everything attached to the transaction. > It doesn't appear to be a problem in how it is actually used in this > patch, but for reference, I think it's better practice for lower level > functions like xfs_reflink_clear_inode_flag() to assert that the flag is > set and make it the responsibility of the caller to check for it and do > the right thing. Just my .02 though. Ok, I'll add an assert. > > + fbno = 0; > > + end = XFS_B_TO_FSB(mp, old_isize); > > + while (end - fbno > 0) { > > + nmaps = 1; > > + /* > > + * Look for extents in the file. Skip holes, delalloc, or > > + * unwritten extents; they can't be reflinked. > > + */ > > + error = xfs_bmapi_read(ip, fbno, end - fbno, map, &nmaps, 0); > > + if (error) > > + goto cancel; > > + if (nmaps == 0) > > + break; > > + if (map[0].br_startblock == HOLESTARTBLOCK || > > + map[0].br_startblock == DELAYSTARTBLOCK || > > + ISUNWRITTEN(&map[0])) > > + goto next; > > + > > + map[1] = map[0]; > > + while (map[1].br_blockcount) { > > + agno = XFS_FSB_TO_AGNO(mp, map[1].br_startblock); > > + agbno = XFS_FSB_TO_AGBNO(mp, map[1].br_startblock); > > + aglen = map[1].br_blockcount; > > + > > + error = xfs_reflink_find_shared(mp, agno, agbno, aglen, > > + &rbno, &rlen, false); > > + if (error) > > + goto cancel; > > + /* Is there still a shared block here? */ > > + if (rlen > 0) { > > + error = 0; > > + goto cancel; > > + } > > + > > + map[1].br_blockcount -= aglen; > > + map[1].br_startoff += aglen; > > + map[1].br_startblock += aglen; > > This is basically doing: > > map[1] = map[0]; > while (map[1].br_blockcount) { > aglen = map[1].br_blockcount; > ... > map[1].br_blockcount -= aglen; > } > > So the loop appears to be completely superfluous. <nod> > > > + } > > + > > +next: > > + fbno = map[0].br_startoff + map[0].br_blockcount; > > + } > > + > > + /* > > + * We didn't find any shared blocks so turn off the reflink flag. > > + * First, get rid of any leftover CoW mappings. > > + */ > > + error = xfs_reflink_cancel_cow_blocks(ip, &tp, 0, NULLFILEOFF); > > + if (error) > > + goto cancel; > > + > > + /* Clear the inode flag. */ > > + trace_xfs_reflink_unset_inode_flag(ip); > > + ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK; > > + xfs_trans_ijoin(tp, ip, 0); > > + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); > > + > > + error = xfs_trans_commit(tp); > > + if (error) > > + goto out; > > + > > + xfs_iunlock(ip, XFS_ILOCK_EXCL); > > + return 0; > > +cancel: > > + xfs_trans_cancel(tp); > > +out: > > + xfs_iunlock(ip, XFS_ILOCK_EXCL); > > + return error; > > +} > > + > > +/* > > + * Pre-COW all shared blocks within a given byte range of a file and turn off > > + * the reflink flag if we unshare all of the file's blocks. > > + */ > > +int > > +xfs_reflink_unshare( > > + struct xfs_inode *ip, > > + xfs_off_t offset, > > + xfs_off_t len) > > +{ > > + struct xfs_mount *mp = ip->i_mount; > > + xfs_fileoff_t fbno; > > + xfs_filblks_t end; > > + xfs_off_t old_isize, isize; > > + int error; > > + > > + if (!xfs_is_reflink_inode(ip)) > > + return 0; > > + > > + trace_xfs_reflink_unshare(ip, offset, len); > > + > > + inode_dio_wait(VFS_I(ip)); > > + > > + /* Try to CoW the selected ranges */ > > + xfs_ilock(ip, XFS_ILOCK_EXCL); > > + fbno = XFS_B_TO_FSB(mp, offset); > > XFS_B_TO_FSBT() ? > > > + old_isize = isize = i_size_read(VFS_I(ip)); > > + end = XFS_B_TO_FSB(mp, offset + len); > > + error = xfs_reflink_dirty_extents(ip, fbno, end, isize); > > + if (error) > > + goto out_unlock; > > + xfs_iunlock(ip, XFS_ILOCK_EXCL); > > + > > + /* Wait for the IO to finish */ > > + error = filemap_write_and_wait(VFS_I(ip)->i_mapping); > > + if (error) > > + goto out; > > + > > + /* Turn off the reflink flag if we unshared the whole file */ > > + if (offset == 0 && len == isize) { > > Isn't this valid if len is larger than isize (similar check in > xfs_reflink_try_clear_inode_flag() might defeat this as well)? > > FWIW, this has a similar issue as the earlier truncate code in that we > might just unshare the shared regions and thus retain the flag. Yes, it is suboptimal for the flag to be set when there are no shared extents. I'm not sure when is a good opportunity to try to turn off the flag -- certainly we don't want to do that after every CoW operation. Doing it as part of a fallocate operation seems reasonable enough. Prior to the removal of the UNSHARE flag it would try to clear the flag any time the user asked for an unshare, but when I removed the flag from the interface I decided we should only do that if the user fallocated the entire file. Now that UNSHARE has been re-added to the interface, I'll just take out these weird checks. Note that the upcoming online repair patchset will try to unset the flag. --D > > Brian > > > + error = xfs_reflink_try_clear_inode_flag(ip, old_isize); > > + if (error) > > + goto out; > > + } > > + > > + return 0; > > + > > +out_unlock: > > + xfs_iunlock(ip, XFS_ILOCK_EXCL); > > +out: > > + trace_xfs_reflink_unshare_error(ip, error, _RET_IP_); > > + return error; > > +} > > diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h > > index df82b20..ad4fc61 100644 > > --- a/fs/xfs/xfs_reflink.h > > +++ b/fs/xfs/xfs_reflink.h > > @@ -48,5 +48,7 @@ extern int xfs_reflink_recover_cow(struct xfs_mount *mp); > > extern int xfs_reflink_remap_range(struct xfs_inode *src, xfs_off_t srcoff, > > struct xfs_inode *dest, xfs_off_t destoff, xfs_off_t len, > > unsigned int flags); > > +extern int xfs_reflink_unshare(struct xfs_inode *ip, xfs_off_t offset, > > + xfs_off_t len); > > > > #endif /* __XFS_REFLINK_H */ > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html