On Thu, Jan 17, 2019 at 09:24:28AM -0800, Darrick J. Wong wrote: > On Fri, Jan 11, 2019 at 11:28:38AM -0500, Brian Foster wrote: > > On Mon, Dec 31, 2018 at 06:16:57PM -0800, Darrick J. Wong wrote: > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > > > > When we're freezing the filesystem, free all the COW staging extents > > > before we shut the log down so that we can minimize the amount of > > > recovery work that will be necessary during the next mount. > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > --- > > > fs/xfs/xfs_icache.c | 17 ++++++++++++----- > > > fs/xfs/xfs_super.c | 18 ++++++++++++++++++ > > > 2 files changed, 30 insertions(+), 5 deletions(-) > > > > > > > > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c > > > index 245483cc282b..36d986087abb 100644 > > > --- a/fs/xfs/xfs_icache.c > > > +++ b/fs/xfs/xfs_icache.c > > > @@ -1741,6 +1741,7 @@ xfs_inode_free_cowblocks( > > > void *args) > > > { > > > struct xfs_eofblocks *eofb = args; > > > + uint lock_mode = 0; > > > int match; > > > int ret = 0; > > > > > > @@ -1761,9 +1762,15 @@ xfs_inode_free_cowblocks( > > > return 0; > > > } > > > > > > - /* Free the CoW blocks */ > > > - xfs_ilock(ip, XFS_IOLOCK_EXCL); > > > - xfs_ilock(ip, XFS_MMAPLOCK_EXCL); > > > + /* > > > + * Free the CoW blocks. We don't need to lock the inode if we're in > > > + * the process of freezing the filesystem because we've already locked > > > + * out writes and page faults. > > > + */ > > > + if (ip->i_mount->m_super->s_writers.frozen == SB_UNFROZEN) > > > + lock_mode = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL; > > > > Ok, but is there a problem with actually locking in this context? If so, > > I'd expect the comment to say something like: "Free the COW blocks. > > Don't lock the inode if we're in the process of freezing the filesystem > > because <some bad thing happens>. This is safe because we've already > > locked out writes and page faults." > > Hmmm, it's now been so long since I wrote this patch I don't remember > why I did this. Wait something's coming back... > > > > + if (lock_mode) > > > + xfs_ilock(ip, lock_mode); > > > > > > /* > > > * Check again, nobody else should be able to dirty blocks or change > > > @@ -1772,8 +1779,8 @@ xfs_inode_free_cowblocks( > > > if (xfs_prep_free_cowblocks(ip)) > > > ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false); > > > > > > - xfs_iunlock(ip, XFS_MMAPLOCK_EXCL); > > > - xfs_iunlock(ip, XFS_IOLOCK_EXCL); > > > + if (lock_mode) > > > + xfs_iunlock(ip, lock_mode); > > > > > > return ret; > > > } > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > > index 6903b36eac5d..bb4953cfd683 100644 > > > --- a/fs/xfs/xfs_super.c > > > +++ b/fs/xfs/xfs_super.c > > > @@ -1102,6 +1102,24 @@ xfs_fs_sync_fs( > > > if (!wait) > > > return 0; > > > > > > + /* > > > + * If we're in the middle of freezing the filesystem, this is our last > > > + * chance to run regular transactions. Clear all the COW staging > > > + * extents so that we can freeze the filesystem with as little recovery > > > + * work to do at the next mount as possible. It's safe to do this > > > + * without locking inodes because the freezer code flushed all the > > > + * dirty data from the page cache and locked out writes and page faults. > > > + */ > > > > Well, we have XFS_TRANS_NO_WRITECOUNT. It seems we could use an eofb > > freeze flag or some such to notify the scan that we're in a special > > freeze context and in turn use that to tweak the locking (instead of the > > SB_FREEZE_* checks) and avoid blocking on transaction allocation. Would > > that allow us to run this in xfs_fs_freeze() context? > > ...oh, right. Another thread could have taken the io/mmap locks (say > for fallocate) before the fs entered SB_FROZEN, and is now blocked > in xfs_trans_alloc waiting for the fs to unfreeze. I can't remember the > exact circumstances though. > Ok, so we'd basically deadlock on the freeze IIUC. That's exactly what we'd want to document in the comment. :P > > I'm also a little curious about the high-level tradeoff we're making. > > This patch means that a freeze, and thus something like an LVM snapshot, > > would clear/reset all COW blocks in the fs, right? If so, ISTM that > > could be annoying if the user had a bunch of reflinked vdisk images or > > something on the fs. > > Yeah, it's sort of painful to drop all those speculative preallocations. > > > Is the tradeoff just that if the user freezes and then doesn't unfreeze > > before the next mount that log recovery will have to deal with COW > > blocks, or are there considerations for a typical freeze/unfreeze cycle > > as well? Is this more expensive at recovery time than at freeze time? > > It's entirely to reduce the recovery work necessary after freezing the > fs, snapshotting the fs (via lvm or whatever) and then mounting the > snapshot separately. TBH this patch has languished for years now > because it's only necessary to optimize this one usage. > So the snapshot will run log recovery and end up freeing all of these extents "the hard way," right? > Maybe it's easier to drop this one, unless anyone feels particularly > passionate about clearing out cow staging extents before a freeze? > Not really. On principle, it seems more appropriate to me to put the work onto the snapshot rather than disrupt the primary fs. I could see doing something like this without much thought if the blocks were already freed or something and the change was thus invisible to a user, but ISTM that losing the preallocation can have a significant affect on the primary workload. Then again, this doesn't affect me much and I'm not terribly familiar with the use case or whether there have been complaints or anything. I don't feel strongly about it so long as the tradeoffs are documented somewhere so it's clear what needs fixing if this behavior became problematic down the road. Brian > --D > > > Brian > > > > > + if (sb->s_writers.frozen == SB_FREEZE_PAGEFAULT) { > > > + int error; > > > + > > > + error = xfs_icache_free_cowblocks(mp, NULL); > > > + if (error) { > > > + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); > > > + return error; > > > + } > > > + } > > > + > > > xfs_log_force(mp, XFS_LOG_SYNC); > > > if (laptop_mode) { > > > /* > > >