On Mon, Dec 18, 2017 at 08:53:01PM -0800, Darrick J. Wong wrote:
> On Tue, Dec 19, 2017 at 03:37:02PM +1100, Dave Chinner wrote:
> > On Mon, Dec 18, 2017 at 07:49:11PM -0800, Darrick J. Wong wrote:
> > > On Tue, Dec 19, 2017 at 11:17:55AM +1100, Dave Chinner wrote:
> > > > On Fri, Dec 15, 2017 at 09:11:31AM -0800, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > >
> > > > > When we're remounting the filesystem readonly, remove all CoW
> > > > > preallocations prior to going ro. If the fs goes down after the ro
> > > > > remount, we never clean up the staging extents, which means xfs_check
> > > > > will trip over them on a subsequent run. Practically speaking, the
> > > > > next mount will clean them up too, so this is unlikely to be seen.
> > > > >
> > > > > Found by adding clonerange to fsstress and running xfs/017.
> > > > >
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > > ---
> > > > >  fs/xfs/xfs_super.c |    8 ++++++++
> > > > >  1 file changed, 8 insertions(+)
> > > > >
> > > > >
> > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > > > index f663022..7b6d150 100644
> > > > > --- a/fs/xfs/xfs_super.c
> > > > > +++ b/fs/xfs/xfs_super.c
> > > > > @@ -1369,6 +1369,14 @@ xfs_fs_remount(
> > > > >
> > > > >  	/* rw -> ro */
> > > > >  	if (!(mp->m_flags & XFS_MOUNT_RDONLY) && (*flags & MS_RDONLY)) {
> > > > > +		/* Get rid of any leftover CoW reservations... */
> > > > > +		cancel_delayed_work_sync(&mp->m_cowblocks_work);
> > > > > +		error = xfs_icache_free_cowblocks(mp, NULL);
> > > > > +		if (error) {
> > > > > +			xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > > > > +			return error;
> > > > > +		}
> > > >
> > > > On rw->ro do we start the m_cowblocks_work back up?
> > >
> > > Assuming you meant to ask about ro->rw, then yes it should get started
> > > back up the next time something sets the cowblocks tag. I'm not opposed
> > > to starting it back up directly from the ro->rw handler.
> > >
> > > > What about when we freeze the filesystem - shouldn't we clean
> > > > up the cow blocks there, too? We've tried hard in the past to make
> > > > freeze and rw->ro exactly the same so that if the system is powered
> > > > down while frozen it comes up almost entirely clean just like a
> > > > ro-remount in shutdown....
> > >
> > > I don't see a hard requirement to clean them up at freeze time, though
> > > we certainly can do it for consistency's sake.
> >
> > Can't the background worker come around and attempt to do cleanup
> > while the fs is frozen? We've had vectors like that in the past that
> > have written to frozen filesystems (e.g. inode reclaim writing
> > inodes, memory reclaim shrinkers triggering AIL pushes) so leaving
> > potentially dirty objects in memory when the filesystem is frozen
> > is kinda dangerous. That's the reason behind trying to make
> > freeze/ro states identical - it makes sure we don't accidentally
> > leave writable objects in memory when frozen...
>
> Hmmm, so /me tried making fsfreeze clear out the cow reservations, but
> doing so requires allocating a transaction, which blows the assert in
> sb_start_write because the fs is already frozen...

Ah, didn't we solve that problem years ago? Ah, yeah,
XFS_TRANS_NO_WRITECOUNT. That'd be a bit of a hack, but the problem
here is we need to run this between freezing data writes and
freezing transactions and we have no hook in the generic freeze code
to do that...

> I could just kill
> the thread without cleaning out the cow reservations and let the
> post-crash mount clean things up, since we already have the
> infrastructure to do that anyway?

Well, we do leave the log dirty on freeze so that we clean up
unlinked inodes if we crash while frozen, so there is precedent
there. However, we need to balance that with the fairly common
problem of having to run recovery on read-only snapshots on the
first mount because a freeze leaves the log dirty.
I don't think we want to make that problem worse so I'd like to avoid
this solution if at all possible.

> (Or create a ->freeze_super and do it there...)

A ->freeze_data callout from the generic freezing code would be more
appropriate than completely reimplementing our own freeze code.
Right now the generic code just calls sync_filesystem(sb) to do this
before setting SB_FREEZE_FS - we need to do more than just sync data
if we are going to remove cow mappings on freeze....

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx