On Mon, Jun 06, 2016 at 03:24:10PM -0500, Eric Sandeen wrote: > > > On 6/6/16 12:12 PM, Brian Foster wrote: > > The filesystem quiesce sequence performs the operations necessary to > > drain all background work, push pending transactions through the log > > infrastructure and wait on I/O resulting from the final AIL push. We > > have had reports of remount,ro hangs in xfs_log_quiesce() -> > > xfs_wait_buftarg(), however, and some instrumentation code to detect > > transaction commits at this point in the quiesce sequence has inculpated > > the eofblocks background scanner as a cause. > > > > While higher level remount code generally prevents user modifications by > > the time the filesystem has made it to xfs_log_quiesce(), the background > > scanner may still be alive and can perform pending work at any time. If > > this occurs between the xfs_log_force() and xfs_wait_buftarg() calls > > within xfs_log_quiesce(), this can lead to an indefinite lockup in > > xfs_wait_buftarg(). > > > > To prevent this problem, cancel the background eofblocks scan worker > > during the remount read-only quiesce sequence. This suspends background > > trimming when a filesystem is remounted read-only. This is only done in > > the remount path because the freeze codepath has already locked out new > > transactions by the time the filesystem attempts to quiesce (and thus > > waiting on an active work item could deadlock). Kick the eofblocks > > worker to pick up where it left off once an fs is remounted back to > > read-write. > > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > > --- > > > > To confirm the problem, I have managed to manufacture the remount,ro > > hang issue described above by hacking in some delays/coordination > > between the quiesce and eofblocks background worker. > > > > Also, an alternative approach that I was considering is to run a > > synchronous scan around the same place this patch cancels the background > > scan. The idea is that the background scan would then have no work to do > > on the subsequent iteration and then die off naturally (until > > preallocation occurs once again). I suppose the downside is that a > > remount might not necessarily be expected to muck with preallocation > > state of affected files. This patch seemed more simple, but I'm open to > > either. Thoughts? > > Your approach makes sense to me - "finishing" the scan prior to quiesce > could take a relatively unknown amount of time as well, right? And I guess > I don't see a real advantage to that; we don't wait for it on unmount > (right?) > Yes, it could take an unknown amount of time. It would probably be similar to what we would do on an unmount. We don't explicitly wait for a scan on unmount (though we destroy the workqueue at some point), but I believe we have to reclaim all cached inodes, which then trims eofblocks for every inode where appropriate (via xfs_inactive()). > Reviewed-by: Eric Sandeen <sandeen@xxxxxxxxxx> > Thanks! Brian > > Brian > > > > fs/xfs/xfs_icache.c | 2 +- > > fs/xfs/xfs_icache.h | 1 + > > fs/xfs/xfs_super.c | 8 ++++++++ > > 3 files changed, 10 insertions(+), 1 deletion(-) > > > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c > > index 99ee6eee..fb39a66 100644 > > --- a/fs/xfs/xfs_icache.c > > +++ b/fs/xfs/xfs_icache.c > > @@ -765,7 +765,7 @@ restart: > > * Background scanning to trim post-EOF preallocated space. This is queued > > * based on the 'speculative_prealloc_lifetime' tunable (5m by default). > > */ > > -STATIC void > > +void > > xfs_queue_eofblocks( > > struct xfs_mount *mp) > > { > > diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h > > index 62f1f91..05bac99 100644 > > --- a/fs/xfs/xfs_icache.h > > +++ b/fs/xfs/xfs_icache.h > > @@ -68,6 +68,7 @@ void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip); > > int xfs_icache_free_eofblocks(struct xfs_mount *, struct xfs_eofblocks *); > > int xfs_inode_free_quota_eofblocks(struct xfs_inode *ip); > > void xfs_eofblocks_worker(struct work_struct *); > > +void xfs_queue_eofblocks(struct xfs_mount *); > > > > int xfs_inode_ag_iterator(struct xfs_mount *mp, > > int (*execute)(struct xfs_inode *ip, int flags, void *args), > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > index 4700f09..7965371 100644 > > --- a/fs/xfs/xfs_super.c > > +++ b/fs/xfs/xfs_super.c > > @@ -1294,6 +1294,7 @@ xfs_fs_remount( > > */ > > xfs_restore_resvblks(mp); > > xfs_log_work_queue(mp); > > + xfs_queue_eofblocks(mp); > > } > > > > /* rw -> ro */ > > @@ -1306,6 +1307,13 @@ xfs_fs_remount( > > * return it to the same size. > > */ > > xfs_save_resvblks(mp); > > + > > + /* > > + * Cancel background eofb scanning so it cannot race with the > > + * final log force+buftarg wait and deadlock the remount. > > + */ > > + cancel_delayed_work_sync(&mp->m_eofblocks_work); > > + > > xfs_quiesce_attr(mp); > > mp->m_flags |= XFS_MOUNT_RDONLY; > > } > > > > _______________________________________________ > xfs mailing list > xfs@xxxxxxxxxxx > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs