On 6/6/16 12:12 PM, Brian Foster wrote: > The filesystem quiesce sequence performs the operations necessary to > drain all background work, push pending transactions through the log > infrastructure and wait on I/O resulting from the final AIL push. We > have had reports of remount,ro hangs in xfs_log_quiesce() -> > xfs_wait_buftarg(), however, and some instrumentation code to detect > transaction commits at this point in the quiesce sequence has inculpated > the eofblocks background scanner as a cause. > > While higher level remount code generally prevents user modifications by > the time the filesystem has made it to xfs_log_quiesce(), the background > scanner may still be alive and can perform pending work at any time. If > this occurs between the xfs_log_force() and xfs_wait_buftarg() calls > within xfs_log_quiesce(), this can lead to an indefinite lockup in > xfs_wait_buftarg(). > > To prevent this problem, cancel the background eofblocks scan worker > during the remount read-only quiesce sequence. This suspends background > trimming when a filesystem is remounted read-only. This is only done in > the remount path because the freeze codepath has already locked out new > transactions by the time the filesystem attempts to quiesce (and thus > waiting on an active work item could deadlock). Kick the eofblocks > worker to pick up where it left off once an fs is remounted back to > read-write. > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > > To confirm the problem, I have managed to manufacture the remount,ro > hang issue described above by hacking in some delays/coordination > between the quiesce and eofblocks background worker. > > Also, an alternative approach that I was considering is to run a > synchronous scan around the same place this patch cancels the background > scan. The idea is that the background scan would then have no work to do > on the subsequent iteration and then die off naturally (until > preallocation occurs once again). I suppose the downside is that a > remount might not necessarily be expected to muck with preallocation > state of affected files. This patch seemed more simple, but I'm open to > either. Thoughts? Your approach makes sense to me - "finishing" the scan prior to quiesce could take a relatively unknown amount of time as well, right? And I guess I don't see a real advantage to that; we don't wait for it on unmount (right?) Reviewed-by: Eric Sandeen <sandeen@xxxxxxxxxx> > Brian > > fs/xfs/xfs_icache.c | 2 +- > fs/xfs/xfs_icache.h | 1 + > fs/xfs/xfs_super.c | 8 ++++++++ > 3 files changed, 10 insertions(+), 1 deletion(-) > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c > index 99ee6eee..fb39a66 100644 > --- a/fs/xfs/xfs_icache.c > +++ b/fs/xfs/xfs_icache.c > @@ -765,7 +765,7 @@ restart: > * Background scanning to trim post-EOF preallocated space. This is queued > * based on the 'speculative_prealloc_lifetime' tunable (5m by default). > */ > -STATIC void > +void > xfs_queue_eofblocks( > struct xfs_mount *mp) > { > diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h > index 62f1f91..05bac99 100644 > --- a/fs/xfs/xfs_icache.h > +++ b/fs/xfs/xfs_icache.h > @@ -68,6 +68,7 @@ void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip); > int xfs_icache_free_eofblocks(struct xfs_mount *, struct xfs_eofblocks *); > int xfs_inode_free_quota_eofblocks(struct xfs_inode *ip); > void xfs_eofblocks_worker(struct work_struct *); > +void xfs_queue_eofblocks(struct xfs_mount *); > > int xfs_inode_ag_iterator(struct xfs_mount *mp, > int (*execute)(struct xfs_inode *ip, int flags, void *args), > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > index 4700f09..7965371 100644 > --- a/fs/xfs/xfs_super.c > +++ b/fs/xfs/xfs_super.c > @@ -1294,6 +1294,7 @@ xfs_fs_remount( > */ > xfs_restore_resvblks(mp); > xfs_log_work_queue(mp); > + xfs_queue_eofblocks(mp); > } > > /* rw -> ro */ > @@ -1306,6 +1307,13 @@ xfs_fs_remount( > * return it to the same size. > */ > xfs_save_resvblks(mp); > + > + /* > + * Cancel background eofb scanning so it cannot race with the > + * final log force+buftarg wait and deadlock the remount. > + */ > + cancel_delayed_work_sync(&mp->m_eofblocks_work); > + > xfs_quiesce_attr(mp); > mp->m_flags |= XFS_MOUNT_RDONLY; > } > _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs