On Thu, Jan 13, 2022 at 08:37:00AM -0500, Brian Foster wrote: > The xfs_inodegc_stop() helper performs a high level flush of pending > work on the percpu queues and then runs a cancel_work_sync() on each > of the percpu work tasks to ensure all work has completed before > returning. While cancel_work_sync() waits for wq tasks to complete, > it does not guarantee work tasks have started. This means that the > _stop() helper can queue and instantly cancel a wq task without > having completed the associated work. This can be observed by > tracepoint inspection of a simple "rm -f <file>; fsfreeze -f <mnt>" > test: > > xfs_destroy_inode: ... ino 0x83 ... > xfs_inode_set_need_inactive: ... ino 0x83 ... > xfs_inodegc_stop: ... > ... > xfs_inodegc_start: ... > xfs_inodegc_worker: ... > xfs_inode_inactivating: ... ino 0x83 ... > > The first few lines show that the inode is removed and need inactive > state set, but the inactivation work has not completed before the > inodegc mechanism stops. The inactivation doesn't actually occur > until the fs is unfrozen and the gc mechanism starts back up. Note > that this test requires fsfreeze to reproduce because xfs_freeze > indirectly invokes xfs_fs_statfs(), which calls xfs_inodegc_flush(). > > When this occurs, the workqueue try_to_grab_pending() logic first > tries to steal the pending bit, which does not succeed because the > bit has been set by queue_work_on(). Subsequently, it checks for > association of a pool workqueue from the work item under the pool > lock. This association is set at the point a work item is queued and > cleared when dequeued for processing. If the association exists, the > work item is removed from the queue and cancel_work_sync() returns > true. If the pwq association is cleared, the remove attempt assumes > the task is busy and retries (eventually returning false to the > caller after waiting for the work task to complete). > > To avoid this race, we can flush each work item explicitly before > cancel. However, since the _queue_all() already schedules each > underlying work item, the workqueue level helpers are sufficient to > achieve the same ordering effect. E.g., the inodegc enabled flag > prevents scheduling any further work in the _stop() case. Use the > drain_workqueue() helper in this particular case to make the intent > a bit more self explanatory. > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > fs/xfs/xfs_icache.c | 22 ++++------------------ > 1 file changed, 4 insertions(+), 18 deletions(-) Looks good - flush/drain_workqueue() are much nicer was of doing this. Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx> -- Dave Chinner david@xxxxxxxxxxxxx