On 05/24/2012 12:06 PM, Brian Foster wrote: > xfsaild idle mode logic currently leads to a couple hangs: > > 1.) If xfsaild is rescheduled in during an incremental scan > (i.e., tout != 0) and the target has been updated since > the previous run, we can hit the new target and go into > idle mode with a still populated ail. > 2.) A wake up is only issued when the target is pushed forward. > The wake up can race with xfsaild if it is currently in the > process of entering idle mode, causing future wake up > events to be lost. > > Both hangs are reproducible by running xfstests 273 in a loop. > Modify xfsaild to enter idle mode only when the ail is empty > and the push target has not been moved forward since the last > push. > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > > This is a lightly tested version against the xfs tree. I'll be more > heavily testing a version based on my upstream reproducer tree over the > next few days followed by similar testing on this patch if all goes > well. Sending in advance for review. > FYI, I've done a decent amount of testing at this point and can no longer reproduce the original lockups due to idle mode. I can still reproduce the sync worker lockup in the xfs tree, but I verify that xfsaild is actually still running in that case (thus it's a different issue). Unless there are other objections, this patch seems good to me. Thanks. Brian > fs/xfs/xfs_trans_ail.c | 29 ++++++++++++++++++++++++++--- > fs/xfs/xfs_trans_priv.h | 1 + > 2 files changed, 27 insertions(+), 3 deletions(-) > > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c > index 9c51448..0819cd3 100644 > --- a/fs/xfs/xfs_trans_ail.c > +++ b/fs/xfs/xfs_trans_ail.c > @@ -383,6 +383,12 @@ xfsaild_push( > } > > spin_lock(&ailp->xa_lock); > + > + /* barrier matches the xa_target update in xfs_ail_push() */ > + smp_rmb(); > + target = ailp->xa_target; > + ailp->xa_target_prev = target; > + > lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->xa_last_pushed_lsn); > if (!lip) { > /* > @@ -397,7 +403,6 @@ xfsaild_push( > XFS_STATS_INC(xs_push_ail); > > lsn = lip->li_lsn; > - target = ailp->xa_target; > while ((XFS_LSN_CMP(lip->li_lsn, target) <= 0)) { > int lock_result; > > @@ -527,8 +532,26 @@ xfsaild( > __set_current_state(TASK_KILLABLE); > else > __set_current_state(TASK_INTERRUPTIBLE); > - schedule_timeout(tout ? > - msecs_to_jiffies(tout) : MAX_SCHEDULE_TIMEOUT); > + > + spin_lock(&ailp->xa_lock); > + > + /* barrier matches the xa_target update in xfs_ail_push() */ > + smp_rmb(); > + if (!xfs_ail_min(ailp) && (ailp->xa_target == ailp->xa_target_prev)) { > + /* the ail is empty and no change to the push target - idle */ > + spin_unlock(&ailp->xa_lock); > + schedule(); > + tout = 0; > + continue; > + } > + spin_unlock(&ailp->xa_lock); > + > + if (tout) { > + /* more work to do soon */ > + schedule_timeout(msecs_to_jiffies(tout)); > + } > + > + __set_current_state(TASK_RUNNING); > > try_to_freeze(); > > diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h > index fb62377..53b7c9b 100644 > --- a/fs/xfs/xfs_trans_priv.h > +++ b/fs/xfs/xfs_trans_priv.h > @@ -67,6 +67,7 @@ struct xfs_ail { > struct task_struct *xa_task; > struct list_head xa_ail; > xfs_lsn_t xa_target; > + xfs_lsn_t xa_target_prev; > struct list_head xa_cursors; > spinlock_t xa_lock; > xfs_lsn_t xa_last_pushed_lsn; _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs