On Wed, Sep 21, 2011 at 09:42:37PM +1000, Dave Chinner wrote: > So, the log force not triggering in the AIL code looks to be the > problem. That, I simply cannot explain right now - it makes no sense > but that is what all the stats and trace events point to. I need to > do more investigation. Could it be that we have a huge amount of instances of xfs_ail_worker running at the same time? xfs_sync_wq is marked as WQ_CPU_INTENSIVE, so running/runnable workers are not counted towards the concurrency limit. From my look at the workqueue code this means we'll spawn new instances fairly quickly if the others are stuck. This means more and more of them hammering the pinned items, and we'll rarely reach the limit where we'd need to do a log force. What is also strange is that we allocate a xfs_ail_wq, but don't actually use it, although it would have the same idea. Stefan, can you try the following patch? This moves the ail work to it's explicit queue, and makes sure we never have the same work item (= same fs to be pushed) concurrently. Note that before Linux 3.1-rc you'll need to edit fs/xfs/xfs_super.c to be fs/xfs/linux-2.6/xfs_super.c in the patch manually. Index: linux-2.6/fs/xfs/xfs_super.c =================================================================== --- linux-2.6.orig/fs/xfs/xfs_super.c 2011-09-21 08:00:01.864768359 -0400 +++ linux-2.6/fs/xfs/xfs_super.c 2011-09-21 08:04:01.335266079 -0400 @@ -1654,7 +1654,7 @@ xfs_init_workqueues(void) if (!xfs_syncd_wq) goto out; - xfs_ail_wq = alloc_workqueue("xfsail", WQ_CPU_INTENSIVE, 8); + xfs_ail_wq = alloc_workqueue("xfsail", WQ_NON_REENTRANT, 8); if (!xfs_ail_wq) goto out_destroy_syncd; Index: linux-2.6/fs/xfs/xfs_trans_ail.c =================================================================== --- linux-2.6.orig/fs/xfs/xfs_trans_ail.c 2011-09-21 08:02:28.172765827 -0400 +++ linux-2.6/fs/xfs/xfs_trans_ail.c 2011-09-21 08:02:46.843266108 -0400 @@ -538,7 +538,7 @@ out_done: } /* There is more to do, requeue us. */ - queue_delayed_work(xfs_syncd_wq, &ailp->xa_work, + queue_delayed_work(xfs_ail_wq, &ailp->xa_work, msecs_to_jiffies(tout)); } @@ -575,7 +575,7 @@ xfs_ail_push( smp_wmb(); xfs_trans_ail_copy_lsn(ailp, &ailp->xa_target, &threshold_lsn); if (!test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags)) - queue_delayed_work(xfs_syncd_wq, &ailp->xa_work, 0); + queue_delayed_work(xfs_ail_wq, &ailp->xa_work, 0); } /* _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs