Hi,

We've hit a kernel hang related to XFS reclaim under heavy I/O load on a couple of storage servers running XFS over flashcache on a 3.13.y kernel.

On the crash dumps, kthreadd is blocked waiting for XFS to reclaim some memory. The related reclaim job, however, is queued on a worker_pool that is stuck waiting for some I/O. That I/O in turn depends on other jobs on other queues, which would need additional worker threads to make progress. Unfortunately, kthreadd, which would have to create those threads, is blocked.

The host has plenty of memory (~128GB), about 80% of which is used by the page cache.

It looks like this is fixed by commit 7a29ac474a47eb8cf212b45917683ae89d6fa13b. We manually applied a fix to our internal branch, but I could not find a similar commit on the longterm branches. Maybe it would be a good backport candidate for other users?

On linux-3.14.y, this would be:

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index d971f49..36af881 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -858,17 +858,17 @@ xfs_init_mount_workqueues(
 		goto out_destroy_unwritten;
 
 	mp->m_reclaim_workqueue = alloc_workqueue("xfs-reclaim/%s",
-			0, 0, mp->m_fsname);
+			WQ_MEM_RECLAIM, 0, mp->m_fsname);
 	if (!mp->m_reclaim_workqueue)
 		goto out_destroy_cil;
 
 	mp->m_log_workqueue = alloc_workqueue("xfs-log/%s",
-			0, 0, mp->m_fsname);
+			WQ_MEM_RECLAIM, 0, mp->m_fsname);
 	if (!mp->m_log_workqueue)
 		goto out_destroy_reclaim;
 
 	mp->m_eofblocks_workqueue = alloc_workqueue("xfs-eofblocks/%s",
-			0, 0, mp->m_fsname);
+			WQ_MEM_RECLAIM, 0, mp->m_fsname);
 	if (!mp->m_eofblocks_workqueue)
 		goto out_destroy_log;
 

Regards,
-- 
Jean-Tiare Le Bigot, OVH

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs