I found two deadlock problems that occur when kswapd writes back XFS pages.
I actually detected these problems on a RHEL kernel, and I suppose they
also happen on the upstream kernel (3.16-rc1).

1.

A process (processA) has acquired the read semaphore "xfs_cil.xc_ctx_lock"
in xfs_log_commit_cil() and is waiting for kswapd. Then a kworker has
issued xlog_cil_push_work() and is waiting to acquire the write semaphore.
kswapd is waiting to acquire the read semaphore in xfs_log_commit_cil()
because the kworker was already queued for the write semaphore in
xlog_cil_push(), and a new reader has to wait behind a pending writer.
Therefore, a deadlock happens.

The deadlock flow is as follows.

  processA                     | kworker                    | kswapd
  -----------------------------+----------------------------+------------------------
| xfs_trans_commit             |                            |
| xfs_log_commit_cil           |                            |
| down_read(xc_ctx_lock)       |                            |
| xlog_cil_insert_items        |                            |
| xlog_cil_insert_format_items |                            |
| kmem_alloc                   |                            |
| :                            |                            |
| shrink_inactive_list         |                            |
| congestion_wait              |                            |
| # waiting for kswapd...      |                            |
|                              | xlog_cil_push_work         |
|                              | xlog_cil_push              |
|                              | xfs_trans_commit           |
|                              | down_write(xc_ctx_lock)    |
|                              | # waiting for processA...  |
|                              |                            | shrink_page_list
|                              |                            | xfs_vm_writepage
|                              |                            | xfs_map_blocks
|                              |                            | xfs_iomap_write_allocate
|                              |                            | xfs_trans_commit
|                              |                            | xfs_log_commit_cil
|                              |                            | down_read(xc_ctx_lock)
V(time)                        |                            | # waiting for kworker...
  -----------------------------+----------------------------+------------------------

To fix this, should we release the read semaphore before calling
kmem_alloc() in xlog_cil_insert_format_items() to avoid blocking the
kworker? Or should we change the second argument of kmem_alloc() from
KM_SLEEP|KM_NOFS to KM_NOSLEEP to avoid waiting for kswapd? Or...

2.

A kworker (kworkerA), which is a writeback thread, is waiting for the XFS
allocation thread (kworkerB) while it writes back XFS pages. kworkerB has
started the allocation and is waiting for kswapd to free pages. kswapd has
started writing back XFS pages and is waiting for more log space.
The log space is exhausted because both the writeback thread and kswapd
are stuck, so some processes that have reserved log space and are
requesting free pages are stuck as well.

The deadlock flow is as follows.

  kworkerA                     | kworkerB                   | kswapd
  -----------------------------+----------------------------+------------------------
| wb_writeback                 |                            |
| :                            |                            |
| xfs_vm_writepage             |                            |
| xfs_map_blocks               |                            |
| xfs_iomap_write_allocate     |                            |
| xfs_bmapi_write              |                            |
| xfs_bmapi_allocate           |                            |
| wait_for_completion          |                            |
| # waiting for kworkerB...    |                            |
|                              | xfs_bmapi_allocate_worker  |
|                              | :                          |
|                              | xfs_buf_get_map            |
|                              | xfs_buf_allocate_memory    |
|                              | alloc_pages_current        |
|                              | :                          |
|                              | shrink_inactive_list       |
|                              | congestion_wait            |
|                              | # waiting for kswapd...    |
|                              |                            | shrink_page_list
|                              |                            | xfs_vm_writepage
|                              |                            | :
|                              |                            | xfs_log_reserve
|                              |                            | :
|                              |                            | xlog_grant_head_check
|                              |                            | xlog_grant_head_wait
|                              |                            | # waiting for more
|                              |                            | # space...
V(time)                        |                            |
  -----------------------------+----------------------------+------------------------

I don't have any ideas for fixing this...

Thanks,
Masayoshi Mizuma