Chuck Lever has presented measurements, that would appear to indicate that the inode->i_lock can be a source of contention when doing writeback using high performance networks. This patch series is therefore an attempt to address the sources of contention that his measurements pointed to. According to Chuck's table, the main sources of contention appear to be in nfs_flush_incompatible(), nfs_lock_and_join_requests() and nfs_updatepage(). Most of this patch series focuses on fixing nfs_lock_and_join_requests(), since it holds the inode lock, while taking locks on the page group and the sub-requests themselves, rolling them all back if ever there is contention. By noting a few simple rules that are mostly already in place, we can simplify that locking to ensure that we only have to keep the spin lock while we're dereferencing and locking the head page pointer that is stored in page_private(page). Along the way, the patches also simplify the code a little, and fix a number of subtle races which mainly occur when you set wsize to some value smaller than the page size. The most notable such race occurs between nfs_lock_and_join_requests() and nfs_page_group_destroy(), and could lead to a double free() of some of the sub-requests. Finally, there are 2 patches tacked onto the end of the series that attempt to improve the throttling of writes when the RPC layer is congested. They do so by forcing each caller of nfs_initiate_pgio() to wait until the request is being transmitted. The expectation is that this might help improve latencies when there are several processes competing for access to the RPC transport. v2: Further patches to eliminate use of inode->i_lock to protect the page->private contents. Protect the (large!) list of unstable pages using a dedicated NFS_I(inode)->commit_mutex instead of the inode->i_lock. Trond Myklebust (28): NFS: Simplify page writeback NFS: Reduce lock contention in nfs_page_find_head_request() NFS: Reduce lock contention in nfs_try_to_update_request() NFS: Ensure we always dereference the page head last NFS: Fix a reference and lock leak in nfs_lock_and_join_requests() NFS: Fix an ABBA issue in nfs_lock_and_join_requests() NFS: Don't check request offset and size without holding a lock NFS: Don't unlock writebacks before declaring PG_WB_END NFS: Fix the inode request accounting when pages have subrequests NFS: Teach nfs_try_to_update_request() to deal with request page_groups NFS: Remove page group limit in nfs_flush_incompatible() NFS: Reduce inode->i_lock contention in nfs_lock_and_join_requests() NFS: Further optimise nfs_lock_and_join_requests() NFS: Fix nfs_page_group_destroy() and nfs_lock_and_join_requests() race cases NFS: Remove nfs_page_group_clear_bits() NFS: Remove unuse function nfs_page_group_lock_wait() NFS: Remove unused parameter from nfs_page_group_lock() NFS: Fix up nfs_page_group_covers_page() NFSv4: Convert nfs_lock_and_join_requests() to use nfs_page_find_head_request() NFS: Refactor nfs_page_find_head_request() NFSv4: Use a mutex to protect the per-inode commit lists NFS: Use an atomic_long_t to count the number of requests NFS: Use an atomic_long_t to count the number of commits NFS: Switch to using mapping->private_lock for page writeback lookups. NFSv4/pnfs: Replace pnfs_put_lseg_locked() with pnfs_put_lseg() NFS: Wait for requests that are locked on the commit list SUNRPC: Add a function to allow waiting for RPC transmission NFS: Throttle I/O to the NFS server fs/nfs/callback_proc.c | 2 +- fs/nfs/delegation.c | 2 +- fs/nfs/direct.c | 4 +- fs/nfs/inode.c | 10 +- fs/nfs/pagelist.c | 70 +++---- fs/nfs/pnfs.c | 41 ---- fs/nfs/pnfs.h | 2 - fs/nfs/pnfs_nfs.c | 37 ++-- fs/nfs/write.c | 440 ++++++++++++++++++++----------------------- include/linux/nfs_fs.h | 5 +- include/linux/nfs_page.h | 3 +- include/linux/nfs_xdr.h | 2 +- include/linux/sunrpc/sched.h | 3 + net/sunrpc/sched.c | 22 +++ net/sunrpc/xprt.c | 6 + 15 files changed, 295 insertions(+), 354 deletions(-) -- 2.13.3 -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html