v4->v5:
 - Rebased the patch to 4.8-rc1 (changes to fs/fs-writeback.c were
   dropped).
 - Use kcalloc() instead of percpu_alloc() to allocate the dlock list
   heads structure as suggested by Christoph Lameter.
 - Replaced patch 5 by another one that makes sibling CPUs use the
   same dlock list head, thus reducing the number of list heads that
   need to be maintained.

v3->v4:
 - As suggested by Al, encapsulate the dlock list mechanism into
   dlist_for_each_entry() and dlist_for_each_entry_safe(), which are
   the equivalents of list_for_each_entry() and
   list_for_each_entry_safe() for regular linked lists. That
   simplifies the changes in the call sites that perform dlock list
   iterations.
 - Add a new patch to make the percpu head structure cacheline aligned
   to prevent cacheline contention from disrupting the performance
   of nearby percpu variables.

v2->v3:
 - Remove the 2 persubnode API patches.
 - Merge __percpu tag patch 2 into patch 1.
 - As suggested by Tejun Heo, restructure the dlock_list_head data
   structure to hide the __percpu tag and rename some of the functions
   and structures.
 - Move most of the code from dlock_list.h to dlock_list.c and export
   the symbols.

v1->v2:
 - Add a set of simple per-subnode APIs that is between percpu and
   per-node in granularity.
 - Make dlock list use the per-subnode APIs so as to reduce the
   total number of separate linked lists that need to be managed
   and iterated.
 - There is no change in patches 1-5.

This is a follow up of the following patchset:

  [PATCH v7 0/4] vfs: Use per-cpu list for SB's s_inodes list
  https://lkml.org/lkml/2016/4/12/1009

Patch 1 introduces the dlock list. The list heads are allocated
by kcalloc() instead of percpu_alloc(). This may slightly increase
cacheline contention when multiple CPUs are accessing the dlock list,
but improves performance when the whole dlock list needs to be
iterated.

Patch 2 cleans up the fsnotify_unmount_inodes() function by making
the code simpler and more standard.

Patch 3 replaces the use of list_for_each_entry_safe() in
evict_inodes() and invalidate_inodes() by list_for_each_entry().

Patch 4 modifies the superblock and inode structures to use the dlock
list. The corresponding functions that reference those structures
are modified.

Patch 5 makes the sibling CPUs use the same dlock list head to reduce
the number of list heads that need to be iterated.

Jan Kara (2):
  fsnotify: Simplify inode iteration on umount
  vfs: Remove unnecessary list_for_each_entry_safe() variants

Waiman Long (3):
  lib/dlock-list: Distributed and lock-protected lists
  vfs: Use dlock list for superblock's inode list
  lib/dlock-list: Make sibling CPUs share the same linked list

 fs/block_dev.c             |   9 +-
 fs/drop_caches.c           |   9 +-
 fs/inode.c                 |  38 +++----
 fs/notify/inode_mark.c     |  52 ++-------
 fs/quota/dquot.c           |  14 +--
 fs/super.c                 |   7 +-
 include/linux/dlock-list.h | 230 +++++++++++++++++++++++++++++++++++++
 include/linux/fs.h         |   8 +-
 lib/Makefile               |   2 +-
 lib/dlock-list.c           | 268 ++++++++++++++++++++++++++++++++++++++++++++
 10 files changed, 548 insertions(+), 89 deletions(-)
 create mode 100644 include/linux/dlock-list.h
 create mode 100644 lib/dlock-list.c
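
For readers new to the idea behind patches 1 and 5, here is a minimal
userspace sketch of a distributed, lock-protected list. It is not the
code in lib/dlock-list.c. Every name in it (dl_list, dl_head, dl_node,
dl_add(), dl_del(), dl_sum(), dl_demo()) is hypothetical, a C11
atomic_flag stands in for a kernel spinlock, and the "CPU number" is a
plain integer passed in by the caller. The mapping of sibling CPUs onto
one head is assumed to be cpu / threads_per_core, which is only one
possible way to model it.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct dl_node;

/* One sub-list: a head pointer plus the lock protecting that sub-list. */
struct dl_head {
	atomic_flag lock;
	struct dl_node *first;
};

struct dl_node {
	struct dl_node *next;
	struct dl_node **pprev;	/* address of the pointer that points at us */
	struct dl_head *head;	/* sub-list we are on, needed for deletion */
	int value;
};

/* The distributed list: an array of independently locked heads. */
struct dl_list {
	struct dl_head *heads;
	size_t nr_heads;
	size_t threads_per_core;	/* assumed sibling grouping factor */
};

static void dl_lock(struct dl_head *h)
{
	while (atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void dl_unlock(struct dl_head *h)
{
	atomic_flag_clear_explicit(&h->lock, memory_order_release);
}

static int dl_init(struct dl_list *dl, size_t nr_heads, size_t threads_per_core)
{
	/* Plain array allocation, in the spirit of kcalloc() in patch 1. */
	dl->heads = calloc(nr_heads, sizeof(*dl->heads));
	if (!dl->heads)
		return -1;
	dl->nr_heads = nr_heads;
	dl->threads_per_core = threads_per_core;
	for (size_t i = 0; i < nr_heads; i++)
		atomic_flag_clear(&dl->heads[i].lock);
	return 0;
}

/* Add a node to the sub-list shared by this CPU and its siblings. */
static void dl_add(struct dl_list *dl, struct dl_node *n, size_t cpu)
{
	struct dl_head *h = &dl->heads[(cpu / dl->threads_per_core) % dl->nr_heads];

	dl_lock(h);
	n->head = h;
	n->next = h->first;
	n->pprev = &h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	dl_unlock(h);
}

/* Remove a node; only the lock of its own sub-list is taken. */
static void dl_del(struct dl_node *n)
{
	struct dl_head *h = n->head;

	dl_lock(h);
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
	n->head = NULL;
	dl_unlock(h);
}

/* Iterate over the whole list, holding one sub-list lock at a time. */
static long dl_sum(struct dl_list *dl)
{
	long sum = 0;

	for (size_t i = 0; i < dl->nr_heads; i++) {
		struct dl_head *h = &dl->heads[i];

		dl_lock(h);
		for (struct dl_node *n = h->first; n; n = n->next)
			sum += n->value;
		dl_unlock(h);
	}
	return sum;
}

/* Eight "CPUs" in sibling pairs share four heads; one node is removed. */
long dl_demo(void)
{
	struct dl_list dl;
	struct dl_node nodes[8];
	long sum;

	if (dl_init(&dl, 4, 2))
		return -1;
	for (int i = 0; i < 8; i++) {
		nodes[i].value = i + 1;
		dl_add(&dl, &nodes[i], (size_t)i);
	}
	dl_del(&nodes[2]);
	sum = dl_sum(&dl);
	free(dl.heads);
	return sum;
}
```

In this sketch, additions and deletions from different cores contend
only on their own head's lock, while a full walk such as dl_sum() has
to visit every head. That is why halving the number of heads by
grouping siblings shortens the full iteration, at the cost of some
extra contention between the siblings themselves.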