v5->v6:
 - Rebased the patch to 4.14-rc3.
 - Drop the fsnotify patch as it had already been merged.
 - Add a new patch 5 with an alternative way of selecting a list by hashing instead of by CPU number.
 - Add a new patch 6 to provide a set of IRQ-safe APIs to be used in interrupt context.
 - Update the CPU-to-index mapping code.

v4->v5:
 - Rebased the patch to 4.8-rc1 (changes to fs/fs-writeback.c were dropped).
 - Use kcalloc() instead of percpu_alloc() to allocate the dlock list heads structure, as suggested by Christoph Lameter.
 - Replaced patch 5 by another one that made sibling CPUs use the same dlock list head, thus reducing the number of list heads that need to be maintained.

v3->v4:
 - As suggested by Al, encapsulate the dlock list mechanism into dlist_for_each_entry() and dlist_for_each_entry_safe(), the equivalents of list_for_each_entry() and list_for_each_entry_safe() for regular linked lists. That simplifies the changes in the call sites that perform dlock list iterations.
 - Add a new patch to make the percpu head structure cacheline aligned, to prevent cacheline contention from disrupting the performance of nearby percpu variables.

v2->v3:
 - Remove the 2 per-subnode API patches.
 - Merge __percpu tag patch 2 into patch 1.
 - As suggested by Tejun Heo, restructure the dlock_list_head data structure to hide the __percpu tag and rename some of the functions and structures.
 - Move most of the code from dlock_list.h to dlock_list.c and export the symbols.

v1->v2:
 - Add a set of simple per-subnode APIs that sit between percpu and per-node in granularity.
 - Make dlock list use the per-subnode APIs so as to reduce the total number of separate linked lists that need to be managed and iterated.
 - There is no change in patches 1-5.
This is a follow-up of the following patchset:

  [PATCH v7 0/4] vfs: Use per-cpu list for SB's s_inodes list
  https://lkml.org/lkml/2016/4/12/1009

This patchset provides new APIs for a set of distributed locked lists (one per CPU core) to minimize lock and cacheline contention. Insertion into and deletion from the list will be cheap and relatively contention free. Lookup, on the other hand, may be a bit more costly, as there are multiple lists to iterate. This is not really a problem for the replacement of the superblock's inode list by a dlock list included in this patchset, as lookup isn't needed there. For use cases that do need lookup, the dlock list can also be treated as a set of hashed lists that scales with the number of CPU cores in the system.

Patch 1 introduces the dlock list. The list heads are allocated by kcalloc() instead of percpu_alloc(). Each list head entry is cacheline aligned to minimize contention.

Patch 2 replaces the use of list_for_each_entry_safe() in evict_inodes() and invalidate_inodes() by list_for_each_entry().

Patch 3 modifies the superblock and inode structures to use the dlock list. The functions that reference those structures are modified accordingly.

Patch 4 makes the sibling CPUs use the same dlock list head to reduce the number of list heads that need to be iterated.

Patch 5 enables an alternative use case: treating the dlock list as a set of hashed lists.

Patch 6 provides IRQ-safe APIs to be used in interrupt context.
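The trade-off described above (cheap insertion and deletion, costlier traversal) can be illustrated with a minimal userspace sketch. This is not the kernel API added by this patchset. All names (struct dlists, dlist_add(), dlist_del(), dlist_count()) are hypothetical. pthread mutexes stand in for kernel spinlocks, and the list is chosen by hashing a caller-supplied key, in the spirit of patch 5, rather than by CPU number. Cacheline alignment of the heads and IRQ safety are omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

struct dhead;

/* Hypothetical list node: remembers which list (and thus which lock) owns it. */
struct dnode {
	struct dnode *prev, *next;
	struct dhead *head;		/* NULL when not on any list */
};

/* One lock per list head, so operations on different heads never contend. */
struct dhead {
	pthread_mutex_t lock;
	struct dnode list;		/* circular sentinel */
};

struct dlists {
	int nr;
	struct dhead *heads;
};

int dlists_init(struct dlists *dl, int nr)
{
	dl->heads = calloc(nr, sizeof(*dl->heads));
	if (!dl->heads)
		return -1;
	dl->nr = nr;
	for (int i = 0; i < nr; i++) {
		pthread_mutex_init(&dl->heads[i].lock, NULL);
		dl->heads[i].list.prev = &dl->heads[i].list;
		dl->heads[i].list.next = &dl->heads[i].list;
	}
	return 0;
}

void dlists_destroy(struct dlists *dl)
{
	for (int i = 0; i < dl->nr; i++)
		pthread_mutex_destroy(&dl->heads[i].lock);
	free(dl->heads);
	dl->heads = NULL;
	dl->nr = 0;
}

/* Select a head by hashing the key (multiplicative hash, an assumption). */
static struct dhead *pick_head(struct dlists *dl, uintptr_t key)
{
	uint64_t h = (uint64_t)key * 0x9E3779B97F4A7C15ull;

	return &dl->heads[(h >> 32) % (uint64_t)dl->nr];
}

/* Insertion takes only the lock of the selected head. */
void dlist_add(struct dlists *dl, struct dnode *n, uintptr_t key)
{
	struct dhead *h = pick_head(dl, key);

	assert(n->head == NULL);	/* node must not already be on a list */
	pthread_mutex_lock(&h->lock);
	n->next = h->list.next;
	n->prev = &h->list;
	h->list.next->prev = n;
	h->list.next = n;
	n->head = h;
	pthread_mutex_unlock(&h->lock);
}

/* Deletion locks only the node's own head; assumes a single owner per node. */
void dlist_del(struct dnode *n)
{
	struct dhead *h = n->head;

	if (!h)
		return;
	pthread_mutex_lock(&h->lock);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
	n->head = NULL;
	pthread_mutex_unlock(&h->lock);
}

/* Traversal must visit every head in turn: the cost paid for cheap updates. */
size_t dlist_count(struct dlists *dl)
{
	size_t count = 0;

	for (int i = 0; i < dl->nr; i++) {
		struct dhead *h = &dl->heads[i];

		pthread_mutex_lock(&h->lock);
		for (struct dnode *n = h->list.next; n != &h->list; n = n->next)
			count++;
		pthread_mutex_unlock(&h->lock);
	}
	return count;
}
```

Because each node records its head, deletion never needs to know which key or CPU was used at insertion time. Only the full traversal pays for the split into multiple lists.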
Jan Kara (1):
  vfs: Remove unnecessary list_for_each_entry_safe() variants

Waiman Long (5):
  lib/dlock-list: Distributed and lock-protected lists
  vfs: Use dlock list for superblock's inode list
  lib/dlock-list: Make sibling CPUs share the same linked list
  lib/dlock-list: Enable faster lookup with hashing
  lib/dlock-list: Provide IRQ-safe APIs

 fs/block_dev.c             |   9 +-
 fs/drop_caches.c           |   9 +-
 fs/inode.c                 |  38 ++---
 fs/notify/fsnotify.c       |   9 +-
 fs/quota/dquot.c           |  14 +-
 fs/super.c                 |   7 +-
 include/linux/dlock-list.h | 297 ++++++++++++++++++++++++++++++++++++
 include/linux/fs.h         |   8 +-
 lib/Makefile               |   2 +-
 lib/dlock-list.c           | 366 +++++++++++++++++++++++++++++++++++++++++++++
 10 files changed, 705 insertions(+), 54 deletions(-)
 create mode 100644 include/linux/dlock-list.h
 create mode 100644 lib/dlock-list.c

-- 
1.8.3.1