On Tue, Feb 16, 2016 at 08:31:20PM -0500, Waiman Long wrote:
> When many threads are trying to add or delete inode to or from
> a superblock's s_inodes list, spinlock contention on the list can
> become a performance bottleneck.
> 
> This patch changes the s_inodes field to become a per-cpu list with
> per-cpu spinlocks.
> 
> With an exit microbenchmark that creates a large number of threads,
> attachs many inodes to them and then exits. The runtimes of that
> microbenchmark with 1000 threads before and after the patch on a
> 4-socket Intel E7-4820 v3 system (40 cores, 80 threads) were as
> follows:
> 
> Kernel            Elapsed Time    System Time
> ------            ------------    -----------
> Vanilla 4.5-rc4      65.29s         82m14s
> Patched 4.5-rc4      22.81s         23m03s

Pretty good :) My fsmark tests usually show up a fair bit of
contention - moving 250k inodes through the cache every second over
16p does generate a bit of load on the list. The patch makes the
inode list add/del operations disappear completely from the perf
profiles, and there's a marginal decrease in runtime (~4m40s vs
4m30s). I think the global lock is right on the edge of breakdown
under this load, though, so if I was testing on a larger system I
think the difference would be much bigger.

I'll run some more testing on it, see if anything breaks. A few
comments on the code follow.

> @@ -1866,8 +1866,8 @@ void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
>  {
>  	struct inode *inode, *old_inode = NULL;
>  
> -	spin_lock(&blockdev_superblock->s_inode_list_lock);
> -	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
> +	for_all_percpu_list_entries_simple(inode, percpu_lock,
> +			blockdev_superblock->s_inodes_cpu, i_sb_list) {

This is kind of what I meant about names getting way too long. How
about something like:

#define walk_sb_inodes(inode, sb, pcpu_lock)		\
	for_all_percpu_list_entries_simple(inode, pcpu_lock, \
					   sb->s_inodes_list, i_sb_list)

#define walk_sb_inodes_end(pcpu_lock) end_all_percpu_list_entries(pcpu_lock)

for brevity?

> @@ -189,7 +190,7 @@ void fsnotify_unmount_inodes(struct super_block *sb)
>  		spin_unlock(&inode->i_lock);
>  
>  		/* In case the dropping of a reference would nuke next_i. */
> -		while (&next_i->i_sb_list != &sb->s_inodes) {
> +		while (&next_i->i_sb_list.list != percpu_head) {
>  			spin_lock(&next_i->i_lock);
>  			if (!(next_i->i_state & (I_FREEING | I_WILL_FREE)) &&
>  					atomic_read(&next_i->i_count)) {
> @@ -199,16 +200,16 @@ void fsnotify_unmount_inodes(struct super_block *sb)
>  				break;
>  			}
>  			spin_unlock(&next_i->i_lock);
> -			next_i = list_next_entry(next_i, i_sb_list);
> +			next_i = list_next_entry(next_i, i_sb_list.list);

pcpu_list_next_entry(next_i, i_sb_list)?

> @@ -1397,9 +1398,8 @@ struct super_block {
>  	 */
>  	int s_stack_depth;
>  
> -	/* s_inode_list_lock protects s_inodes */
> -	spinlock_t s_inode_list_lock ____cacheline_aligned_in_smp;
> -	struct list_head s_inodes;	/* all inodes */
> +	/* The percpu locks protect s_inodes_cpu */
> +	PERCPU_LIST_HEAD(s_inodes_cpu);	/* all inodes */

There is no need to encode the type of list into the name, i.e. drop
the "_cpu" suffix - we can see it's a percpu list from the
declaration.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
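
For illustration, a call site converted to the wrappers suggested
above might end up looking something like the sketch below. This is
only a rough sketch: the walk_sb_inodes()/walk_sb_inodes_end() names
are the ones proposed in this mail and
for_all_percpu_list_entries_simple()/end_all_percpu_list_entries()
come from the patch under review, but the percpu_lock cursor
declaration, the need for an explicit _end call after the loop, and
the loop body are assumptions about the patch's iterator API rather
than code taken from either posting.

	struct inode *inode;
	spinlock_t *percpu_lock;	/* assumed: cursor the iterator uses to
					 * track which per-cpu lock it holds */

	walk_sb_inodes(inode, sb, percpu_lock) {
		/* inode sits on one of sb's per-cpu inode lists here,
		 * with that list's spinlock held by the iterator */
		do_something(inode);		/* hypothetical per-inode work */
	}
	walk_sb_inodes_end(percpu_lock);	/* assumed to drop the last per-cpu lock */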