* Jan Kara <jack@xxxxxxx> wrote:

> On Tue 23-02-16 14:04:32, Waiman Long wrote:
> > When many threads are trying to add inodes to or delete inodes from
> > a superblock's s_inodes list, spinlock contention on the list can
> > become a performance bottleneck.
> >
> > This patch changes the s_inodes field to a per-cpu list with
> > per-cpu spinlocks. As a result, the following superblock inode list
> > (sb->s_inodes) iteration functions in vfs are also modified:
> >
> > 1. iterate_bdevs()
> > 2. drop_pagecache_sb()
> > 3. wait_sb_inodes()
> > 4. evict_inodes()
> > 5. invalidate_inodes()
> > 6. fsnotify_unmount_inodes()
> > 7. add_dquot_ref()
> > 8. remove_dquot_ref()
> >
> > The performance impact was measured with an exit microbenchmark that
> > creates a large number of threads, attaches many inodes to them, and
> > then exits. The runtimes of that microbenchmark with 1000 threads
> > before and after the patch on a 4-socket Intel E7-4820 v3 system
> > (40 cores, 80 threads) were as follows:
> >
> >   Kernel            Elapsed Time    System Time
> >   ------            ------------    -----------
> >   Vanilla 4.5-rc4      65.29s          82m14s
> >   Patched 4.5-rc4      22.81s          23m03s
> >
> > Before the patch, spinlock contention in the inode_sb_list_add()
> > function during the startup phase and in the inode_sb_list_del()
> > function during the exit phase accounted for about 79% and 93% of
> > total CPU time respectively (as measured by perf). After the patch,
> > the percpu_list_add() function consumed only about 0.04% of CPU
> > time during the startup phase, and the percpu_list_del() function
> > about 0.4% during the exit phase. There was still some spinlock
> > contention, but it happened elsewhere.
>
> While looking through this patch, I noticed that the
> list_for_each_entry_safe() iterations in evict_inodes() and
> invalidate_inodes() are actually unnecessary. So if you first apply
> the attached patch, you don't have to implement safe iteration
> variants at all.
>
> As a second comment, I'd note that this patch grows struct inode by
> one pointer. That is probably acceptable for large machines given the
> speedup, but it should be noted in the changelog. Furthermore, for UP
> or even small SMP systems this is IMHO undesired bloat, since the
> speedup won't be noticeable.
>
> So for these small systems it would be good if the per-cpu list magic
> would just fall back to a single linked list with a spinlock. Do you
> think that is reasonably doable?

Even many 'small' systems tend to be SMP these days.

If you do this then please keep it a separate add-on patch, so that the
'UP cost' becomes apparent. Uniprocessor #ifdeffery is really painful
in places and we might be better off with a single extra pointer.
Furthermore, UP kernels are tested a lot less stringently than SMP
kernels. It's just 4 bytes for a truly small 32-bit system.

Thanks,

	Ingo
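
[Editor's note: for readers not following the thread, below is a
minimal userspace sketch of the per-cpu list technique under
discussion. It is not Waiman's actual kernel patch; the function names
percpu_list_add()/percpu_list_del() mirror the ones quoted above, but
the fixed NR_CPUS, the pthread spinlocks, and the stored `cpu` field
are illustrative assumptions. The point it demonstrates is the one
debated above: adds and deletes on different CPUs take different
locks, so they no longer contend, at the cost of each node remembering
which sublist it is on — the per-inode growth Jan mentions.]

/* Build with: gcc -O2 -pthread percpu_list_sketch.c */
#define _GNU_SOURCE		/* for sched_getcpu() */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define NR_CPUS 4		/* assumption: fixed CPU count for the sketch */

struct pcpu_list_node {
	struct pcpu_list_node *next;
	struct pcpu_list_node **pprev;	/* back-link for O(1) unlinked delete */
	int cpu;			/* which per-cpu sublist we live on */
};

struct pcpu_list_head {
	pthread_spinlock_t lock;	/* one lock per CPU, not one global lock */
	struct pcpu_list_node *first;
};

static struct pcpu_list_head pcpu_lists[NR_CPUS];

static void percpu_list_init(void)
{
	for (int i = 0; i < NR_CPUS; i++) {
		pthread_spin_init(&pcpu_lists[i].lock, PTHREAD_PROCESS_PRIVATE);
		pcpu_lists[i].first = NULL;
	}
}

/* Add to the current CPU's sublist; only that CPU's lock is taken. */
static void percpu_list_add(struct pcpu_list_node *node)
{
	int cpu = sched_getcpu();
	struct pcpu_list_head *head;

	if (cpu < 0)
		cpu = 0;		/* fall back if sched_getcpu() fails */
	cpu %= NR_CPUS;
	head = &pcpu_lists[cpu];

	node->cpu = cpu;		/* remember our sublist for deletion */
	pthread_spin_lock(&head->lock);
	node->next = head->first;
	node->pprev = &head->first;
	if (head->first)
		head->first->pprev = &node->next;
	head->first = node;
	pthread_spin_unlock(&head->lock);
}

/* Delete from whichever sublist the node was added to, even if the
 * deleting thread now runs on a different CPU. */
static void percpu_list_del(struct pcpu_list_node *node)
{
	struct pcpu_list_head *head = &pcpu_lists[node->cpu];

	pthread_spin_lock(&head->lock);
	*node->pprev = node->next;
	if (node->next)
		node->next->pprev = node->pprev;
	pthread_spin_unlock(&head->lock);
}

int main(void)
{
	struct pcpu_list_node a, b;

	percpu_list_init();
	percpu_list_add(&a);
	percpu_list_add(&b);
	percpu_list_del(&a);
	percpu_list_del(&b);
	printf("per-cpu list add/del OK\n");
	return 0;
}

[Full-list iteration (as in evict_inodes() etc. above) must walk all
NR_CPUS sublists, taking each sublist's lock in turn, which is why the
quoted patch has to touch every sb->s_inodes iterator. Jan's proposed
UP fallback would amount to NR_CPUS == 1: a single sublist and a single
lock, i.e. the original scheme.]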