On Wed 24-02-16 09:36:30, Ingo Molnar wrote:
> 
> * Jan Kara <jack@xxxxxxx> wrote:
> 
> > On Tue 23-02-16 14:04:32, Waiman Long wrote:
> > > When many threads are trying to add or delete inodes to or from
> > > a superblock's s_inodes list, spinlock contention on the list can
> > > become a performance bottleneck.
> > > 
> > > This patch changes the s_inodes field to become a per-cpu list with
> > > per-cpu spinlocks. As a result, the following superblock inode list
> > > (sb->s_inodes) iteration functions in vfs are also being modified:
> > > 
> > >  1. iterate_bdevs()
> > >  2. drop_pagecache_sb()
> > >  3. wait_sb_inodes()
> > >  4. evict_inodes()
> > >  5. invalidate_inodes()
> > >  6. fsnotify_unmount_inodes()
> > >  7. add_dquot_ref()
> > >  8. remove_dquot_ref()
> > > 
> > > With an exit microbenchmark that creates a large number of threads,
> > > attaches many inodes to them, and then exits, the runtimes of that
> > > microbenchmark with 1000 threads before and after the patch on a
> > > 4-socket Intel E7-4820 v3 system (40 cores, 80 threads) were as
> > > follows:
> > > 
> > >   Kernel            Elapsed Time    System Time
> > >   ------            ------------    -----------
> > >   Vanilla 4.5-rc4       65.29s         82m14s
> > >   Patched 4.5-rc4       22.81s         23m03s
> > > 
> > > Before the patch, spinlock contention in the inode_sb_list_add()
> > > function at the startup phase and the inode_sb_list_del() function at
> > > the exit phase accounted for about 79% and 93% of total CPU time
> > > respectively (as measured by perf). After the patch, the
> > > percpu_list_add() function consumed only about 0.04% of CPU time at
> > > the startup phase and the percpu_list_del() function about 0.4% of
> > > CPU time at the exit phase. There was still some spinlock contention,
> > > but it happened elsewhere.
> > 
> > While looking through this patch, I have noticed that the
> > list_for_each_entry_safe() iterations in evict_inodes() and
> > invalidate_inodes() are actually unnecessary. So if you first apply the
> > attached patch, you don't have to implement safe iteration variants at
> > all.
> > 
> > As a second comment, I'd note that this patch grows struct inode by 1
> > pointer. It is probably acceptable for large machines given the
> > speedup, but it should be noted in the changelog. Furthermore, for UP
> > or even small SMP systems this is IMHO undesired bloat since the
> > speedup won't be noticeable.
> > 
> > So for these small systems it would be good if the per-cpu list magic
> > would just fall back to a single linked list with a spinlock. Do you
> > think that is reasonably doable?
> 
> Even many 'small' systems tend to be SMP these days.

Yes, I know. But my tablet with 4 ARM cores is unlikely to benefit from
this change either. It will just have to pay the memory cost. And frankly
I don't care that much myself, but if there is some reasonably easy way to
avoid the cost, it would be welcome.

> If you do this then please keep it a separate add-on patch, so that the
> 'UP cost' becomes apparent. Uniprocessor #ifdeffery is really painful in
> places and we might be better off with a single extra pointer.
> Furthermore, UP kernels are tested a lot less stringently than SMP
> kernels.

It's just 4 bytes for a truly small 32-bit system.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
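
For context, a minimal sketch of the kind of structure the changelog above
describes: one list head per CPU, each protected by its own spinlock, with
percpu_list_add()/percpu_list_del() as the changelog names them. The struct
names and the alloc_pcpu_list() helper below are illustrative assumptions
and need not match the actual patch:

	/*
	 * Illustrative sketch only -- the struct layout and helper names
	 * are assumptions, not necessarily the API of the patch.
	 */
	#include <linux/list.h>
	#include <linux/percpu.h>
	#include <linux/spinlock.h>

	struct pcpu_list_head {
		struct list_head list;
		spinlock_t lock;
	};

	struct pcpu_list_node {
		struct list_head list;
		spinlock_t *lockptr;	/* lock of the sublist this node is on */
	};

	/* Allocate one list head (and lock) per possible CPU. */
	static struct pcpu_list_head __percpu *alloc_pcpu_list(void)
	{
		int cpu;
		struct pcpu_list_head __percpu *heads =
			alloc_percpu(struct pcpu_list_head);

		if (!heads)
			return NULL;
		for_each_possible_cpu(cpu) {
			struct pcpu_list_head *head = per_cpu_ptr(heads, cpu);

			INIT_LIST_HEAD(&head->list);
			spin_lock_init(&head->lock);
		}
		return heads;
	}

	/* Adders use the current CPU's sublist, spreading lock contention. */
	static void percpu_list_add(struct pcpu_list_node *node,
				    struct pcpu_list_head __percpu *heads)
	{
		struct pcpu_list_head *head = get_cpu_ptr(heads);

		spin_lock(&head->lock);
		node->lockptr = &head->lock;
		list_add(&node->list, &head->list);
		spin_unlock(&head->lock);
		put_cpu_ptr(heads);
	}

	/* Deletion takes whichever per-cpu lock the node was queued under. */
	static void percpu_list_del(struct pcpu_list_node *node)
	{
		spinlock_t *lock = node->lockptr;

		spin_lock(lock);
		list_del_init(&node->list);
		node->lockptr = NULL;
		spin_unlock(lock);
	}

Each node remembers which per-cpu lock it was added under, so deletion does
not need to know which CPU's sublist holds it; the price is the extra
pointer per inode that the discussion above is about, and iteration (the
eight functions listed in the changelog) has to walk every per-cpu sublist.
On a UP or small-SMP build, the same interface could presumably collapse to
a single list protected by a single lock, which is roughly the fallback
being asked about.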