On 02/24/2016 03:23 PM, Waiman Long wrote:
On 02/24/2016 03:28 AM, Jan Kara wrote:
On Tue 23-02-16 14:04:32, Waiman Long wrote:
When many threads are trying to add inodes to or delete them from
a superblock's s_inodes list, spinlock contention on the list can
become a performance bottleneck.
This patch changes the s_inodes field to become a per-cpu list with
per-cpu spinlocks. As a result, the following superblock inode list
(sb->s_inodes) iteration functions in vfs are also being modified:
1. iterate_bdevs()
2. drop_pagecache_sb()
3. wait_sb_inodes()
4. evict_inodes()
5. invalidate_inodes()
6. fsnotify_unmount_inodes()
7. add_dquot_ref()
8. remove_dquot_ref()
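The per-cpu list works roughly as follows. Below is a minimal sketch;
the type and function names (pcpu_list_head, pcpu_list_node, lockptr)
are illustrative, not necessarily the exact API of the patch, and the
per-cpu heads are assumed to have been allocated with alloc_percpu()
and initialized elsewhere:

        struct pcpu_list_head {
                struct list_head list;
                spinlock_t lock;
        };

        struct pcpu_list_node {
                struct list_head list;
                spinlock_t *lockptr;    /* lock of the per-cpu list this node is on */
        };

        static inline void percpu_list_add(struct pcpu_list_node *node,
                                           struct pcpu_list_head __percpu *heads)
        {
                /*
                 * raw_cpu_ptr(): being migrated after picking a CPU is
                 * harmless since any CPU's list is correct; the choice
                 * of CPU only spreads out contention.
                 */
                struct pcpu_list_head *head = raw_cpu_ptr(heads);

                spin_lock(&head->lock);
                node->lockptr = &head->lock;
                list_add(&node->list, &head->list);
                spin_unlock(&head->lock);
        }

        static inline void percpu_list_del(struct pcpu_list_node *node)
        {
                /* The node remembers which per-cpu lock protects it. */
                spinlock_t *lock = node->lockptr;

                spin_lock(lock);
                node->lockptr = NULL;
                list_del_init(&node->list);
                spin_unlock(lock);
        }

Adds and deletes then only contend with other operations that happen
to hit the same per-cpu sublist, instead of with every CPU in the
system.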
An exit microbenchmark was used that creates a large number of
threads, attaches many inodes to them, and then exits. The runtimes
of that microbenchmark with 1000 threads, before and after the patch,
on a 4-socket Intel E7-4820 v3 system (40 cores, 80 threads) were as
follows:
Kernel            Elapsed Time    System Time
------            ------------    -----------
Vanilla 4.5-rc4      65.29s         82m14s
Patched 4.5-rc4      22.81s         23m03s
Before the patch, spinlock contention in the inode_sb_list_add()
function at the startup phase and the inode_sb_list_del() function at
the exit phase accounted for about 79% and 93% of total CPU time
respectively (as measured by perf). After the patch, the
percpu_list_add() function consumed only about 0.04% of CPU time at
the startup phase, and the percpu_list_del() function consumed about
0.4% of CPU time at the exit phase. There was still some spinlock
contention, but it happened elsewhere.
While looking through this patch, I have noticed that the
list_for_each_entry_safe() iterations in evict_inodes() and
invalidate_inodes() are actually unnecessary. So if you first apply the
attached patch, you don't have to implement safe iteration variants
at all.
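To make the point concrete, after the attached patch the loop in
evict_inodes() has roughly the following shape (simplified, not the
exact kernel code). Nothing is unlinked from s_inodes during the walk
itself; inodes are only collected on a private dispose list and
evicted after the loop, so plain list_for_each_entry() suffices:

        void evict_inodes(struct super_block *sb)
        {
                struct inode *inode;
                LIST_HEAD(dispose);

                spin_lock(&sb->s_inode_list_lock);
                list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
                        if (atomic_read(&inode->i_count))
                                continue;

                        spin_lock(&inode->i_lock);
                        if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
                                spin_unlock(&inode->i_lock);
                                continue;
                        }
                        inode->i_state |= I_FREEING;
                        inode_lru_list_del(inode);
                        spin_unlock(&inode->i_lock);

                        /* Collect only; the entry stays on s_inodes for now. */
                        list_add(&inode->i_lru, &dispose);
                }
                spin_unlock(&sb->s_inode_list_lock);

                /* evict() unlinks each inode from s_inodes, outside the walk. */
                dispose_list(&dispose);
        }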
Thanks for the patch. I will apply it in my next update. As for the
safe iteration variant, I think I will keep it since I have already
implemented it and it may be needed in some other places.
As a second comment, I'd note that this patch grows struct inode by
one pointer. That is probably acceptable for large machines given the
speedup, but it should be noted in the changelog. Furthermore, for UP
or even small SMP systems this is IMHO undesired bloat, since the
speedup won't be noticeable there. So for these small systems it
would be good if the per-cpu list magic would just fall back to a
single linked list with a spinlock. Do you think that is reasonably
doable?
I already have a somewhat separate code path for UP, so I can remove
the lock pointer there. For small SMP systems, however, the only way
to avoid the extra pointer is to add a config parameter to turn this
feature off. That can be added as a separate patch, if necessary.
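Such a config-gated fallback might look roughly like this
(CONFIG_PERCPU_LIST is a hypothetical option name, used here only for
illustration): on small systems the per-cpu heads collapse to a
single list with one lock, and the node loses the extra lock pointer:

        #ifdef CONFIG_PERCPU_LIST
        /* Large-SMP case: per-cpu heads; each node records its lock. */
        struct pcpu_list_node {
                struct list_head list;
                spinlock_t *lockptr;    /* the one-pointer growth Jan noted */
        };
        #else
        /* UP/small-SMP fallback: ordinary list node, no size growth. */
        struct pcpu_list_node {
                struct list_head list;
        };
        #endif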
I am sorry that I need to retreat from this promise for UP. Removing
the lock pointer would require a change to the list deletion API to
pass in the lock information, so I am not going to change it for the
time being.
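The difficulty is visible in the function signatures (a sketch with
assumed names): with the lock pointer stored in the node, deletion
needs only the node itself; without it, every call site would have to
pass in the list head so the right lock can be found:

        /* With node->lockptr: the node alone is enough to find its lock. */
        void percpu_list_del(struct pcpu_list_node *node);

        /* Without the pointer, callers must identify the list explicitly: */
        void percpu_list_del(struct pcpu_list_node *node,
                             struct pcpu_list_head *head);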
Cheers,
Longman