Re: [PATCH v3 3/3] vfs: Use per-cpu list for superblock's inode list

On 02/24/2016 03:23 PM, Waiman Long wrote:
> On 02/24/2016 03:28 AM, Jan Kara wrote:
>> On Tue 23-02-16 14:04:32, Waiman Long wrote:
>>> When many threads are trying to add inodes to or delete inodes from
>>> a superblock's s_inodes list, spinlock contention on the list can
>>> become a performance bottleneck.
>>>
>>> This patch changes the s_inodes field into a per-cpu list with
>>> per-cpu spinlocks (a simplified sketch of the data structure follows
>>> the function list below). As a result, the following superblock
>>> inode list (sb->s_inodes) iteration functions in vfs are modified
>>> accordingly:

>>>   1. iterate_bdevs()
>>>   2. drop_pagecache_sb()
>>>   3. wait_sb_inodes()
>>>   4. evict_inodes()
>>>   5. invalidate_inodes()
>>>   6. fsnotify_unmount_inodes()
>>>   7. add_dquot_ref()
>>>   8. remove_dquot_ref()
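>>>
>>> Roughly, each per-cpu sub-list pairs a list head with its own lock,
>>> and each node records which lock protects the list it is on. A
>>> simplified sketch (the details here are illustrative, not a quote
>>> of the patch):
>>>
>>>         struct pcpu_list_node {
>>>                 struct list_head list;
>>>                 spinlock_t *lockptr;    /* lock of the list we are on */
>>>         };
>>>
>>>         struct pcpu_list_head {
>>>                 struct list_head list;
>>>                 spinlock_t lock;
>>>         };
>>>
>>>         static inline void percpu_list_add(struct pcpu_list_node *node,
>>>                                            struct pcpu_list_head __percpu *head)
>>>         {
>>>                 /* Preemption details elided: migrating after the cpu
>>>                  * is chosen is harmless, as we record the lock that
>>>                  * was actually taken. */
>>>                 struct pcpu_list_head *h = raw_cpu_ptr(head);
>>>
>>>                 spin_lock(&h->lock);
>>>                 node->lockptr = &h->lock;
>>>                 list_add(&node->list, &h->list);
>>>                 spin_unlock(&h->lock);
>>>         }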

>>> An exit microbenchmark creates a large number of threads, attaches
>>> many inodes to them and then exits. The runtimes of that
>>> microbenchmark with 1000 threads before and after the patch on a
>>> 4-socket Intel E7-4820 v3 system (40 cores, 80 threads) were as
>>> follows:

>>>    Kernel            Elapsed Time    System Time
>>>    ------            ------------    -----------
>>>    Vanilla 4.5-rc4      65.29s         82m14s
>>>    Patched 4.5-rc4      22.81s         23m03s

>>> Before the patch, spinlock contention in the inode_sb_list_add()
>>> function during the startup phase and in the inode_sb_list_del()
>>> function during the exit phase accounted for about 79% and 93% of
>>> total CPU time respectively (as measured by perf). After the patch,
>>> the percpu_list_add() function consumed only about 0.04% of CPU
>>> time during the startup phase, and the percpu_list_del() function
>>> about 0.4% during the exit phase. There was still some spinlock
>>> contention, but it happened elsewhere.
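>>>
>>> Iterating the whole sb->s_inodes list now means walking every CPU's
>>> sub-list under that CPU's lock. A simplified sketch of the pattern
>>> the functions listed above now follow (same illustrative names as
>>> in the earlier sketch):
>>>
>>>         int cpu;
>>>
>>>         for_each_possible_cpu(cpu) {
>>>                 struct pcpu_list_head *h = per_cpu_ptr(sb->s_inodes, cpu);
>>>                 struct pcpu_list_node *node;
>>>
>>>                 spin_lock(&h->lock);
>>>                 list_for_each_entry(node, &h->list, list) {
>>>                         /* process the inode embedding this node */
>>>                 }
>>>                 spin_unlock(&h->lock);
>>>         }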
>> While looking through this patch, I have noticed that the
>> list_for_each_entry_safe() iterations in evict_inodes() and
>> invalidate_inodes() are actually unnecessary. So if you first apply
>> the attached patch, you don't have to implement safe iteration
>> variants at all.

> Thanks for the patch. I will apply it in my next update. As for the
> safe iteration variant, I think I will keep it, since I have already
> implemented it and it may be needed in some other places.

>> As a second comment, I'd note that this patch grows struct inode by
>> one pointer. That is probably acceptable for large machines given
>> the speedup, but it should be noted in the changelog. Furthermore,
>> for UP or even small SMP systems this is IMHO undesired bloat, since
>> the speedup won't be noticeable.
>>
>> So for these small systems it would be good if the per-cpu list
>> magic would just fall back to a simple linked list with a spinlock.
>> Do you think that is reasonably doable?


> I already have a somewhat separate code path for UP, so I can remove
> the lock pointer for that. For a small SMP system, however, the only
> way to avoid the extra pointer is to add a config parameter to turn
> this feature off. That can be added as a separate patch, if
> necessary.
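>
> The pointer in question is the per-node lock pointer. A hypothetical
> config symbol (the name below is made up) could compile it out:
>
>         struct pcpu_list_node {
>                 struct list_head list;
>         #ifdef CONFIG_SB_PERCPU_LIST    /* hypothetical symbol */
>                 spinlock_t *lockptr;    /* only needed with per-cpu lists */
>         #endif
>         };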

I am sorry that I need to retract this promise for UP. Removing the
lock pointer would require a change in the list deletion API to pass
in the lock information, so I am not going to change it for the time
being.
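
The issue, roughly, is that deletion is given only the node, so the
lock has to be found through the node itself (simplified sketch, same
illustrative names as above):

        static inline void percpu_list_del(struct pcpu_list_node *node)
        {
                spinlock_t *lock = node->lockptr;

                spin_lock(lock);
                list_del_init(&node->list);
                spin_unlock(lock);
        }

Without the lockptr field, percpu_list_del() and all of its callers
would have to be handed the list head as well, which is the API change
I would rather avoid for now.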

Cheers,
Longman
