Re: [RFC PATCH 1/2] lib/percpu-list: Per-cpu list with associated per-cpu locks

On 02/17/2016 01:22 PM, Peter Zijlstra wrote:
> On Wed, Feb 17, 2016 at 12:41:53PM -0500, Waiman Long wrote:
>> On 02/17/2016 12:18 PM, Peter Zijlstra wrote:
>>> On Wed, Feb 17, 2016 at 12:12:57PM -0500, Waiman Long wrote:
>>>> On 02/17/2016 11:27 AM, Christoph Lameter wrote:
>>>>> On Wed, 17 Feb 2016, Waiman Long wrote:

>>>>>> I know we can use RCU for a singly linked list, but I don't think we can use
>>>>>> that for a doubly linked list, as there is no easy way to make atomic changes
>>>>>> to both the prev and next pointers simultaneously unless you are talking about
>>>>>> 16b cmpxchg, which is only supported on some architectures.
>>>>> But it's supported on the most important architectures. You can fall back to
>>>>> spinlocks on the ones that do not support it.

>>>> I guess that with some limitations on how the lists can be traversed, we may
>>>> be able to do that with RCU without a lock. However, that will make the code
>>>> more complex and harder to verify. Given that in both my and Dave's testing
>>>> the contention on list insertion and deletion is almost gone from the perf
>>>> profile, where it used to be a bottleneck, is it really worth the effort to
>>>> do such a conversion?
>>> My initial concern was the preempt disable delay introduced by holding
>>> the spinlock over the entire iteration.
>>>
>>> There is no saying how many elements are on that list and there is no
>>> lock break.
>> But preempt_disable() is called at the beginning of the spin_lock() call. So
>> the additional preempt_disable() in percpu_list_add() is just there to cover
>> the this_cpu_ptr() call and make sure that the cpu number doesn't change. So
>> we are talking about a few ns at most here.

> I'm talking about the list iteration; there is no preempt_disable() in
> there, just the spin_lock, which you hold over the entire list, which
> can be many, many elements.

Sorry for the misunderstanding.

The original code has one global lock and one single list that covers all the inodes in the filesystem. This patch essentially breaks it up into multiple smaller per-cpu lists, each with its own lock. So the lock hold time should be greatly reduced, unless we are unfortunate enough that most of the inodes end up on one single list.
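
For illustration, here is a minimal sketch of what such a per-cpu list could look like. The structure and function names below are assumptions for the example and may not match the actual patch exactly; it also shows why the extra preempt_disable() discussed above only needs to cover the this_cpu_ptr() lookup:

#include <linux/list.h>
#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/spinlock.h>

/*
 * Illustrative sketch only -- names and layout may differ from the
 * actual RFC patch.  Each cpu gets its own list head and its own lock,
 * replacing the single global lock/list.
 */
struct percpu_list {
	spinlock_t lock;		/* protects only this cpu's list */
	struct list_head list;
} ____cacheline_aligned_in_smp;

/*
 * Add a node to the current cpu's list.  preempt_disable() only has to
 * cover the this_cpu_ptr() lookup; spin_lock() keeps preemption off for
 * the rest, so the extra preempt-disabled window is tiny.
 */
static void percpu_list_add(struct list_head *node,
			    struct percpu_list __percpu *head)
{
	struct percpu_list *pl;

	preempt_disable();		/* pin this_cpu_ptr() to one cpu */
	pl = this_cpu_ptr(head);
	spin_lock(&pl->lock);
	list_add(node, &pl->list);
	spin_unlock(&pl->lock);
	preempt_enable();
}

A real implementation also has to record which per-cpu lock protects a given node so that it can later be deleted from any CPU; the sketch above leaves that out.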

If lock hold time is a concern, I think in some cases we can set an upper limit on how many inodes we want to process, then release the lock, reacquire it and continue. I am just worried that using RCU and 16b cmpxchg will introduce too much complexity with no performance gain to show for it.
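
Something along these lines (purely illustrative, using the sketched structure above, with a made-up batch parameter and callback; not code from the patch) is what I have in mind for the lock break:

/*
 * Hypothetical lock break for the iteration side: process at most
 * 'batch' entries per lock hold, then drop and reacquire the per-cpu
 * lock so the hold time stays bounded.  Illustrative only.
 */
static void percpu_list_walk_one(struct percpu_list *pl,
				 void (*process)(struct list_head *node),
				 int batch)
{
	struct list_head *pos, *next;
	int count = 0;

	spin_lock(&pl->lock);
	list_for_each_safe(pos, next, &pl->list) {
		process(pos);
		if (++count >= batch) {
			count = 0;
			spin_unlock(&pl->lock);	/* lock break */
			cond_resched();
			spin_lock(&pl->lock);
			/*
			 * 'next' may have been removed while the lock was
			 * dropped; a real version needs a cursor node or a
			 * restart to make resuming the walk safe.
			 */
		}
	}
	spin_unlock(&pl->lock);
}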

Cheers,
Longman


