Re: [RFC PATCH 1/2] lib/percpu-list: Per-cpu list with associated per-cpu locks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 16, 2016 at 08:31:19PM -0500, Waiman Long wrote:
> Linked list is used everywhere in the Linux kernel. However, if many
> threads are trying to add or delete entries into the same linked list,
> it can create a performance bottleneck.
> 
> This patch introduces a new per-cpu list subystem with associated
> per-cpu locks for protecting each of the lists individually. This
> allows list entries insertion and deletion operations to happen in
> parallel instead of being serialized with a global list and lock.
> 
> List entry insertion is strictly per cpu. List deletion, however, can
> happen in a cpu other than the one that did the insertion. So we still
> need lock to protect the list. Because of that, there may still be
> a small amount of contention when deletion is being done.
> 
> A new header file include/linux/percpu-list.h will be added with the
> associated percpu_list structure. The following functions are used
> to manage the per-cpu list:
> 
>  1. int init_percpu_list_head(struct percpu_list **pclist_handle)
>  2. void percpu_list_add(struct percpu_list *new,
> 			 struct percpu_list *head)
>  3. void percpu_list_del(struct percpu_list *entry)

A few comments on the code

> + * A per-cpu list protected by a per-cpu spinlock.
> + *
> + * The list head percpu_list structure contains the spinlock, the other
> + * entries in the list contain the spinlock pointer.
> + */
> +struct percpu_list {
> +	struct list_head list;
> +	union {
> +		spinlock_t lock;	/* For list head */
> +		spinlock_t *lockptr;	/* For other entries */
> +	};
> +};

This union is bad for kernels running spinlock debugging - the size
of the spinlock can blow out, and that increases the size of any
object that has a percpu_list in it. I've only got basic spinlock
debugging turned on, and the spinlock_t is 24 bytes with that
config. If I turn on lockdep, it gets much larger again....

So it might be best to separate the list head and list entry
structures, similar to a hash list?

> +static inline void INIT_PERCPU_LIST_HEAD(struct percpu_list *pcpu_list)
> +{
> +	INIT_LIST_HEAD(&pcpu_list->list);
> +	pcpu_list->lock = __SPIN_LOCK_UNLOCKED(&pcpu_list->lock);
> +}
> +
> +static inline void INIT_PERCPU_LIST_ENTRY(struct percpu_list *pcpu_list)
> +{
> +	INIT_LIST_HEAD(&pcpu_list->list);
> +	pcpu_list->lockptr = NULL;
> +}

These function names don't need to shout.

> +/**
> + * for_all_percpu_list_entries - iterate over all the per-cpu list with locking
> + * @pos:	the type * to use as a loop cursor for the current .
> + * @next:	an internal type * variable pointing to the next entry
> + * @pchead:	an internal struct list * of percpu list head
> + * @pclock:	an internal variable for the current per-cpu spinlock
> + * @head:	the head of the per-cpu list
> + * @member:	the name of the per-cpu list within the struct
> + */
> +#define for_all_percpu_list_entries(pos, next, pchead, pclock, head, member)\
> +	{								 \
> +	int cpu;							 \
> +	for_each_possible_cpu (cpu) {					 \
> +		typeof(*pos) *next;					 \
> +		spinlock_t *pclock = per_cpu_ptr(&(head)->lock, cpu);	 \
> +		struct list_head *pchead = &per_cpu_ptr(head, cpu)->list;\
> +		spin_lock(pclock);					 \
> +		list_for_each_entry_safe(pos, next, pchead, member.list)
> +
> +#define end_all_percpu_list_entries(pclock)	spin_unlock(pclock); } }

This is a bit of a landmine - the code inside he iteration is under
a spinlock hidden in the macros. People are going to forget about
that, and it's needs documenting about how it needs to be treated
w.r.t. dropping and regaining the lock so sleeping operations can be
performed on objects on the list being iterated.

Can we also think up some shorter names - names that are 30-40
characters long are getting out out of hand given we're supposed
tobe sticking to 80 character widths and we lost 8 of them in the
first indent...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux