Re: [PATCH v2 1/7] lib/dlock-list: Distributed and lock-protected lists

Waiman Long <waiman.long@xxxxxxx> · Wed, 13 Jul 2016 22:54:19 -0400

On 07/13/2016 12:08 PM, Tejun Heo wrote:
Hello,

On Mon, Jul 11, 2016 at 01:32:06PM -0400, Waiman Long wrote:
...
A new header file include/linux/dlock-list.h will be added with the
Heh, I think perpcu_list was the better name but suppose I'm too late.

Given the fact that we may change the level of lock granularity in the 
future, it is too late to change it back to percpu_list.

associated dlock_list_head and dlock_list_node structures. The following
functions are provided to manage the per-cpu list:

  1. int init_dlock_list_head(struct dlock_list_head **pdlock_head)
  2. void dlock_list_add(struct dlock_list_node *node,
		        struct dlock_list_head *head)
  3. void dlock_list_del(struct dlock_list *node)

Iteration of all the list entries within a group of per-cpu
lists is done by calling either the dlock_list_iterate() or
dlock_list_iterate_safe() functions in a while loop. They correspond
to the list_for_each_entry() and list_for_each_entry_safe() macros
respectively. The iteration states are keep in a dlock_list_state
structure that is passed to the iteration functions.
Why do we need two variants of this?  We need a state variable to walk
the list anyway.  Why not make dlock_list_iterate() safe against
removal and get rid of the second variant?  Also, dlock_list_next()
probably is a better name.

The two variants are the equivalent of list_for_each_entry() and 
list_for_each_entry_safe(). There are scenarios where 
list_for_each_entry_safe() isn't really safe when the next pointer of 
the current node is modified, for example. In this particular use case, 
the safe version isn't needed as all the list_for_each_entry_safe() call 
sites were modified to use list_for_each_entry() instead. I keep the 
dlock_list_iterate_safe() for future use cases that may need the safe 
version.

As it is a macro, it won't consume any resource if it is not used. 
However, I can remove it if you think we shouldn't carry any code that 
is not currently used.

Yes, I can change the name to dlock_list_next().

+/*
+ * include/linux/dlock-list.h
+ *
+ * A distributed (per-cpu) set of lists each of which is protected by its
+ * own spinlock, but acts like a single consolidated list to the callers.
+ *
+ * The dlock_list_head structure contains the spinlock, the other
+ * dlock_list_node structures only contains a pointer to the spinlock in
+ * dlock_list_head.
+ */
+struct dlock_list_head {
+	struct list_head list;
+	spinlock_t lock;
+};
+
+#define DLOCK_LIST_HEAD_INIT(name)				\
+	{							\
+		.list.prev =&name.list,			\
+		.list.next =&name.list,			\
+		.list.lock = __SPIN_LOCK_UNLOCKED(name),	\
+	}
This is confusing.  It looks like dlock_list_head and
DLOCK_LIST_HEAD_INIT() can be used to define and initialize static
dlock_lists but that isn't true.  It's weird to require the user to
deal with percpu declaration of the data type.  Shouldn't it be more
like the following?

struct dlock_list_head_cpu {
	struct list_head list;
	spinlock_t lock;
};

struct dlock_list_head {
	struct dlock_list_head_percpu *head_cpu;
};

I think I know what you want. I will update the code accordingly.

+/*
+ * Per-cpu list iteration state
+ */
+struct dlock_list_state {
+	int			 cpu;
+	spinlock_t		*lock;
+	struct list_head	*head;	/* List head of current per-cpu list */
+	struct dlock_list_node	*curr;
+	struct dlock_list_node	*next;
+};
Maybe dlock_list_iter[ator] is a better name?

Sure, I can rename it.

+#define DLOCK_LIST_STATE_INIT()			\
+	{					\
+		.cpu  = -1,			\
+		.lock = NULL,			\
+		.head = NULL,			\
+		.curr = NULL,			\
+		.next = NULL,			\
+	}
The NULL inits are unnecessary and prone to being left behind.

Will remove those NULL inits.

+#define DEFINE_DLOCK_LIST_STATE(s)		\
+	struct dlock_list_state s = DLOCK_LIST_STATE_INIT()
+
+static inline void init_dlock_list_state(struct dlock_list_state *state)
+{
+	state->cpu  = -1;
+	state->lock = NULL;
+	state->head = NULL;
+	state->curr = NULL;
+	state->next = NULL;
+}
Why not "state = (struct dlock_list_state)DLOCK_LIST_STATE_INIT;"?

Yes, that should work too.

+#ifdef CONFIG_DEBUG_SPINLOCK
+#define DLOCK_LIST_WARN_ON(x)	WARN_ON(x)
+#else
+#define DLOCK_LIST_WARN_ON(x)
+#endif
I'd just use WARN_ON_ONCE() without the CONFIG guard.

I'd do so if it is used in a real function. However, it is used in the 
dlock_list_iterate() macro which will duplicate the WARN_ON call on all 
the call sites. That is why I am hesitant to do that.

+/*
+ * Next per-cpu list entry
+ */
+#define dlock_list_next_entry(pos, member) list_next_entry(pos, member.list)
Why does this need to be exposed?

It was used in one of the modified file. I can change it to an inline 
function. Or do you mean making it a real function and put it in the .c 
file?

+/*
+ * Per-cpu node data structure
+ */
+struct dlock_list_node {
+	struct list_head list;
+	spinlock_t *lockptr;
+};
+
+#define DLOCK_LIST_NODE_INIT(name)		\
+	{					\
+		.list.prev =&name.list,	\
+		.list.next =&name.list,	\
+		.list.lockptr = NULL		\
+	}
Ditto with NULL init.

Will remove it.

+static inline void init_dlock_list_node(struct dlock_list_node *node)
+{
+	INIT_LIST_HEAD(&node->list);
+	node->lockptr = NULL;
+}
Ditto with init.

+static inline void free_dlock_list_head(struct dlock_list_head **pdlock_head)
+{
+	free_percpu(*pdlock_head);
+	*pdlock_head = NULL;
+}
Why does this need to be inlined?

It was inlined because it is very simple. I can move it to the c file to 
have more data abstraction.

+/*
+ * Check if all the per-cpu lists are empty
+ */
Please use proper function comments.

Will do.

+static inline bool dlock_list_empty(struct dlock_list_head *dlock_head)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		if (!list_empty(&per_cpu_ptr(dlock_head, cpu)->list))
+			return false;
+	return true;
+}
+
+/*
+ * Helper function to find the first entry of the next per-cpu list
+ * It works somewhat like for_each_possible_cpu(cpu).
+ *
+ * Return: true if the entry is found, false if all the lists exhausted
Ditto about the comment.

+ */
+static __always_inline bool
+__dlock_list_next_cpu(struct dlock_list_head *head,
+		      struct dlock_list_state *state)
+{
...
+static inline bool dlock_list_iterate_safe(struct dlock_list_head *head,
+					   struct dlock_list_state *state)
+{
Inlining these doesn't make senes to me.

They are inlined for performance reason. I can make them real functions 
if you think it is better.

diff --git a/lib/Makefile b/lib/Makefile
index 499fb35..92e8c38 100644
--- a/lib/Makefile
+++ b/lib/Makefile
+/*
+ * The dlock list lock needs its own class to avoid warning and stack
+ * trace when lockdep is enabled.
+ */
Can you please elaborate on this?

Kernel warning was given out and I think something got disabled without 
that when the kernel booted up with lockdep enabled. I think it is 
because the locks were dynamically allocated. It will not be a problem 
if they are statically allocated.

+static struct lock_class_key dlock_list_key;
+
+/*
+ * Initialize the per-cpu list head
+ */
+int init_dlock_list_head(struct dlock_list_head **pdlock_head)
+{
+	struct dlock_list_head *dlock_head;
+	int cpu;
+
+	dlock_head = alloc_percpu(struct dlock_list_head);
+	if (!dlock_head)
+		return -ENOMEM;
+
+	for_each_possible_cpu(cpu) {
+		struct dlock_list_head *head = per_cpu_ptr(dlock_head, cpu);
+
+		INIT_LIST_HEAD(&head->list);
+		head->lock = __SPIN_LOCK_UNLOCKED(&head->lock);
+		lockdep_set_class(&head->lock,&dlock_list_key);
+	}
+
+	*pdlock_head = dlock_head;
+	return 0;
+}
Why is this called init?  Why not do the following?

struct dlock_list_head *alloc_dlock_list_head(void);

Also, the pointer type needs to include __percpu annotation.

Thanks for the suggestion. Will change it.

+/*
+ * List selection is based on the CPU being used when the dlock_list_add()
+ * function is called. However, deletion may be done by a different CPU.
+ * So we still need to use a lock to protect the content of the list.
+ */
+void dlock_list_add(struct dlock_list_node *node, struct dlock_list_head *head)
+{
+	struct dlock_list_head *myhead;
+
+	/*
+	 * Disable preemption to make sure that CPU won't gets changed.
+	 */
+	myhead = get_cpu_ptr(head);
+	spin_lock(&myhead->lock);
+	node->lockptr =&myhead->lock;
+	list_add(&node->list,&myhead->list);
+	spin_unlock(&myhead->lock);
+	put_cpu_ptr(head);
+}
I wonder whether it'd be better to use irqsafe operations.  lists tend
to be often used from irq contexts.

The current use case only need to use the regular lock functions. You 
are right that future use cases may require an irqsafe version of locks. 
I can either modify the code now to allow lock type selection at init 
time, for example, or defer it as a future enhancement when the need 
arises. What do you think?

+/*
+ * Delete a node from a dlock list
+ *
+ * We need to check the lock pointer again after taking the lock to guard
+ * against concurrent delete of the same node. If the lock pointer changes
+ * (becomes NULL or to a different one), we assume that the deletion was done
+ * elsewhere.
+ */
+void dlock_list_del(struct dlock_list_node *node)
+{
+	spinlock_t *lock = READ_ONCE(node->lockptr);
+
+	if (unlikely(!lock)) {
+		WARN(1, "dlock_list_del: node 0x%lx has no associated lock\n",
+			(unsigned long)node);
+		return;
+	}
+
+	spin_lock(lock);
+	if (likely(lock == node->lockptr)) {
+		list_del_init(&node->list);
+		node->lockptr = NULL;
+	} else {
+		/*
+		 * This path should never be executed.
+		 */
What if it races against someone else removing and adding back?
Shouldn't it retry on those cases?

If it is racing with another thread removing and adding back, retrying 
the deletion may not be the right thing to do. It really depends on what 
kind of deletion the callers are doing. I will clarify in the comment 
further change may be needed in the future if future callers are doing 
complicated deletion that requires more elaborate exception handling.

Thanks for the review comments.

Cheers,
Longman
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html