Re: [RFC PATCH v2 3/4] hp: Implement Hazard Pointers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2024-10-05 18:04, Peter Zijlstra wrote:
On Fri, Oct 04, 2024 at 02:27:33PM -0400, Mathieu Desnoyers wrote:
  include/linux/hp.h | 158 +++++++++++++++++++++++++++++++++++++++++++++
  kernel/Makefile    |   2 +-
  kernel/hp.c        |  46 +++++++++++++
  3 files changed, 205 insertions(+), 1 deletion(-)
  create mode 100644 include/linux/hp.h
  create mode 100644 kernel/hp.c

diff --git a/include/linux/hp.h b/include/linux/hp.h
new file mode 100644
index 000000000000..e85fc4365ea2
--- /dev/null
+++ b/include/linux/hp.h
@@ -0,0 +1,158 @@
+// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
+//
+// SPDX-License-Identifier: LGPL-2.1-or-later
+
+#ifndef _LINUX_HP_H
+#define _LINUX_HP_H
+
+/*
+ * HP: Hazard Pointers
+ *
+ * This API provides existence guarantees of objects through hazard
+ * pointers.
+ *
+ * It uses a fixed number of hazard pointer slots (nr_cpus) across the
+ * entire system for each HP domain.
+ *
+ * Its main benefit over RCU is that it allows fast reclaim of
+ * HP-protected pointers without needing to wait for a grace period.
+ *
+ * It also allows the hazard pointer scan to call a user-defined callback
+ * to retire a hazard pointer slot immediately if needed. This callback
+ * may, for instance, issue an IPI to the relevant CPU.
+ *
+ * References:
+ *
+ * [1]: M. M. Michael, "Hazard pointers: safe memory reclamation for
+ *      lock-free objects," in IEEE Transactions on Parallel and
+ *      Distributed Systems, vol. 15, no. 6, pp. 491-504, June 2004
+ */
+
+#include <linux/rcupdate.h>
+
+/*
+ * Hazard pointer slot.
+ */
+struct hp_slot {
+	void *addr;
+};
+
+/*
+ * Hazard pointer context, returned by hp_use().
+ */
+struct hp_ctx {
+	struct hp_slot *slot;
+	void *addr;
+};
+
+/*
+ * hp_scan: Scan hazard pointer domain for @addr.
+ *
+ * Scan hazard pointer domain for @addr.
+ * If @retire_cb is NULL, wait to observe that each slot contains a value
+ * that differs from @addr.
+ * If @retire_cb is non-NULL, invoke @callback for each slot containing
+ * @addr.
+ */
+void hp_scan(struct hp_slot __percpu *percpu_slots, void *addr,
+	     void (*retire_cb)(int cpu, struct hp_slot *slot, void *addr));

struct hp_domain {
	struct hp_slot __percpu *slots
};

might clarify things a wee little.

Good point. This introduces:

#define DECLARE_HP_DOMAIN(domain)                                       \
        extern struct hp_domain domain

#define DEFINE_HP_DOMAIN(domain)                                        \
        static DEFINE_PER_CPU(struct hp_slot, __ ## domain ## _slots);  \
        struct hp_domain domain = {                                     \
                .percpu_slots = &__## domain ## _slots,                 \
        }


+
+/* Get the hazard pointer context address (may be NULL). */
+static inline
+void *hp_ctx_addr(struct hp_ctx ctx)
+{
+	return ctx.addr;
+}

 From where I'm sitting this seems like superfluous fluff, what's wrong
with ctx.addr ?

I'm OK removing the accessor and just using ctx.addr.


+/*
+ * hp_allocate: Allocate a hazard pointer.
+ *
+ * Allocate a hazard pointer slot for @addr. The object existence should
+ * be guaranteed by the caller. Expects to be called from preempt
+ * disable context.
+ *
+ * Returns a hazard pointer context.

So you made the WTF'o'meter crack, this here function does not allocate
nothing. Naming is bad. At best this is something like
try-set-hazard-pointer or somesuch.

I went with the naming from the 2004 paper from Maged Michael, but I
agree it could be clearer.

I'm tempted to go for "hp_try_post()" and "hp_remove()", basically
"posting" the intent to use a pointer (as in on a metaphorical billboard),
and removing it when it's done.


+ */
+static inline
+struct hp_ctx hp_allocate(struct hp_slot __percpu *percpu_slots, void *addr)
+{
+	struct hp_slot *slot;
+	struct hp_ctx ctx;
+
+	if (!addr)
+		goto fail;
+	slot = this_cpu_ptr(percpu_slots);
+	/*
+	 * A single hazard pointer slot per CPU is available currently.
+	 * Other hazard pointer domains can eventually have a different
+	 * configuration.
+	 */
+	if (READ_ONCE(slot->addr))
+		goto fail;
+	WRITE_ONCE(slot->addr, addr);	/* Store B */
+	ctx.slot = slot;
+	ctx.addr = addr;
+	return ctx;
+
+fail:
+	ctx.slot = NULL;
+	ctx.addr = NULL;
+	return ctx;
+}
+
+/*
+ * hp_dereference_allocate: Dereference and allocate a hazard pointer.
+ *
+ * Returns a hazard pointer context. Expects to be called from preempt
+ * disable context.
+ */

More terrible naming. Same as above, but additionally, I would expect a
'dereference' to actually dereference the pointer and have a return
value of the dereferenced type.

hp_dereference_try_post() ?


This function seems to double check and update the hp_ctx thing. I'm not
at all sure yet wtf this is doing -- and the total lack of comments
aren't helping.

The hp_ctx contains the outputs.

The function loads *addr_p to then try_post it into a HP slot. On success,
it re-reads the *addr_p (with address dependency) and if it still matches,
use that as output address pointer.

I'm planning to remove hp_ctx, and just have:

/*
 * hp_try_post: Try to post a hazard pointer.
 *
 * Post a hazard pointer slot for @addr. The object existence should
 * be guaranteed by the caller. Expects to be called from preempt
 * disable context.
 *
 * Returns true if post succeeds, false otherwise.
 */
static inline
bool hp_try_post(struct hp_domain *hp_domain, void *addr, struct hp_slot **_slot)
[...]

/*
 * hp_dereference_try_post: Dereference and try to post a hazard pointer.
 *
 * Returns a hazard pointer context. Expects to be called from preempt
 * disable context.
 */
static inline
void *__hp_dereference_try_post(struct hp_domain *hp_domain,
                                void * const * addr_p, struct hp_slot **_slot)
[...]

#define hp_dereference_try_post(domain, p, slot_p)              \
        ((__typeof__(*(p))) __hp_dereference_try_post(domain, (void * const *) p, slot_p))

/* Clear the hazard pointer in @slot. */
static inline
void hp_remove(struct hp_slot *slot)
[...]


+static inline
+struct hp_ctx hp_dereference_allocate(struct hp_slot __percpu *percpu_slots, void * const * addr_p)
+{
+	void *addr, *addr2;
+	struct hp_ctx ctx;
+
+	addr = READ_ONCE(*addr_p);
+retry:
+	ctx = hp_allocate(percpu_slots, addr);
+	if (!hp_ctx_addr(ctx))
+		goto fail;
+	/* Memory ordering: Store B before Load A. */
+	smp_mb();
+	/*
+	 * Use RCU dereference without lockdep checks, because
+	 * lockdep is not aware of HP guarantees.
+	 */
+	addr2 = rcu_access_pointer(*addr_p);	/* Load A */
+	/*
+	 * If @addr_p content has changed since the first load,
+	 * clear the hazard pointer and try again.
+	 */
+	if (!ptr_eq(addr2, addr)) {
+		WRITE_ONCE(ctx.slot->addr, NULL);
+		if (!addr2)
+			goto fail;
+		addr = addr2;
+		goto retry;
+	}
+	/*
+	 * Use addr2 loaded from rcu_access_pointer() to preserve
+	 * address dependency ordering.
+	 */
+	ctx.addr = addr2;
+	return ctx;
+
+fail:
+	ctx.slot = NULL;
+	ctx.addr = NULL;
+	return ctx;
+}
+
+/* Retire the hazard pointer in @ctx. */
+static inline
+void hp_retire(const struct hp_ctx ctx)
+{
+	smp_store_release(&ctx.slot->addr, NULL);
+}
+
+#endif /* _LINUX_HP_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index 3c13240dfc9f..ec16de96fa80 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -7,7 +7,7 @@ obj-y     = fork.o exec_domain.o panic.o \
  	    cpu.o exit.o softirq.o resource.o \
  	    sysctl.o capability.o ptrace.o user.o \
  	    signal.o sys.o umh.o workqueue.o pid.o task_work.o \
-	    extable.o params.o \
+	    extable.o params.o hp.o \
  	    kthread.o sys_ni.o nsproxy.o \
  	    notifier.o ksysfs.o cred.o reboot.o \
  	    async.o range.o smpboot.o ucount.o regset.o ksyms_common.o
diff --git a/kernel/hp.c b/kernel/hp.c
new file mode 100644
index 000000000000..b2447bf15300
--- /dev/null
+++ b/kernel/hp.c
@@ -0,0 +1,46 @@
+// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
+//
+// SPDX-License-Identifier: LGPL-2.1-or-later
+
+/*
+ * HP: Hazard Pointers
+ */
+
+#include <linux/hp.h>
+#include <linux/percpu.h>
+
+/*
+ * hp_scan: Scan hazard pointer domain for @addr.
+ *
+ * Scan hazard pointer domain for @addr.
+ * If @retire_cb is non-NULL, invoke @callback for each slot containing
+ * @addr.
+ * Wait to observe that each slot contains a value that differs from
+ * @addr before returning.
+ */
+void hp_scan(struct hp_slot __percpu *percpu_slots, void *addr,
+	     void (*retire_cb)(int cpu, struct hp_slot *slot, void *addr))
+{
+	int cpu;
+
+	/*
+	 * Store A precedes hp_scan(): it unpublishes addr (sets it to
+	 * NULL or to a different value), and thus hides it from hazard
+	 * pointer readers.
+	 */
+
+	if (!addr)
+		return;
+	/* Memory ordering: Store A before Load B. */
+	smp_mb();
+	/* Scan all CPUs slots. */
+	for_each_possible_cpu(cpu) {
+		struct hp_slot *slot = per_cpu_ptr(percpu_slots, cpu);
+
+		if (retire_cb && smp_load_acquire(&slot->addr) == addr)	/* Load B */
+			retire_cb(cpu, slot, addr);

Is retirce_cb allowed to cmpxchg the thing?

It could, but we'd need to make sure the slot is not re-used by another
hp_try_post() before the current user removes its own post. It would
need to synchronize with the current HP user (e.g. with IPI).

I've actually renamed retire_cb to "on_match_cb".


+		/* Busy-wait if node is found. */
+		while ((smp_load_acquire(&slot->addr)) == addr)	/* Load B */
+			cpu_relax();

This really should be using smp_cond_load_acquire()

Good point,

Thanks,

Mathieu


+	}
+}

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux