On Tue, Sep 17, 2024 at 4:33 PM Boqun Feng <boqun.feng@xxxxxxxxx> wrote: > Hazard pointers [1] provide a way to dynamically distribute refcounting > and can be used to improve the scalability of refcounting without > significant space cost. > +static inline void *__hazptr_tryprotect(hazptr_t *hzp, > + void *const *p, > + unsigned long head_offset) > +{ > + void *ptr; > + struct callback_head *head; > + > + ptr = READ_ONCE(*p); > + > + if (ptr == NULL) > + return NULL; > + > + head = (struct callback_head *)(ptr + head_offset); > + > + WRITE_ONCE(*hzp, head); > + smp_mb(); > + > + ptr = READ_ONCE(*p); // read again > + > + if (ptr + head_offset != head) { // pointer changed > + WRITE_ONCE(*hzp, NULL); // reset hazard pointer > + return NULL; > + } else > + return ptr; > +} I got nerdsniped by the Plumbers talk. So, about that smp_mb()... I think you should be able to avoid the smp_mb() using relaxed atomics (on architectures that have those), at the cost of something like a cmpxchg-acquire sandwiched between a load-acquire and a relaxed load? I'm not sure how their cost compares to an smp_mb() though. Something like this, *assuming there can only be one context at a time waiting for a given hazptr_t*: typedef struct { /* consists of: marker bit in least significant bit, rest is normal pointer */ atomic_long_t value; } hazptr_t; /* assumes that hzp is currently set to NULL (but it may contain a marker bit) */ static inline void *__hazptr_tryprotect(hazptr_t *hzp, void *const *p) { /* note that the loads of these three operations are ordered using acquire semantics */ void *ptr = smp_load_acquire(p); /* set pointer while leaving marker bit intact */ unsigned long hazard_scanning = atomic_long_fetch_or_acquire((unsigned long)ptr, &hzp->value); if (unlikely(hazard_scanning)) { BUG_ON(hazard_scanning != 1); /* slowpath, concurrent hazard pointer waiter */ smp_mb(); } if (READ_ONCE(*p) == ptr) { /* recheck */ atomic_long_and(~1UL, &hzp->value); return NULL; } return ptr; } /* simplified for illustration, assumes there's only a single hazard pointer @hzp that could point to @ptr */ static void remove_pointer_and_wait_for_hazard(hazptr_t *hzp, void *ptr, void *const *p) { WRITE_ONCE(*p, NULL); smb_mb(); /* set marker bit */ atomic_long_or(1UL, &hzp->value); while ((void*)(atomic_long_read(&hzp->value) & ~1UL) == ptr)) wait(); /* clear marker bit */ atomic_long_and(~1UL, &hzp->value); } The idea would be that the possible orderings when these two functions race against each other are: Ordering A: The atomic_long_fetch_or_acquire() in __hazptr_tryprotect() happens after the atomic_long_or(), two subcases: Ordering A1 (slowpath): atomic_long_fetch_or_acquire() is ordered before the atomic_long_and() in remove_pointer_and_wait_for_hazard(), so the marker bit is observed set, "hazard_scanning" is true. We go on the safe slowpath which is like the original patch, so it's safe. Ordering A2 (recheck fails): atomic_long_fetch_or_acquire() is ordered after the atomic_long_and() in remove_pointer_and_wait_for_hazard(), so the subsequent READ_ONCE(*p) is also ordered after the atomic_long_and(), which is ordered after the WRITE_ONCE(*p, NULL), so the READ_ONCE(*p) recheck must see a NULL pointer and fail. Ordering B (hazard pointer visible): The atomic_long_fetch_or_acquire() in __hazptr_tryprotect() happens before the atomic_long_or(). In that case, it also happens before the atomic_long_read() in remove_pointer_and_wait_for_hazard(), so the hazard pointer will be visible to remove_pointer_and_wait_for_hazard(). But this seems pretty gnarly/complicated, so even if my 2AM reasoning ability is correct, actually implementing this might not be a good idea... and it definitely wouldn't help on X86 at all, since X86 doesn't have such nice relaxed RMW ops.