> In my tests reclaimed nodes have their next pointers immediately set > to point to the list head. If the kernel gets a node with its @next > pointing to something else, then yes, things break down (the kernel > kills the process); this has happened occasionally when I had a bug in > the userspace code. I believe that approach is fine for production, but for testing it may not detect some bugs. For example, it may not detect the race I detail below. > Could you, please, provide a bit more details on when/how the race can > happen? Servers add themselves to the list, so there can be no races > there (servers going idle: add-to-the-list; wait; gc (under a lock); > restore @next; do stuff). I believe the race can happen if the kernel attempts to wake an idle server concurrently with a call to gc. Here's an example list where the head points to a list of 3 items denoted A, B, C, and the second character denotes whether the element is marked for deletion: 'x' means marked, 'o' means unmarked. 'H' denotes the head. H -> Ax -> Bo -> Co Now, atomic_stack_gc is expected to unlink node 'A' so it can be reclaimed, leading to "H -> Bo -> Co". Once 'A' is unlinked it can be either deleted or pushed back on the list at a later time. In the following snippet of the atomic_stack_pop function (trimmed for space): static inline uint64_t* atomic_stack_pop(uint64_t *head) { uint64_t *curr = (uint64_t *)atomic_load_explicit(head); do { if (!curr) return NULL; // <--- Pause uint64_t next = atomic_load_explicit(curr); At the location marked "<--- Pause", assume the code has the address of node 'A' and is about to dereference it to get the address of node 'B'. If the thread running this code pauses for any reason, a different CPU can run atomic_stack_gc and delete node 'A'. At that point, the pop function resumes and dereferences a node that no longer exists. The thread pause can have many causes; in user-space, preemption is the most obvious, but hardware interrupts or even last-level cache misses can cause enough of a slowdown for this to happen. The fundamental problem is that there is nothing to prevent multiple threads from manipulating the list AND concurrently attempting to reclaim nodes. As far as I know, this requires locking or a lock-free memory reclamation scheme where nodes are unlinked and then consensus is established that no thread can reach the unlinked data before reclaiming the unlinked node. While both approaches are do-able, it requires extra communication between the kernel and user-space. In general, the kernel needs to be robust to misbehaving users and their concurrent operations. Hence, I would be wary of any looping in kernel involving an atomic operation, which requires cooperation among threads to avoid starvation. > Workers are trickier, as they can be woken by signals and then block > again, but stray signals are so bad here that I'm thinking of actually > not letting sleeping workers wake on signals. Other than signals > waking queued/unqueued idle workers, are there any other potential > races here? Timeouts on blocked threads is virtually the same as a signal I think. I can see that both could lead to attempts at waking workers that are not blocked. I haven't looked too much at the state transitions yet. I think the guarantees offered by the two lists will result in potential races in the rest of the code. Thierry