> On Jan 3, 2020, at 5:14 PM, Waiman Long <longman@xxxxxxxxxx> wrote:
> 
> On 12/30/19 2:40 PM, Alex Kogan wrote:
>> +/*
>> + * cna_scan_main_queue - scan the main waiting queue looking for the first
>> + * thread running on the same NUMA node as the lock holder. If found (call it
>> + * thread T), move all threads in the main queue between the lock holder and
>> + * T to the end of the secondary queue and return 0
>> + * (=SUCCESSOR_FROM_SAME_NUMA_NODE_FOUND); otherwise, return the encoded
> Are you talking about LOCAL_WAITER_FOUND?

Ahh, yes, good catch!

>> + * pointer of the last scanned node in the primary queue (so a subsequent scan
>> + * can be resumed from that node).
>> + *
>> + * Schematically, this may look like the following (nn stands for numa_node and
>> + * et stands for encoded_tail).
>> + *
>> + * when cna_scan_main_queue() is called (the secondary queue is empty):
>> + *
>> + *  A+------------+   B+--------+   C+--------+   T+--------+
>> + *   |mcs:next    | -> |mcs:next| -> |mcs:next| -> |mcs:next| -> NULL
>> + *   |mcs:locked=1|    |cna:nn=0|    |cna:nn=2|    |cna:nn=1|
>> + *   |cna:nn=1    |    +--------+    +--------+    +--------+
>> + *   +----------- +
>> + *
>> + * when cna_scan_main_queue() returns (the secondary queue contains B and C):
>> + *
>> + *  A+----------------+   T+--------+
>> + *   |mcs:next        | -> |mcs:next| -> NULL
>> + *   |mcs:locked=C.et | -+ |cna:nn=1|
>> + *   |cna:nn=1        |  | +--------+
>> + *   +--------------- +  |
>> + *                 +-----+
>> + *                 \/
>> + *  B+--------+   C+--------+
>> + *   |mcs:next| -> |mcs:next| -+
>> + *   |cna:nn=0|    |cna:nn=2|  |
>> + *   +--------+    +--------+  |
>> + *       ^                     |
>> + *       +---------------------+
>> + *
>> + * The worst case complexity of the scan is O(n), where n is the number
>> + * of current waiters. However, the amortized complexity is close to O(1),
>> + * as the immediate successor is likely to be running on the same node once
>> + * threads from other nodes are moved to the secondary queue.
>> + *
>> + * @node      : Pointer to the MCS node of the lock holder
>> + * @pred_start: Pointer to the MCS node of the waiter whose successor should be
>> + *              the first node in the scan
>> + * Return     : LOCAL_WAITER_FOUND or encoded tail of the last scanned waiter
>> + */
>> +static u32 cna_scan_main_queue(struct mcs_spinlock *node,
>> +			       struct mcs_spinlock *pred_start)
>> +{
>> +	struct cna_node *cn = (struct cna_node *)node;
>> +	struct cna_node *cni = (struct cna_node *)READ_ONCE(pred_start->next);
>> +	struct cna_node *last;
>> +	int my_numa_node = cn->numa_node;
>> +
>> +	/* find any next waiter on 'our' NUMA node */
>> +	for (last = cn;
>> +	     cni && cni->numa_node != my_numa_node;
>> +	     last = cni, cni = (struct cna_node *)READ_ONCE(cni->mcs.next))
>> +		;
>> +
>> +	/* if found, splice any skipped waiters onto the secondary queue */
>> +	if (cni) {
>> +		if (last != cn)	/* did we skip any waiters? */
>> +			cna_splice_tail(node, node->next,
>> +					(struct mcs_spinlock *)last);
>> +		return LOCAL_WAITER_FOUND;
>> +	}
>> +
>> +	return last->encoded_tail;
>> +}
>> +
>>
>> +/*
>> + * Switch to the NUMA-friendly slow path for spinlocks when we have
>> + * multiple NUMA nodes in native environment, unless the user has
>> + * overridden this default behavior by setting the numa_spinlock flag.
>> + */
>> +void cna_configure_spin_lock_slowpath(void)
> Nit: There should be a __init.

True. I will fix that.

>> +{
>> +	if ((numa_spinlock_flag == 1) ||
>> +	    (numa_spinlock_flag == 0 && nr_node_ids > 1 &&
>> +		    pv_ops.lock.queued_spin_lock_slowpath ==
>> +			native_queued_spin_lock_slowpath)) {
>> +		pv_ops.lock.queued_spin_lock_slowpath =
>> +		    __cna_queued_spin_lock_slowpath;
>> +
>> +		pr_info("Enabling CNA spinlock\n");
>> +	}
>> +}
> 
> Other than these two minor nits, the rests looks good to me.

Great. I will revise and resubmit.

Best regards,
Alex