On Mon, Nov 04, 2013 at 12:17:17PM -0500, Waiman Long wrote:
> This patch introduces a new read/write lock implementation that puts
> waiting readers and writers into a queue instead of actively contending
> for the lock like the current read/write lock implementation. This will
> improve performance in highly contended situations by reducing the
> cache line bouncing effect.
>
> The queue read/write lock (qrwlock) is mostly fair with respect to
> the writers, even though there is still a slight chance of write
> lock stealing.
>
> Externally, there are two different types of readers - unfair (the
> default) and fair. An unfair reader will try to steal the read lock
> even if a writer is waiting, whereas a fair reader will wait in
> the queue under this circumstance. These variants are chosen at
> initialization time by using different initializers. The new *_fair()
> initializers are added for selecting the use of fair readers.
>
> Internally, there is a third type of reader which steals the lock more
> aggressively than the unfair reader. It simply increments the reader
> count and waits until the writer releases the lock. The transition to
> aggressive reader happens in the read lock slowpath when:
> 1. The reader is in an interrupt context.
> 2. An unfair reader comes to the head of the wait queue.
> 3. A fair reader comes to the head of the wait queue and sees
>    the release of a write lock.
>
> The fair queue rwlock is more deterministic in the sense that
> latecomers jumping ahead and stealing the lock is unlikely, even though
> there is still a very small chance for lock stealing to happen if
> the readers or writers come at the right moment. Other than that,
> lock granting is done in a FIFO manner. As a result, it is possible
> to determine a maximum time period after which the waiting is over
> and the lock can be acquired.
>
> The queue read lock is safe to use in an interrupt context (softirq
> or hardirq) as it will switch to become an aggressive reader in such
> an environment, allowing recursive read locking. However, fair
> readers do not support recursive read locking in a non-interrupt
> environment when a writer is waiting.
>
> The only downside of the queue rwlock is the size increase in the lock
> structure by 4 bytes for 32-bit systems and by 12 bytes for 64-bit
> systems.
>
> This patch will replace the architecture-specific implementation
> of rwlock with this generic version of queue rwlock when the
> ARCH_QUEUE_RWLOCK configuration parameter is set.
>
> In terms of single-thread performance (no contention), a 256K
> lock/unlock loop was run on 2.4GHz and 2.93GHz Westmere x86-64
> CPUs. The following table shows the average time (in ns) for a single
> lock/unlock sequence (including the looping and timing overhead):
>
> Lock Type               2.4GHz    2.93GHz
> ---------               ------    -------
> Ticket spinlock           14.9       12.3
> Read lock                 17.0       13.5
> Write lock                17.0       13.5
> Queue read lock           16.0       13.5
> Queue fair read lock      16.0       13.5
> Queue write lock            9.2        7.8
> Queue fair write lock     17.5       14.5
>
> The queue read lock is slightly slower than the spinlock, but is
> slightly faster than the read lock. The queue write lock, however,
> is the fastest of all. It is almost twice as fast as the write lock
> and about 1.5X as fast as the spinlock. The queue fair write lock,
> on the other hand, is slightly slower than the write lock.
>
> With lock contention, the speed of each individual lock/unlock function
> is less important than the amount of contention-induced delay.
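
[ For concreteness, here is a minimal usage sketch of the unfair and fair
  variants, using the arch-level initializers and entry points defined later
  in this patch.  The lock names and the direct calls to the queue_*()
  functions are purely illustrative -- real users would reach these through
  the usual read_lock()/write_lock() wrappers once the arch_*() remapping
  below is in place. ]

	#include <asm-generic/qrwlock.h>

	/* Illustration only -- these locks are not part of the patch. */
	static arch_rwlock_t demo_lock      = __ARCH_RW_LOCK_UNLOCKED;
	static arch_rwlock_t demo_fair_lock = __ARCH_RW_LOCK_UNLOCKED_FAIR;

	static void demo_read_side(void)
	{
		/* Unfair: may steal the read lock even if a writer is queued. */
		queue_read_lock(&demo_lock);
		/* ... read-side critical section ... */
		queue_read_unlock(&demo_lock);

		/* Fair: queues up behind any waiting writer. */
		queue_read_lock(&demo_fair_lock);
		/* ... read-side critical section ... */
		queue_read_unlock(&demo_fair_lock);
	}
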
>
> To investigate the performance characteristics of the queue rwlock
> compared with the regular rwlock, Ingo's anon_vmas patch that converts
> rwsem to rwlock was applied to a 3.12-rc2 kernel. This kernel was
> then tested under the following 4 conditions:
>
> 1) Plain 3.12-rc2
> 2) Ingo's patch
> 3) Ingo's patch + unfair qrwlock (default)
> 4) Ingo's patch + fair qrwlock
>
> The jobs per minute (JPM) results of the AIM7's high_systime workload
> at 1500 users on an 8-socket 80-core DL980 (HT off) were:
>
> Kernel     JPM      %Change from (1)
> ------     ---      ----------------
>   1       148265            -
>   2       238715          +61%
>   3       242048          +63%
>   4       234881          +58%
>
> The use of the unfair qrwlock provides a small boost of 2%, while using
> the fair qrwlock leads to a 3% decrease in performance. However, looking
> at the perf profiles, we can clearly see that other bottlenecks were
> constraining the performance improvement.
>
> Perf profile of kernel (2):
>
>   18.20%   reaim  [kernel.kallsyms]  [k] __write_lock_failed
>    9.36%   reaim  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>    2.91%   reaim  [kernel.kallsyms]  [k] mspin_lock
>    2.73%   reaim  [kernel.kallsyms]  [k] anon_vma_interval_tree_insert
>    2.23%   ls     [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>    1.29%   reaim  [kernel.kallsyms]  [k] __read_lock_failed
>    1.21%   true   [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>    1.14%   reaim  [kernel.kallsyms]  [k] zap_pte_range
>    1.13%   reaim  [kernel.kallsyms]  [k] _raw_spin_lock
>    1.04%   reaim  [kernel.kallsyms]  [k] mutex_spin_on_owner
>
> Perf profile of kernel (3):
>
>   10.57%   reaim  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>    7.98%   reaim  [kernel.kallsyms]  [k] queue_write_lock_slowpath
>    5.83%   reaim  [kernel.kallsyms]  [k] mspin_lock
>    2.86%   ls     [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>    2.71%   reaim  [kernel.kallsyms]  [k] anon_vma_interval_tree_insert
>    1.52%   true   [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>    1.51%   reaim  [kernel.kallsyms]  [k] queue_read_lock_slowpath
>    1.35%   reaim  [kernel.kallsyms]  [k] mutex_spin_on_owner
>    1.12%   reaim  [kernel.kallsyms]  [k] zap_pte_range
>    1.06%   reaim  [kernel.kallsyms]  [k] perf_event_aux_ctx
>    1.01%   reaim  [kernel.kallsyms]  [k] perf_event_aux

But wouldn't kernel (4) be the one that was the most highly constrained?
(That said, yes, I get that _raw_spin_lock_irqsave() is some lock that
is unrelated to the qrwlock.)

> Tim Chen also tested the qrwlock with Ingo's patch on a 4-socket
> machine. It was found that the 11% performance improvement was the
> same with either the regular rwlock or the queue rwlock.
>
> Signed-off-by: Waiman Long <Waiman.Long@xxxxxx>

Some memory-barrier issues with additional commentary below.

							Thanx, Paul

> ---
>  include/asm-generic/qrwlock.h |  256 +++++++++++++++++++++++++++++++++++++++++
>  kernel/Kconfig.locks          |    7 +
>  lib/Makefile                  |    1 +
>  lib/qrwlock.c                 |  247 +++++++++++++++++++++++++++++++++
>  4 files changed, 511 insertions(+), 0 deletions(-)
>  create mode 100644 include/asm-generic/qrwlock.h
>  create mode 100644 lib/qrwlock.c
>
> diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
> new file mode 100644
> index 0000000..78ad4a5
> --- /dev/null
> +++ b/include/asm-generic/qrwlock.h
> @@ -0,0 +1,256 @@
> +/*
> + * Queue read/write lock
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * (C) Copyright 2013 Hewlett-Packard Development Company, L.P. > + * > + * Authors: Waiman Long <waiman.long@xxxxxx> > + */ > +#ifndef __ASM_GENERIC_QRWLOCK_H > +#define __ASM_GENERIC_QRWLOCK_H > + > +#include <linux/types.h> > +#include <asm/bitops.h> > +#include <asm/cmpxchg.h> > +#include <asm/barrier.h> > +#include <asm/processor.h> > +#include <asm/byteorder.h> > + > +#if !defined(__LITTLE_ENDIAN) && !defined(__BIG_ENDIAN) > +#error "Missing either LITTLE_ENDIAN or BIG_ENDIAN definition." > +#endif > + > +#if (CONFIG_NR_CPUS < 65536) > +typedef u16 __nrcpu_t; > +typedef u32 __nrcpupair_t; > +#define QRW_READER_BIAS (1U << 16) > +#else > +typedef u32 __nrcpu_t; > +typedef u64 __nrcpupair_t; > +#define QRW_READER_BIAS (1UL << 32) > +#endif > + > +/* > + * The queue read/write lock data structure > + * > + * Read lock stealing can only happen when there is at least one reader > + * holding the read lock. When the fair flag is not set, it mimics the > + * behavior of the regular rwlock at the expense that a perpetual stream > + * of readers could starve a writer for a long period of time. That > + * behavior, however, may be beneficial to a workload that is reader heavy > + * with slow writers, and the writers can wait without undesirable consequence. > + * This fair flag should only be set at initialization time. > + * > + * The layout of the structure is endian-sensitive to make sure that adding > + * QRW_READER_BIAS to the rw field to increment the reader count won't > + * disturb the writer and the fair fields. > + */ > +struct qrwnode { > + struct qrwnode *next; > + bool wait; /* Waiting flag */ > +}; > + > +typedef struct qrwlock { > + union qrwcnts { > + struct { > +#ifdef __LITTLE_ENDIAN > + u8 writer; /* Writer state */ > + u8 fair; /* Fair rwlock flag */ > + __nrcpu_t readers; /* # of active readers */ > +#else > + __nrcpu_t readers; /* # of active readers */ > + u8 fair; /* Fair rwlock flag */ > + u8 writer; /* Writer state */ > +#endif > + }; > + __nrcpupair_t rw; /* Reader/writer number pair */ > + } cnts; > + struct qrwnode *waitq; /* Tail of waiting queue */ > +} arch_rwlock_t; > + > +/* > + * Writer state values & mask > + */ > +#define QW_WAITING 1 /* A writer is waiting */ > +#define QW_LOCKED 0xff /* A writer holds the lock */ > +#define QW_MASK_FAIR ((u8)~0) /* Mask for fair reader */ > +#define QW_MASK_UNFAIR ((u8)~QW_WAITING) /* Mask for unfair reader */ > + > +/* > + * External function declarations > + */ > +extern void queue_read_lock_slowpath(struct qrwlock *lock); > +extern void queue_write_lock_slowpath(struct qrwlock *lock); > + > +/** > + * queue_read_can_lock- would read_trylock() succeed? > + * @lock: Pointer to queue rwlock structure > + */ > +static inline int queue_read_can_lock(struct qrwlock *lock) > +{ > + union qrwcnts rwcnts; > + > + rwcnts.rw = ACCESS_ONCE(lock->cnts.rw); > + return !rwcnts.writer || (!rwcnts.fair && rwcnts.readers); > +} > + > +/** > + * queue_write_can_lock- would write_trylock() succeed? 
> + * @lock: Pointer to queue rwlock structure > + */ > +static inline int queue_write_can_lock(struct qrwlock *lock) > +{ > + union qrwcnts rwcnts; > + > + rwcnts.rw = ACCESS_ONCE(lock->cnts.rw); > + return !rwcnts.writer && !rwcnts.readers; > +} > + > +/** > + * queue_read_trylock - try to acquire read lock of a queue rwlock > + * @lock : Pointer to queue rwlock structure > + * Return: 1 if lock acquired, 0 if failed > + */ > +static inline int queue_read_trylock(struct qrwlock *lock) > +{ > + union qrwcnts cnts; > + u8 wmask; > + > + cnts.rw = ACCESS_ONCE(lock->cnts.rw); > + wmask = cnts.fair ? QW_MASK_FAIR : QW_MASK_UNFAIR; > + if (likely(!(cnts.writer & wmask))) { > + cnts.rw = xadd(&lock->cnts.rw, QRW_READER_BIAS); On an unfair lock, this can momentarily make queue_read_can_lock() give a false positive. Not sure that this is a problem -- after all, the return value from queue_read_can_lock() is immediately obsolete anyway. > + if (likely(!(cnts.writer & wmask))) > + return 1; > + /* > + * Restore correct reader count > + * It had been found that two nearly consecutive atomic > + * operations (xadd & add) can cause significant cacheline > + * contention. By inserting a pause between these two atomic > + * operations, it can significantly reduce unintended > + * contention. > + */ > + cpu_relax(); > + add_smp(&lock->cnts.readers, -1); > + } > + return 0; > +} > + > +/** > + * queue_write_trylock - try to acquire write lock of a queue rwlock > + * @lock : Pointer to queue rwlock structure > + * Return: 1 if lock acquired, 0 if failed > + */ > +static inline int queue_write_trylock(struct qrwlock *lock) > +{ > + union qrwcnts old, new; > + > + old.rw = ACCESS_ONCE(lock->cnts.rw); > + if (likely(!old.writer && !old.readers)) { > + new.rw = old.rw; > + new.writer = QW_LOCKED; > + if (likely(cmpxchg(&lock->cnts.rw, old.rw, new.rw) == old.rw)) > + return 1; > + } > + return 0; > +} > +/** > + * queue_read_lock - acquire read lock of a queue rwlock > + * @lock: Pointer to queue rwlock structure > + */ > +static inline void queue_read_lock(struct qrwlock *lock) > +{ > + union qrwcnts cnts; > + u8 wmask; > + > + cnts.rw = xadd(&lock->cnts.rw, QRW_READER_BIAS); > + wmask = cnts.fair ? QW_MASK_FAIR : QW_MASK_UNFAIR; > + if (likely(!(cnts.writer & wmask))) > + return; > + /* > + * Slowpath will decrement the reader count, if necessary > + */ > + queue_read_lock_slowpath(lock); > +} > + > +/** > + * queue_write_lock - acquire write lock of a queue rwlock > + * @lock : Pointer to queue rwlock structure > + */ > +static inline void queue_write_lock(struct qrwlock *lock) > +{ > + union qrwcnts old; > + > + /* > + * Optimize for the unfair lock case where the fair flag is 0. 
> + */ > + old.rw = cmpxchg(&lock->cnts.rw, 0, QW_LOCKED); > + if (likely(old.rw == 0)) > + return; > + if (likely(!old.writer && !old.readers)) { > + union qrwcnts new; > + > + new.rw = old.rw; > + new.writer = QW_LOCKED; > + if (likely(cmpxchg(&lock->cnts.rw, old.rw, new.rw) == old.rw)) > + return; > + } > + queue_write_lock_slowpath(lock); > +} > + > +/** > + * queue_read_unlock - release read lock of a queue rwlock > + * @lock : Pointer to queue rwlock structure > + */ > +static inline void queue_read_unlock(struct qrwlock *lock) > +{ > + /* > + * Atomically decrement the reader count > + */ > + add_smp(&lock->cnts.readers, -1); > +} > + > +/** > + * queue_write_unlock - release write lock of a queue rwlock > + * @lock : Pointer to queue rwlock structure > + */ > +static inline void queue_write_unlock(struct qrwlock *lock) > +{ > + /* > + * Make sure that none of the critical section will be leaked out. > + */ > + smp_mb__before_clear_bit(); > + ACCESS_ONCE(lock->cnts.writer) = 0; > + smp_mb__after_clear_bit(); How about the new smp_store_release() for this write? Looks to me that smp_mb__before_clear_bit() and smp_mb__after_clear_bit() work by accident, if they in fact do work for all architectures. > +} > + > +/* > + * Initializier > + */ > +#define __ARCH_RW_LOCK_UNLOCKED { .cnts = { .rw = 0 }, .waitq = NULL } > +#define __ARCH_RW_LOCK_UNLOCKED_FAIR \ > + { .cnts = { { .writer = 0, .fair = 1, .readers = 0 } }, .waitq = NULL } > + > +/* > + * Remapping rwlock architecture specific functions to the corresponding > + * queue rwlock functions. > + */ > +#define arch_read_can_lock(l) queue_read_can_lock(l) > +#define arch_write_can_lock(l) queue_write_can_lock(l) > +#define arch_read_lock(l) queue_read_lock(l) > +#define arch_write_lock(l) queue_write_lock(l) > +#define arch_read_trylock(l) queue_read_trylock(l) > +#define arch_write_trylock(l) queue_write_trylock(l) > +#define arch_read_unlock(l) queue_read_unlock(l) > +#define arch_write_unlock(l) queue_write_unlock(l) > + > +#endif /* __ASM_GENERIC_QRWLOCK_H */ > diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks > index d2b32ac..b665478 100644 > --- a/kernel/Kconfig.locks > +++ b/kernel/Kconfig.locks > @@ -223,3 +223,10 @@ endif > config MUTEX_SPIN_ON_OWNER > def_bool y > depends on SMP && !DEBUG_MUTEXES > + > +config ARCH_QUEUE_RWLOCK > + bool > + > +config QUEUE_RWLOCK > + def_bool y if ARCH_QUEUE_RWLOCK > + depends on SMP > diff --git a/lib/Makefile b/lib/Makefile > index f3bb2cb..e3175db 100644 > --- a/lib/Makefile > +++ b/lib/Makefile > @@ -189,3 +189,4 @@ quiet_cmd_build_OID_registry = GEN $@ > clean-files += oid_registry_data.c > > obj-$(CONFIG_UCS2_STRING) += ucs2_string.o > +obj-$(CONFIG_QUEUE_RWLOCK) += qrwlock.o > diff --git a/lib/qrwlock.c b/lib/qrwlock.c > new file mode 100644 > index 0000000..a85b9e1 > --- /dev/null > +++ b/lib/qrwlock.c > @@ -0,0 +1,247 @@ > +/* > + * Queue read/write lock > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * (C) Copyright 2013 Hewlett-Packard Development Company, L.P. 
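
[ To make the smp_store_release() suggestion for queue_write_unlock() above
  concrete, here is a sketch of what it might look like -- assuming the
  generic smp_store_release(ptr, val) form that handles a one-byte store.
  This is an illustration of the suggestion, not the author's code: ]

	static inline void queue_write_unlock(struct qrwlock *lock)
	{
		/*
		 * The release store orders all loads and stores in the
		 * critical section before the store that clears the writer
		 * byte, so nothing can leak out of the critical section.
		 */
		smp_store_release(&lock->cnts.writer, 0);
	}
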
> + * > + * Authors: Waiman Long <waiman.long@xxxxxx> > + */ > +#include <linux/smp.h> > +#include <linux/bug.h> > +#include <linux/cpumask.h> > +#include <linux/percpu.h> > +#include <linux/hardirq.h> > +#include <asm-generic/qrwlock.h> > + > +/* > + * Compared with regular rwlock, the queue rwlock has has the following > + * advantages: > + * 1. It is more deterministic for the fair variant. Even though there is > + * a slight chance of stealing the lock if come at the right moment, the > + * granting of the lock is mostly in FIFO order. Even the default unfair > + * variant is fairer at least among the writers. > + * 2. It is faster in high contention situation. Sometimes, anyway! (Referring to your performance results on top of Ingo's patch.) > + * > + * The only downside is that the lock is 4 bytes larger in 32-bit systems > + * and 12 bytes larger in 64-bit systems. > + * > + * There are two queues for writers. The writer field of the lock is a > + * one-slot wait queue. The writers that follow will have to wait in the > + * combined reader/writer queue (waitq). > + * > + * Compared with x86 ticket spinlock, the queue rwlock is faster in high > + * contention situation. The writer lock is also faster in single thread > + * operations. Therefore, queue rwlock can be considered as a replacement > + * for those spinlocks that are highly contended as long as an increase > + * in lock size is not an issue. > + */ > + > +/** > + * wait_in_queue - Add to queue and wait until it is at the head > + * @lock: Pointer to queue rwlock structure > + * @node: Node pointer to be added to the queue > + * > + * The use of smp_wmb() is to make sure that the other CPUs see the change > + * ASAP. > + */ > +static __always_inline void > +wait_in_queue(struct qrwlock *lock, struct qrwnode *node) > +{ > + struct qrwnode *prev; > + > + node->next = NULL; > + node->wait = true; > + prev = xchg(&lock->waitq, node); > + if (prev) { > + prev->next = node; > + smp_wmb(); This smp_wmb() desperately needs a comment. Presumably it is ordering the above "prev->next = node" with some later write, but what write? Oh... I see the header comment above. Actually, memory barriers don't necessarily make things visible sooner. They are instead used for ordering. Or did you actually measure a performance increase with this? (Seems -highly- unlikely given smp_wmb()'s definition on x86...) > + /* > + * Wait until the waiting flag is off > + */ > + while (ACCESS_ONCE(node->wait)) > + cpu_relax(); > + } > +} > + > +/** > + * signal_next - Signal the next one in queue to be at the head > + * @lock: Pointer to queue rwlock structure > + * @node: Node pointer to the current head of queue > + */ > +static __always_inline void > +signal_next(struct qrwlock *lock, struct qrwnode *node) > +{ > + struct qrwnode *next; > + > + /* > + * Try to notify the next node first without disturbing the cacheline > + * of the lock. If that fails, check to see if it is the last node > + * and so should clear the wait queue. 
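
[ Returning to wait_in_queue() above for a moment: one way to address the
  smp_wmb() question would be to rely on the full memory barrier that xchg()
  already implies and document that, rather than adding an unexplained
  barrier after the link.  A sketch of that reading -- my interpretation of
  the ordering requirement, not necessarily what the author intends: ]

	static __always_inline void
	wait_in_queue(struct qrwlock *lock, struct qrwnode *node)
	{
		struct qrwnode *prev;

		node->next = NULL;
		node->wait = true;
		/*
		 * The xchg() below is a full memory barrier, so the node
		 * initialization above is ordered before the node becomes
		 * reachable via lock->waitq or prev->next.  The predecessor
		 * therefore cannot clear ->wait before we have set it.
		 */
		prev = xchg(&lock->waitq, node);
		if (prev) {
			ACCESS_ONCE(prev->next) = node;
			/* Wait until the predecessor hands us the queue head. */
			while (ACCESS_ONCE(node->wait))
				cpu_relax();
		}
	}
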
> + */ > + next = ACCESS_ONCE(node->next); > + if (likely(next)) > + goto notify_next; > + > + /* > + * Clear the wait queue if it is the last node > + */ > + if ((ACCESS_ONCE(lock->waitq) == node) && > + (cmpxchg(&lock->waitq, node, NULL) == node)) > + return; > + /* > + * Wait until the next one in queue set up the next field > + */ > + while (likely(!(next = ACCESS_ONCE(node->next)))) > + cpu_relax(); > + /* > + * The next one in queue is now at the head > + */ > +notify_next: > + barrier(); > + ACCESS_ONCE(next->wait) = false; > + smp_wmb(); Because smp_wmb() does not order reads, reads from the critical section could leak out of the critical section. A full memory barrier (smp_mb()) seems necessary to avoid this. Yes, you do have full memory barriers implicit in various atomic operations, but it appears to be possible to avoid them all in some situations. > +} > + > +/** > + * rspin_until_writer_unlock - inc reader count & spin until writer is gone > + * @lock: Pointer to queue rwlock structure > + * > + * In interrupt context or at the head of the queue, the reader will just > + * increment the reader count & wait until the writer releases the lock. > + */ > +static __always_inline void > +rspin_until_writer_unlock(struct qrwlock *lock, int inc) > +{ > + union qrwcnts cnts; > + > + if (inc) > + cnts.rw = xadd(&lock->cnts.rw, QRW_READER_BIAS); > + else > + cnts.rw = ACCESS_ONCE(lock->cnts.rw); > + while (cnts.writer == QW_LOCKED) { > + cpu_relax(); > + cnts.rw = ACCESS_ONCE(lock->cnts.rw); > + } > +} > + > +/** > + * queue_read_lock_slowpath - acquire read lock of a queue rwlock > + * @lock: Pointer to queue rwlock structure > + */ > +void queue_read_lock_slowpath(struct qrwlock *lock) > +{ > + struct qrwnode node; > + union qrwcnts cnts; > + > + /* > + * Readers come here when it cannot get the lock without waiting > + */ > + if (unlikely(irq_count())) { > + /* > + * Readers in interrupt context will spin until the lock is > + * available without waiting in the queue. > + */ > + rspin_until_writer_unlock(lock, 0); > + return; > + } > + cnts.rw = xadd(&lock->cnts.rw, -QRW_READER_BIAS); > + > + /* > + * Put the reader into the wait queue > + */ > + wait_in_queue(lock, &node); > + > + /* > + * At the head of the wait queue now, try to increment the reader > + * count and get the lock. > + */ > + if (unlikely(cnts.fair)) { > + /* > + * For fair reader, wait until the writer state goes to 0 > + * before incrementing the reader count. > + */ > + while (ACCESS_ONCE(lock->cnts.writer)) > + cpu_relax(); > + } > + rspin_until_writer_unlock(lock, 1); > + signal_next(lock, &node); > +} > +EXPORT_SYMBOL(queue_read_lock_slowpath); > + > +/** > + * queue_write_3step_lock - acquire write lock in 3 steps > + * @lock : Pointer to queue rwlock structure > + * Return: 1 if lock acquired, 0 otherwise > + * > + * Step 1 - Try to acquire the lock directly if no reader is present > + * Step 2 - Set the waiting flag to notify readers that a writer is waiting > + * Step 3 - When the readers field goes to 0, set the locked flag > + * > + * When not in fair mode, the readers actually ignore the second step. > + * However, this is still necessary to force other writers to fall in line. 
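
[ And for signal_next() above, the full barrier being asked for might look
  like the following -- again only a sketch, restructured slightly to avoid
  the goto; an smp_store_release() of next->wait would presumably also be a
  candidate once that primitive is available: ]

	static __always_inline void
	signal_next(struct qrwlock *lock, struct qrwnode *node)
	{
		struct qrwnode *next;

		next = ACCESS_ONCE(node->next);
		if (!next) {
			/* Appears to be the last node; try to clear the queue. */
			if ((ACCESS_ONCE(lock->waitq) == node) &&
			    (cmpxchg(&lock->waitq, node, NULL) == node))
				return;
			/* A successor exists; wait for it to link itself in. */
			while (!(next = ACCESS_ONCE(node->next)))
				cpu_relax();
		}
		/*
		 * Order everything done while at the head of the queue --
		 * reads as well as writes -- before the store that hands
		 * queue-head status to the next node.  A plain smp_wmb()
		 * would let reads leak past this hand-off.
		 */
		smp_mb();
		ACCESS_ONCE(next->wait) = false;
	}
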
> + */
> +static __always_inline int queue_write_3step_lock(struct qrwlock *lock)
> +{
> +	union qrwcnts old, new;
> +
> +	old.rw = ACCESS_ONCE(lock->cnts.rw);
> +
> +	/* Step 1 */
> +	if (!old.writer & !old.readers) {
> +		new.rw = old.rw;
> +		new.writer = QW_LOCKED;
> +		if (likely(cmpxchg(&lock->cnts.rw, old.rw, new.rw) == old.rw))
> +			return 1;
> +	}
> +
> +	/* Step 2 */
> +	if (old.writer || (cmpxchg(&lock->cnts.writer, 0, QW_WAITING) != 0))
> +		return 0;
> +
> +	/* Step 3 */
> +	while (true) {
> +		cpu_relax();
> +		old.rw = ACCESS_ONCE(lock->cnts.rw);

Suppose that there now is a writer, but no readers...

> +		if (!old.readers) {
> +			new.rw = old.rw;
> +			new.writer = QW_LOCKED;
> +			if (likely(cmpxchg(&lock->cnts.rw, old.rw, new.rw)
> +					== old.rw))

... can't this mistakenly hand out the lock to a second writer?

Ah, the trick is that we are at the head of the queue, so the only
writer we can possibly contend with is a prior holder of the lock.
Once that writer leaves, no other writer can appear.  And the
QW_WAITING bit prevents new writers from immediately grabbing the
lock.

> +				return 1;
> +		}
> +	}
> +	/* Should never reach here */
> +	return 0;
> +}
> +
> +/**
> + * queue_write_lock_slowpath - acquire write lock of a queue rwlock
> + * @lock : Pointer to queue rwlock structure
> + */
> +void queue_write_lock_slowpath(struct qrwlock *lock)
> +{
> +	struct qrwnode node;
> +
> +	/*
> +	 * Put the writer into the wait queue
> +	 */
> +	wait_in_queue(lock, &node);
> +
> +	/*
> +	 * At the head of the wait queue now, call queue_write_3step_lock()
> +	 * to acquire the lock until it is done.
> +	 */
> +	while (!queue_write_3step_lock(lock))
> +		cpu_relax();

If we get here, queue_write_3step_lock() just executed a successful
cmpxchg(), which implies a full memory barrier.  This prevents the
critical section from leaking out, good!

> +	signal_next(lock, &node);
> +}
> +EXPORT_SYMBOL(queue_write_lock_slowpath);
> --
> 1.7.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-arch" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html