On Mon, Nov 11, 2013 at 08:57:27PM -0500, Waiman Long wrote: > On 11/11/2013 04:17 PM, Tim Chen wrote: > >>You could then augment that with [cmp]xchg_{acquire,release} as > >>appropriate. > >> > >>>+/* > >>> * In order to acquire the lock, the caller should declare a local node and > >>> * pass a reference of the node to this function in addition to the lock. > >>> * If the lock has already been acquired, then this will proceed to spin > >>>@@ -37,15 +62,19 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node) > >>> node->locked = 0; > >>> node->next = NULL; > >>> > >>>- prev = xchg(lock, node); > >>>+ /* xchg() provides a memory barrier */ > >>>+ prev = xchg_acquire(lock, node); > >>> if (likely(prev == NULL)) { > >>> /* Lock acquired */ > >>> return; > >>> } > >>> ACCESS_ONCE(prev->next) = node; > >>>- smp_wmb(); > >>>- /* Wait until the lock holder passes the lock down */ > >>>- while (!ACCESS_ONCE(node->locked)) > >>>+ /* > >>>+ * Wait until the lock holder passes the lock down. > >>>+ * Using smp_load_acquire() provides a memory barrier that > >>>+ * ensures subsequent operations happen after the lock is acquired. > >>>+ */ > >>>+ while (!(smp_load_acquire(&node->locked))) > >>> arch_mutex_cpu_relax(); > >An alternate implementation is > > while (!ACCESS_ONCE(node->locked)) > > arch_mutex_cpu_relax(); > > smp_load_acquire(&node->locked); > > > >Leaving the smp_load_acquire at the end to provide appropriate barrier. > >Will that be acceptable? > > > >Tim > > I second Tim's opinion. It will be help to have a > smp_mb_load_acquire() function that provide a memory barrier with > load-acquire semantic. I don't think we need one for store-release > as that will not be in a loop. Hmmm... I guess the ACCESS_ONCE() in the smp_load_acquire() should prevent it from being optimized away. But yes, you then end up with an extra load on the critical lock hand-off patch. And something like an smp_mb_acquire() could then be useful, although I believe that on all current hardware smp_mb_acquire() emits the same code as would an smp_mb_release(): o barrier() on TSO systems such as x86 and s390. o lwsync instruction on powerpc. (Really old systems would want a different instruction for smp_mb_acquire(), but let's not optimize for really old systems.) o dmb instruction on ARM. o mf instruction on ia64. So how about an smp_mb_acquire_release() to cover both use cases? This could be used to further optimize circular buffers, for example. Thanx, Paul > Peter, what do you think about adding that to your patch? > > -Longman > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>