On Sat, Oct 24, 2015 at 12:26:27PM +0200, Peter Zijlstra wrote: > On Thu, Oct 22, 2015 at 08:07:16PM +0800, Boqun Feng wrote: > > On Wed, Oct 21, 2015 at 09:48:25PM +0200, Peter Zijlstra wrote: > > > On Wed, Oct 21, 2015 at 12:35:23PM -0700, Paul E. McKenney wrote: > > > > > > > > I ask this because I recall Peter once bought up a discussion: > > > > > > > > > > > > > > > > https://lkml.org/lkml/2015/8/26/596 > > > > > > > > So a full barrier on one side of these operations is enough, I think. > > > > > IOW, there is no need to strengthen these operations. > > > > > > > > Do we need to also worry about other futex use cases? > > > > > > Worry, always! > > > > > > But yes, there is one more specific usecase, which is that of a > > > condition variable. > > > > > > When we go sleep on a futex, we might want to assume visibility of the > > > stores done by the thread that woke us by the time we wake up. > > > > > > > But the thing is futex atomics in PPC are already RELEASE(pc)+ACQUIRE > > and imply a full barrier, is an RELEASE(sc) semantics really needed > > here? > > For this, no, the current code should be fine I think. > > > Further more, is this condition variable visibility guaranteed by other > > part of futex? Because in futex_wake_op: > > > > futex_wake_op() > > ... > > double_unlock_hb(hb1, hb2); <- RELEASE(pc) barrier here. > > wake_up_q(&wake_q); > > > > and in futex_wait(): > > > > futex_wait() > > ... > > futex_wait_queue_me(hb, &q, to); <- schedule() here > > ... > > unqueue_me(&q) > > drop_futex_key_refs(&q->key); > > iput()/mmdrop(); <- a full barrier > > > > > > The RELEASE(pc) barrier pairs with the full barrier, therefore the > > userspace wakee can observe the condition variable modification. > > Right, futexes are a pain; and I think we all agreed we didn't want to > go rely on implementation details unless we absolutely _have_ to. > Agreed. Besides, after I have read why futex_wake_op(the caller of futex_atomic_op_inuser()) is introduced, I think your worries are quite reasonable. I thought the futex_atomic_op_inuser() only operated on futex related variables, but it turns out it can actually operate any userspace variable if userspace code likes, therefore we don't have control of all memory ordering guarantee of the variable. So if PPC doesn't provide a full barrier at user<->kernel boundries, we should make futex_atomic_op_inuser() fully ordered. Still looking into futex_atomic_cmpxchg_inatomic() ... > > > And.. aside from the thoughts I outlined in the email referenced above, > > > there is always the chance people accidentally rely on the strong > > > ordering on their x86 CPU and find things come apart when ran on their > > > ARM/MIPS/etc.. > > > > > > There are a fair number of people who use the raw futex call and we have > > > 0 visibility into many of them. The assumed and accidental ordering > > > guarantees will forever remain a mystery. > > > > > > > Understood. That's truely a potential problem. Considering not all the > > architectures imply a full barrier at user<->kernel boundries, maybe we > > can use one bit in the opcode of the futex system call to indicate > > whether userspace treats futex as fully ordered. Like: > > > > #define FUTEX_ORDER_SEQ_CST 0 > > #define FUTEX_ORDER_RELAXED 64 (bit 7 and bit 8 are already used) > > Not unless there's an actual performance problem with any of this. > Futexes are painful enough as is. Make sense, and we still have choices like modifying the userspace code if there is actually a performance problem ;-) Regards, Boqun
Attachment:
signature.asc
Description: PGP signature