On Thu, 22 Sep 2016, Waiman Long wrote:
> On 09/22/2016 04:38 PM, Thomas Gleixner wrote:
> > >                 wait-wake futex     PI futex     TO futex
> > >                 ---------------     --------     --------
> > > max time             3.49s           50.91s        2.65s
> > > min time             3.24s           50.84s        0.07s
> > > average time         3.41s           50.90s        1.84s
> > > sys time           7m22.4s           55.73s       2m32.9s
> >
> > That's really interesting. Do you have any explanation for these massive
> > system time differences?
>
> For the wait-wake futexes, the waiters will be kicked out with EAGAIN
> without sleeping as long as the futex value changes before they acquire the
> hash bucket spinlock. These waiters will attempt to acquire the lock again
> in user space. If that fails, they will call FUTEX_WAIT again.
>
> In the above test case, the critical section is pretty short and about 4/5
> of the FUTEX_WAIT calls resulted in an EAGAIN error. So there are a lot more
> user-space/kernel transitions than with the TO futexes. I think the
> difference in the number of FUTEX_WAIT calls vs. the FUTEX_LOCK_TO calls
> causes the difference between their sys times.

Ok.

> As for the PI futex, it is a strictly sleeping lock and so won't use that
> much sys time.

That's clear.

> > > Locking was done mostly by lock stealing. This is where most of the
> > > performance benefit comes from, not optimistic spinning.
> >
> > What does the lock latency distribution of all this look like and how
> > fair is the whole thing?
>
> The TO futexes are unfair, as can be seen from the min/max thread times
> listed above. It took the fastest thread 0.07s to complete all the locking
> operations, whereas the slowest one needed 2.65s. However, the situation
> reverses when I changed the critical section to a 1us sleep. In this case,

A 1us sleep is going to add another syscall and therefore scheduling, so
what? Or did you just extend the critical section busy time?

> there will be no optimistic spinning. The performance results for 100k
> locking operations are listed below.
>                 wait-wake futex     PI futex     TO futex
>                 ---------------     --------     --------
> max time             0.06s            9.32s        4.76s
                       ^^^^ ????
> min time             5.59s            9.36s        5.62s
> average time         3.25s            9.35s        5.41s
>
> In this case, the TO futexes are fairer but perform worse than the
> wait-wake futexes. That is because the lock handoff mechanism limits the
> amount of lock stealing in the TO futexes, while the wait-wake futexes have
> no such restriction. When I disabled lock handoff, the TO futexes performed
> similarly to the wait-wake futexes.

So the benefit of these newfangled futexes is only there for extremely short
critical sections and a gazillion threads fighting for the same futex, right?

I really wonder how the average programmer should pick the right flavour, not
to talk about any useful decision for something like glibc to pick the proper
one.

Thanks,

	tglx