On Wed, May 2, 2018 at 9:03 AM Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> ----- On May 1, 2018, at 11:53 PM, Daniel Colascione dancol@xxxxxxxxxx wrote:
> [...]
> >
> > I think a small enhancement to rseq would let us build a perfect userspace
> > mutex, one that spins on lock-acquire only when the lock owner is running
> > and that sleeps otherwise, freeing userspace from both specifying ad-hoc
> > spin counts and from trying to detect situations in which spinning is
> > generally pointless.
> >
> > It'd work like this: in the per-thread rseq data structure, we'd include a
> > description of a futex operation for the kernel to perform (in the
> > context of the preempted thread) upon preemption, immediately before
> > schedule(). If the futex operation itself sleeps, that's no problem: we
> > will still have accomplished our goal of running some other thread instead
> > of the preempted thread.
>
> Hi Daniel,
>
> I agree that the problem you are aiming to solve is important. Let's see
> what prevents the proposed rseq implementation from doing what you envision.
>
> The main issue here is touching userspace immediately before schedule().
> At that specific point, it's not possible to take a page fault. In the
> proposed rseq implementation, we get away with it by raising a task struct
> flag and using it in a return-to-userspace notifier (where we can actually
> take a fault), where we touch the userspace TLS area.
>
> If we can find a way to solve this limitation, then the rest of your design
> makes sense to me.

Thanks for taking a look!

Why couldn't we take a page fault just before schedule()? The reason we
can't take a page fault in atomic context is that doing so might call
schedule(). Here, we're about to call schedule() _anyway_, so what harm does
it do to call something that might call schedule()? If we end up scheduling
via that call, we can simply skip the manual schedule() we were going to
perform.
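
For concreteness, here's a rough sketch of the kind of per-thread descriptor
I'm imagining. To be clear, none of this is the existing rseq ABI: the struct
and its field names are invented purely for illustration, and FUTEX_WAKE is
just an example value for 'op'.

#include <stdint.h>
#include <linux/futex.h>	/* FUTEX_WAKE, used as the example 'op' below */

/*
 * Hypothetical sketch, not the current rseq ABI. Describes a futex
 * operation the kernel would perform on this thread's behalf when the
 * thread is preempted, immediately before schedule().
 */
struct rseq_preempt_futex_op {
	uint32_t *uaddr;	/* futex word the kernel should act on */
	int       op;		/* e.g. FUTEX_WAKE */
	uint32_t  val;		/* e.g. number of waiters to wake */
};

A thread would register one of these alongside its rseq area. How a mutex
maps its "owner is running" state and its waiters onto that single operation
is entirely up to userspace; the sketch is only meant to show where the
descriptor would live and when the kernel would consume it.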