On Wed, Aug 3, 2016 at 10:03 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: > On Wed, Aug 3, 2016 at 9:27 PM, Boqun Feng <boqun.feng@xxxxxxxxx> wrote: >> On Wed, Aug 03, 2016 at 09:37:57AM -0700, Andy Lutomirski wrote: >>> On Wed, Aug 3, 2016 at 5:27 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: >>> > On Tue, Jul 26, 2016 at 03:02:19AM +0000, Mathieu Desnoyers wrote: >>> >> We really care about preemption here. Every migration implies a >>> >> preemption from a user-space perspective. If we would only care >>> >> about keeping the CPU id up-to-date, hooking into migration would be >>> >> enough. But since we want atomicity guarantees for restartable >>> >> sequences, we need to hook into preemption. >>> > >>> >> It allows user-space to perform update operations on per-cpu data without >>> >> requiring heavy-weight atomic operations. >>> > >>> > Well, a CMPXCHG without LOCK prefix isn't all that expensive on x86. >>> > >>> > It is however on PPC and possibly other architectures, so in name of >>> > simplicity supporting only the one variant makes sense. >>> > >>> >>> I wouldn't want to depend on CMPXCHG. But imagine we had primitives >>> that were narrower than the full abort-on-preemption primitive. >>> Specifically, suppose we had abort if (actual cpu != expected_cpu || >>> *aptr != aval). We could do things like: >>> >>> expected_cpu = cpu; >>> aval = NULL; // disarm for now >>> begin(); >>> aval = event_count[cpu] + 1; >>> event_count[cpu] = aval; >>> event_count[cpu]++; >> >> This line is redundant, right? Because it will guarantee a failure even >> in no-contention cases. >> >>> >>> ... compute something ... >>> >>> // arm the rest of it >>> aptr = &event_count[cpu]; >>> if (*aptr != aval) >>> goto fail; >>> >>> *thing_im_writing = value_i_computed; >>> end(); >>> >>> The idea here is that we don't rely on the scheduler to increment the >>> event count at all, which means that we get to determine the scope of >>> what kinds of access conflicts we care about ourselves. >>> >> >> If we increase the event count in userspace, how could we prevent two >> userspace threads from racing on the event_count[cpu] field? For >> example: >> >> CPU 0 >> ================ >> {event_count[0] is initially 0} >> >> [Thread 1] >> begin(); >> aval = event_count[cpu] + 1; // 1 >> >> (preempted) >> [Thread 2] >> begin(); >> aval = event_count[cpu] + 1; // 1, too >> event_count[cpu] = aval; // event_count[0] is 1 >> > > You're right :( This would work with an xadd instruction, but that's > very slow and doesn't exist on most architectures. It could also work > if we did: Thinking about this slightly more, maybe it does work. We could use basically the same mechanism to allow the kernel to restart if the specific sequence: aval = event_count[cpu] + 1 event_count[cpu] = avall gets preempted by setting aptr = &event_count[cpu] and aval to event_count[cpu], like this (although I might have screwed up any number of small details): aptr = &event_count[cpu]; barrier(); aval = event_count[cpu]; barrier(); tmp = aval + 1; event_count[cpu] = tmp; /* preemption here will cause an unnecessary retry, but that's okay */ aval = tmp; --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html