Re: [RFC PATCH v7 1/7] Restartable sequences system call

Boqun Feng <boqun.feng@xxxxxxxxx> · Thu, 4 Aug 2016 12:27:10 +0800

On Wed, Aug 03, 2016 at 09:37:57AM -0700, Andy Lutomirski wrote:
> On Wed, Aug 3, 2016 at 5:27 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Tue, Jul 26, 2016 at 03:02:19AM +0000, Mathieu Desnoyers wrote:
> >> We really care about preemption here. Every migration implies a
> >> preemption from a user-space perspective. If we would only care
> >> about keeping the CPU id up-to-date, hooking into migration would be
> >> enough. But since we want atomicity guarantees for restartable
> >> sequences, we need to hook into preemption.
> >
> >> It allows user-space to perform update operations on per-cpu data without
> >> requiring heavy-weight atomic operations.
> >
> > Well, a CMPXCHG without LOCK prefix isn't all that expensive on x86.
> >
> > It is however on PPC and possibly other architectures, so in name of
> > simplicity supporting only the one variant makes sense.
> >
> 
> I wouldn't want to depend on CMPXCHG.  But imagine we had primitives
> that were narrower than the full abort-on-preemption primitive.
> Specifically, suppose we had abort if (actual cpu != expected_cpu ||
> *aptr != aval).  We could do things like:
> 
> expected_cpu = cpu;
> aval = NULL;  // disarm for now
> begin();
> aval = event_count[cpu] + 1;
> event_count[cpu] = aval;
> event_count[cpu]++;

This line is redundant, right? Because it will guarantee a failure even
in no-contention cases.

> 
> ... compute something ...
> 
> // arm the rest of it
> aptr = &event_count[cpu];
> if (*aptr != aval)
>   goto fail;
> 
> *thing_im_writing = value_i_computed;
> end();
> 
> The idea here is that we don't rely on the scheduler to increment the
> event count at all, which means that we get to determine the scope of
> what kinds of access conflicts we care about ourselves.
> 

If we increase the event count in userspace, how could we prevent two
userspace threads from racing on the event_count[cpu] field? For
example:

	CPU 0
	================
	{event_count[0] is initially 0}

	[Thread 1]
	begin();
	aval = event_count[cpu] + 1; // 1

	(preempted)
	[Thread 2]
	begin();
	aval = event_count[cpu] + 1; // 1, too
	event_count[cpu] = aval; // event_count[0] is 1

	(preempted)
	[Thread 1]
	event_count[cpu] = aval; // event_count[0] is 1

	... 

	aptr = &event_count[cpu];
	if (*aptr != aval) // false.
		...

	[Thread 2]
	aptr = &event_count[cpu];
	if (*aptr != aval) // false.
		...

, in which case, both the critical sections are successful, and Thread 1
and Thread 2 will race on *thing_im_writing.

Am I missing your point here?

Regards,
Boqun

> This has an obvious downside: it's more complicated.
> 
> It has several benefits, I think.  It's debuggable without hassle
> (unless someone, accidentally or otherwise, sets aval incorrectly).
> It also allows much longer critical sections to work well, as merely
> being preempted in the middle won't cause an abort any more.
> 
> So I'm hoping to understand whether we could make something like this
> work.  This whole thing is roughly equivalent to abort-if-migrated
> plus an atomic "if (*aptr == aval) *b = c;" operation.
> 
> (I think that, if this worked, we could improve it a bit by making the
> abort operation jump back to the "if (*aptr != aval) goto fail;" code,
> which should reduce the scope for error a bit and also reduces the
> need for extra code paths that only execute on an abort.)
Attachment:
signature.asc

Description: PGP signature