Re: [PATCH v4 for 4.14 1/3] membarrier: Provide register expedited private command

Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> · Thu, 28 Sep 2017 15:29:50 +0000 (UTC)

----- On Sep 28, 2017, at 11:01 AM, Nicholas Piggin npiggin@xxxxxxxxx wrote:

> On Thu, 28 Sep 2017 13:31:36 +0000 (UTC)
> Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> 
>> ----- On Sep 27, 2017, at 9:04 AM, Nicholas Piggin npiggin@xxxxxxxxx wrote:
>> 
>> > On Tue, 26 Sep 2017 20:43:28 +0000 (UTC)
>> > Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>> >   
>> >> ----- On Sep 26, 2017, at 1:51 PM, Mathieu Desnoyers
>> >> mathieu.desnoyers@xxxxxxxxxxxx wrote:
>> >>

[...]

>> Therefore,
>> you end up with the same rq lock disruption as if you iterated over all
>> online CPUs. If userspace does that in a loop, you end up, in PeterZ's words,
>> with an Insta-DoS.
> 
> I really don't see how that can be true. A spinlock is, by definition, for
> sharing resources; it's not an insta-DoS just because you take shared
> spinlocks!

[...]

>> 
>> > 
>> > For the powerpc approach, yes, there is some controversy about using
>> > runqueue locks even for CPUs that we can already interfere with, but I
>> > think we have a lot of options we could look at *after* it shows up
>> > as a problem, if it ever does.
>> 
>> Peter's DoS argument seems to be a strong objection to grabbing
>> the rq locks.
> 
> Well, if I still can't convince you otherwise, then we should try testing
> that theory.

[ I'll let PeterZ pitch in on this part of the discussion ]

> 
>> 
>> Here is another point in favor of having a register command for the
>> private membarrier: This gives us greater flexibility to improve the
>> kernel scheduler and return-to-userspace barriers if need be in the
>> future.
>> 
>> For instance, I plan to propose a "MEMBARRIER_FLAG_SYNC_CORE" flag
>> for the next merge window. It will also provide guarantees about
>> context synchronization of all cores, for memory reclaim performed by
>> JIT compilers. So far, the following architectures seem to have the
>> proper core serializing instructions already in place when returning
>> to user-space: x86 (iret), powerpc (rfi), arm32/64 (return from
>> exception, eret), s390/x (lpswe), ia64 (rfi), parisc (issue at least 7
>> instructions while singing around a bonfire), and mips SMP (eret).
>> 
>> So far, AFAIU, only x86 (when returning through sysexit), alpha
>> (appears to require an explicit imb), and sparc (explicit flush + 5
>> instructions around a bonfire similar to parisc's) appear to require
>> special handling.
>> 
>> I therefore plan to use the registration step, with the
>> MEMBARRIER_FLAG_SYNC_CORE flag set, to set TIF flags and add the
>> required context synchronizing barriers on sched_in() only for
>> processes wishing to use private expedited membarrier.
>> 
>> So I don't see much point in trying to remove that registration step.
> 
> I don't follow you. You are talking about the concept of registering
> intention to use a different function? And the registration API is not
> merged yet?

Yes, I'm talking about requiring processes to invoke membarrier cmd
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED before they can successfully
invoke membarrier cmd MEMBARRIER_CMD_PRIVATE_EXPEDITED.
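For illustration, here is a minimal user-space sketch of that sequence. It assumes the Linux 4.14 command values, mirrored from enum membarrier_cmd in <linux/membarrier.h> under local MB_CMD_* names so that it builds with older headers. It calls the raw syscall(2) interface because glibc has no membarrier() wrapper. sys_membarrier() and private_expedited_barrier() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local copies of the Linux 4.14 command values (assumption: they mirror
 * enum membarrier_cmd in <linux/membarrier.h>). */
enum {
	MB_CMD_QUERY                      = 0,
	MB_CMD_PRIVATE_EXPEDITED          = 1 << 3,
	MB_CMD_REGISTER_PRIVATE_EXPEDITED = 1 << 4,
};

/* Hypothetical thin wrapper around the raw system call. */
static int sys_membarrier(int cmd, unsigned int flags)
{
#ifdef SYS_membarrier
	return (int)syscall(SYS_membarrier, cmd, flags);
#else
	(void)cmd;
	(void)flags;
	errno = ENOSYS;		/* headers too old to know the syscall number */
	return -1;
#endif
}

/* Register first, then issue the private expedited barrier.
 * Returns 0 on success, or -1 with errno set. */
static int private_expedited_barrier(void)
{
	int mask = sys_membarrier(MB_CMD_QUERY, 0);

	if (mask < 0)
		return -1;		/* membarrier(2) itself is unavailable */
	if (!(mask & MB_CMD_PRIVATE_EXPEDITED)) {
		errno = ENOSYS;		/* kernel predates the private expedited command */
		return -1;
	}
	/* Without this step the kernel rejects the command below with EPERM. */
	if (sys_membarrier(MB_CMD_REGISTER_PRIVATE_EXPEDITED, 0) < 0)
		return -1;
	return sys_membarrier(MB_CMD_PRIVATE_EXPEDITED, 0);
}
```

Issuing MEMBARRIER_CMD_PRIVATE_EXPEDITED without the registration step fails with EPERM. That is what makes the registration mandatory rather than advisory.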

> Let me say I'm not completely against the idea of a registration API. But
> I don't think registration for this expedited command is necessary.

Given that we have the powerpc lack-of-full-barrier-on-return-to-userspace
case now, and we foresee x86-sysexit, sparc, and alpha also requiring
special treatment when we introduce the MEMBARRIER_FLAG_SYNC_CORE behavior
in the next release, it seems that we'll have a hard time handling
architecture special cases efficiently if we don't expose the registration
API right away.
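To make the per-architecture gating more concrete, here is a toy user-space model. It is not kernel code. Registration sets a per-task flag, and a sched_in() hook pays for an extra barrier only for flagged tasks. model_task, MODEL_TIF_MEMBARRIER_SYNC_CORE, model_register() and model_sched_in() are all hypothetical names. The C11 fence is only a memory barrier, not a core serializing instruction; it merely stands in for whatever an architecture such as alpha or sparc would actually need.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a TIF flag set at registration time. */
#define MODEL_TIF_MEMBARRIER_SYNC_CORE (1u << 0)

struct model_task {
	unsigned int flags;	/* models the thread-info flags word */
};

static unsigned long extra_barriers;	/* barriers issued by the hook */

/* Hypothetical registration: mark the task as a membarrier user. */
static void model_register(struct model_task *t)
{
	t->flags |= MODEL_TIF_MEMBARRIER_SYNC_CORE;
}

/* Hypothetical sched_in() hook: only registered tasks pay the cost. */
static void model_sched_in(const struct model_task *t)
{
	if (t->flags & MODEL_TIF_MEMBARRIER_SYNC_CORE) {
		/* Placeholder for an arch-specific core serializing barrier. */
		atomic_thread_fence(memory_order_seq_cst);
		extra_barriers++;
	}
}
```

Unregistered processes never reach the extra barrier. This is the benefit of deciding it at registration time rather than for every task.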

> 
> But (aside) let's say a TIF flag turns out to be a good idea for your
> second case, why not just check the flag in the membarrier syscall and
> do the registration the first time it uses it?

We also considered that option. It's mainly about guaranteeing that
an expedited membarrier command never blocks. If we introduce this
"lazy auto-registration" behavior, we end up blocking the process
at a random point in its execution so we can issue a synchronize_sched().
By exposing an explicit registration, we can control where this delay
occurs. We can even allow library constructors to invoke the registration
while the process is still single-threaded, which lets us completely
skip synchronize_sched().
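Here is a rough sketch of the library-constructor approach. It assumes GCC/Clang's constructor attribute and the Linux 4.14 value of MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED. mb_register_at_load(), mb_registered and mb_register_errno are hypothetical names. Whether the process really is still single-threaded when the constructor runs depends on how the library is loaded. A dlopen() from an already-threaded program, for example, would not qualify.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed Linux 4.14 value of MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED. */
#define MB_CMD_REGISTER_PRIVATE_EXPEDITED (1 << 4)

static int mb_registered;	/* 1 once registration succeeded */
static int mb_register_errno;	/* errno of a failed attempt, else 0 */

/* Runs before main() (or at library load), so the registration cost is
 * paid at a point the library chooses, not in the middle of a barrier. */
static void __attribute__((constructor)) mb_register_at_load(void)
{
#ifdef SYS_membarrier
	if (syscall(SYS_membarrier, MB_CMD_REGISTER_PRIVATE_EXPEDITED, 0) == 0)
		mb_registered = 1;
	else
		mb_register_errno = errno;
#else
	mb_register_errno = ENOSYS;
#endif
}
```

Later callers can check mb_registered before issuing the expedited command. The barrier path then never has to register lazily, and so never has to block.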

Thanks,

Mathieu

> 
> Thanks,
> Nick

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com