On Tue, Aug 30, 2016 at 8:32 AM, Chris Metcalf <cmetcalf@xxxxxxxxxxxx> wrote: > On 8/30/2016 3:58 AM, Peter Zijlstra wrote: >> >> On Mon, Aug 29, 2016 at 12:40:32PM -0400, Chris Metcalf wrote: >>> >>> On 8/29/2016 12:33 PM, Peter Zijlstra wrote: >>>> >>>> On Tue, Aug 16, 2016 at 05:19:27PM -0400, Chris Metcalf wrote: >>>>> >>>>> + /* >>>>> + * Request rescheduling unless we are in full dynticks mode. >>>>> + * We would eventually get pre-empted without this, and if >>>>> + * there's another task waiting, it would run; but by >>>>> + * explicitly requesting the reschedule, we may reduce the >>>>> + * latency. We could directly call schedule() here as well, >>>>> + * but since our caller is the standard place where schedule() >>>>> + * is called, we defer to the caller. >>>>> + * >>>>> + * A more substantive approach here would be to use a struct >>>>> + * completion here explicitly, and complete it when we shut >>>>> + * down dynticks, but since we presumably have nothing better >>>>> + * to do on this core anyway, just spinning seems plausible. >>>>> + */ >>>>> + if (!tick_nohz_tick_stopped()) >>>>> + set_tsk_need_resched(current); >>>> >>>> This is broken.. and it would be really good if you don't actually need >>>> to do this. >>> >>> Can you elaborate? We clearly do want to wait until we are in full >>> dynticks mode before we return to userspace. >>> >>> We could do it just in the prctl() syscall only, but then we lose the >>> ability to implement the NOSIG mode, which can be a convenience. >> >> So this isn't spelled out anywhere. Why does this need to be in the >> return to user path? > > > I'm not sure where this should be spelled out, to be honest. I guess > I can add some commentary to the commit message explaining this part. > > The basic idea is just that we don't want to be at risk from the > dyntick getting enabled. Similarly, we don't want to be at risk of a > later global IPI due to lru_add_drain stuff, for example. And, we may > want to add additional stuff, like catching kernel TLB flushes and > deferring them when a remote core is in userspace. To do all of this > kind of stuff, we need to run in the return to user path so we are > late enough to guarantee no further kernel things will happen to > perturb our carefully-arranged isolation state that includes dyntick > off, per-cpu lru cache empty, etc etc. None of the above should need to *loop*, though, AFAIK. -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html