Re: [PATCH v8 04/14] task_isolation: add initial support

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Tue, 20 Oct 2015 14:26:34 -0700

On Tue, Oct 20, 2015 at 2:20 PM, Chris Metcalf <cmetcalf@xxxxxxxxxx> wrote:
> On 10/20/2015 04:56 PM, Andy Lutomirski wrote:
>>
>> On Tue, Oct 20, 2015 at 1:36 PM, Chris Metcalf <cmetcalf@xxxxxxxxxx>
>> wrote:
>>>
>>> +/*
>>> + * In task isolation mode we try to return to userspace only after
>>> + * attempting to make sure we won't be interrupted again.  To handle
>>> + * the periodic scheduler tick, we test to make sure that the tick is
>>> + * stopped, and if it isn't yet, we request a reschedule so that if
>>> + * another task needs to run to completion first, it can do so.
>>> + * Similarly, if any other subsystems require quiescing, we will need
>>> + * to do that before we return to userspace.
>>> + */
>>> +bool _task_isolation_ready(void)
>>> +{
>>> +       WARN_ON_ONCE(!irqs_disabled());
>>> +
>>> +       /* If we need to drain the LRU cache, we're not ready. */
>>> +       if (lru_add_drain_needed(smp_processor_id()))
>>> +               return false;
>>> +
>>> +       /* If vmstats need updating, we're not ready. */
>>> +       if (!vmstat_idle())
>>> +               return false;
>>> +
>>> +       /* If the tick is running, request rescheduling; we're not ready.
>>> */
>>> +       if (!tick_nohz_tick_stopped()) {
>>> +               set_tsk_need_resched(current);
>>> +               return false;
>>> +       }
>>> +
>>> +       return true;
>>> +}
>>
>> I still don't get why this is a loop.
>
>
> You mean, why is this code called from prepare_exit_to_userspace()
> in the loop, instead of after the loop?  It's because the actual functions
> that clean up the LRU, vmstat worker, etc., may need interrupts enabled,
> may reschedule internally, etc.  (refresh_cpu_vm_stats() calls
> cond_resched(), for example.)

Yuck.  I guess that's a reasonable argument, although it could also be fixed.

>  Even more importantly, we rely on
> rescheduling to take care of the fact that the scheduler tick may still
> be running, and therefore loop back to the schedule() call that's run
> when TIF_NEED_RESCHED gets set.

This just seems like a mis-design.  We don't know why the scheduler
tick is on, so we're just going to reschedule until the problem goes
away?

>
>> BTW, should isolation just be a scheduler class (SCHED_ISOLATED)?
>
>
> So a scheduler class is an interesting idea certainly, although not
> one I know immediately how to implement.  I'm not sure whether
> it makes sense to require a user be root or have a suitable rtprio
> rlimit, but perhaps so.  The nice thing about the current patch
> series is that you can affinitize yourself to a nohz_full core and
> declare that you want to run task-isolated, and none of that
> requires root nor really is there a reason it should.

Your patches more or less implement "don't run me unless I'm
isolated".  A scheduler class would be more like "isolate me (and
maybe make me super high priority so it actually happens)".

I'm not a scheduler person, so I don't know.  But "don't run me unless
I'm isolated" seems like a design that will, at best, only ever work
by dumb luck.  You have to disable migration, avoid other runnable
tasks, hope that the kernel keeps working the way it did when you
wrote the patch, hope you continue to get lucky enough that you ever
get to user mode in the first place, etc.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html