Re: On migrate_disable() and latencies

"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> · Fri, 22 Jul 2011 17:39:34 -0700

On Fri, Jul 22, 2011 at 12:19:52PM +0200, Peter Zijlstra wrote:
> On Wed, 2011-07-20 at 02:37 +0200, Thomas Gleixner wrote:
> >    - Twist your brain around the schedulability impact of the
> >      migrate_disable() approach.
> > 
> >      A really interesting research topic for our friends from the
> >      academic universe. Relevant and conclusive (even short notice)
> >      papers and/or talks on that topic have a reserved slot in the
> >      Kernel developers track at the Realtime Linux Workshop in Prague
> >      in October this year. 
> 
> From what I can tell it can induce a latency in the order of
> max-migrate-disable-period * nr-cpus.
> 
> The scenario is one where you stack N migrate-disable tasks on a run
> queue (necessarily of increasing priority). Doing this requires all cpus
> in the system to be as busy, for otherwise the task would simply be
> moved to another cpu.
> 
> Anyway, once you manage to stack these migrate-disable tasks, all other
> tasks go to sleep, leaving a vacuum. Normally we would migrate tasks to
> fill the vacuum left by the tasks going to sleep, but clearly
> migrate-disable prohibits this.
> 
> So we have this stack of migrate-disable tasks and M-1 idle cpus (loss
> of utilization). Now it takes the length of the migrate-disable region
> of the highest priority task on the stack (the one running) to complete
> and enable migration again. This will instantly move the task away to an
> idle cpu. This will then need to happen min(N-1, M-1) times before the
> lowest priority migrate_disable task can run again or all cpus are busy.
> 
> Therefore the worst case latency is in the order of
> max-migrate-disable-period * nr-cpus.
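
The unwinding described above can be sketched as a toy model.  This is
only an illustration under stated assumptions, not scheduler code: the
function name unwind_stack and its parameters are hypothetical, each
stacked task is assumed to still owe its full remaining migrate-disable
time when it gets the CPU, and a task that leaves its region is assumed
to migrate at once and keep its new CPU busy.

```python
def unwind_stack(region_lengths, n_cpus):
    """Toy model of N migrate-disable tasks stacked on one run queue.

    region_lengths[i] is the remaining migrate-disable time of task i,
    ordered lowest priority first. All other CPUs start out idle.
    Returns (elapsed, idle_cpu_time) up to the point where either the
    lowest-priority task can run again or all CPUs are busy.
    """
    stack = list(region_lengths)
    idle = n_cpus - 1          # M-1 idle CPUs at the start
    elapsed = 0
    idle_cpu_time = 0          # utilization lost while CPUs sit idle

    # This loop runs min(N-1, M-1) times.
    while len(stack) > 1 and idle > 0:
        period = stack.pop()   # highest-priority task finishes its region
        elapsed += period
        idle_cpu_time += idle * period
        idle -= 1              # it migrates away and occupies an idle CPU

    return elapsed, idle_cpu_time


# Four stacked tasks, 50 time units each, on a 4-CPU machine.
print(unwind_stack([50, 50, 50, 50], n_cpus=4))  # (150, 300)
```

With equal region lengths the elapsed time is min(N-1, M-1) times the
region length, which matches the "max-migrate-disable-period * nr-cpus"
order of magnitude quoted above.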

OK, but wouldn't that be the latency as seen by the lowest-priority
task?  Or are migrate-disable tasks given preferential treatment?
If not, a prio-99 task would get the same latency either way, right?

Migration-disable can magnify the latency seen by low-priority tasks, if
I understand correctly.  If you disabled preemption, when a low-priority
task became runnable, it would find an idle CPU.  But with migration
disable, the lowest-priority task might enter a migration-disable region,
then be preempted by a marginally higher-priority task that also enters
a migration-disable region, and is also preempted, and so on.  The
lowest-priority task cannot run on the current CPU because of all
the higher-priority tasks, and cannot migrate due to being in a
migration-disable section.

In other words, as is often the case, better worst-case service to
the high-priority tasks can multiply the latency seen by the
low-priority tasks.

So is the topic to quantify this?  If so, my take is that the latency
to the highest-priority task decreases by an amount roughly equal to
the duration of the longest preempt_disable() region that turned into a
migration-disable region, while that to the lowest-priority task increases
by N-1 times the CPU overhead of the longest migration-disable region,
plus context switches.  (Yes, this is a very crude rule-of-thumb model.
A real model would have much higher mathematics and might use a more
detailed understanding of the workload.)
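
As a rough illustration, the rule of thumb above can be written down as
a few lines of arithmetic.  The function name latency_shift, the
microsecond units, and the default of two context switches per
preemption (one out, one back in) are assumptions made for this sketch,
not something the model above pins down.

```python
def latency_shift(n_stacked, longest_region_us, ctx_switch_us, n_switches=None):
    """Crude rule-of-thumb model of the latency trade-off.

    Returns (high_prio_gain_us, low_prio_loss_us):
      - the highest-priority task gains roughly the longest region that
        used to disable preemption and now only disables migration;
      - the lowest-priority task loses roughly (N-1) such regions plus
        the cost of the context switches involved.
    """
    if n_switches is None:
        # Assumption: each of the N-1 preemptions costs two switches.
        n_switches = 2 * (n_stacked - 1)

    high_prio_gain_us = longest_region_us
    low_prio_loss_us = (n_stacked - 1) * longest_region_us \
        + n_switches * ctx_switch_us
    return high_prio_gain_us, low_prio_loss_us


# Four stacked tasks, 50 us regions, 2 us per context switch.
print(latency_shift(4, 50, 2))  # (50, 162)
```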

Or am I misunderstanding how all this works?

							Thanx, Paul

> Currently we have no means of measuring these latencies; this is
> something we need to grow. I think Steven can fairly easily craft a
> migrate_disable runtime tracer -- it needs to use t->se.sum_exec_runtime
> as its measure so as to only count the actual time spent on the task and
> ignore any time it was blocked.
> 
> Once we have this, it's back to the old game of 'lock'-breaking.
> 
> 
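
The point about counting only execution time, not blocked time, can be
shown with a userspace analogue.  This is not the kernel tracer and
does not touch t->se.sum_exec_runtime; it is a sketch that uses
Python's time.thread_time_ns() (per-thread CPU time) in the same role.
The class name RegionTimer and its methods are hypothetical.

```python
import time


class RegionTimer:
    """Track the longest CPU time spent inside a marked region.

    Uses per-thread CPU time, so time the thread spends blocked or
    sleeping inside the region is not counted.
    """

    def __init__(self):
        self.max_cpu_ns = 0
        self._start_ns = None

    def enter(self):
        # Snapshot the thread's accumulated CPU time at region entry.
        self._start_ns = time.thread_time_ns()

    def exit(self):
        # CPU time consumed by this thread since enter().
        delta = time.thread_time_ns() - self._start_ns
        self._start_ns = None
        self.max_cpu_ns = max(self.max_cpu_ns, delta)
        return delta


timer = RegionTimer()
timer.enter()
time.sleep(0.05)           # blocked time: not charged to the region
cpu_ns = timer.exit()
print(f"CPU time in region: {cpu_ns} ns")
```

A wall-clock timer around the same region would report at least 50 ms,
while the CPU-time figure stays far below that.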
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html