On Wed, 27 Jul 2011, Paul E. McKenney wrote:
> On Wed, Jul 27, 2011 at 01:13:18PM +0200, Peter Zijlstra wrote:
> > On Mon, 2011-07-25 at 14:17 -0700, Paul E. McKenney wrote:
> > > > I suppose it is indeed. Even for the SoftRT case we need to make
> > > > sure the total utilization loss is indeed acceptable.
> > >
> > > OK. If you are doing strict priority, then everything below the
> > > highest priority is workload dependent.
> >
> > <snip throttling, that's a whole different thing>
> >
> > > The higher-priority tasks can absolutely starve the lower-priority
> > > ones, with or without the migrate-disable capability.
> >
> > Sure, that's how FIFO works, but it also relies on the fact that once
> > your high priority task completes the lower priority task resumes.
> >
> > The extension to SMP is that we run the m highest priority tasks on n
> > cpus; where m <= n. Any loss in utilization (idle time in this
> > particular case, but irq/preemption/migration and cache overhead are
> > also time not spent on the actual workload).
> >
> > Now the WCET folks are all about quantifying the needs of applications
> > and the utilization limits of the OS etc. And while for SoftRT you can
> > relax quite a few of the various bounds, you still need to know them
> > in order to relax them (der Hofrat likes to move from worst case to
> > avg case IIRC).

It's not about worst case vs. average case, it's about using the
distribution rather than boundary values - boundary values are hard to
correlate with specific events.

> ;-)
>
> > > Another way of looking at it is from the viewpoint of the additional
> > > priority-boost events. If preemption is disabled, the low-priority
> > > task will execute through the preempt-disable region without context
> > > switching. In contrast, given a migration-disable region, the
> > > low-priority task might be preempted and then boosted.
> > > (If I understand correctly, if some higher-priority task tries to
> > > enter the same type of migration-disable region, it will acquire
> > > the associated lock, thus priority-boosting the task that is
> > > already in that region.)
> >
> > No, there is no boosting involved, migrate_disable() isn't
> > intrinsically tied to a lock or other PI construct. We might need
> > locks to keep some of the per-cpu crap correct, but that, again, is a
> > whole different ball game.
> >
> > But even if it was, I don't think PI will help any for this, we still
> > need to complete the various migrate_disable() sections, see below.
>
> OK, got it. I think, anyway. I was incorrectly (or at least
> unhelpfully) pulling in locks that might be needed to handle per-CPU
> variables.
>
> > > One stupid-but-tractable way to model this is to have an
> > > interarrival rate for the various process priorities, and then
> > > calculate the odds of (1) a higher-priority process arriving while
> > > the low-priority one is in a *-disable region and (2) that
> > > higher-priority process needing to enter a conflicting *-disable
> > > region. This would give you some measure of the added boosting load
> > > due to migration-disable as compared to preemption-disable.
> > >
> > > Would this sort of result be useful?
> >
> > Yes, that type of analysis can be used, and I guess we can measure
> > various variables related to that.
>
> OK, good.
>
> > > > My main worry with all this is that we have these insanely long
> > > > !preempt regions in mainline that are now !migrate regions, and
> > > > thus per all the above we could be looking at a substantial
> > > > utilization loss.
> > > >
> > > > Alternatively we could all be missing something far more horrid,
> > > > but that might just be my paranoia talking.
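[Editor's aside: the "stupid-but-tractable" interarrival model above can be made concrete. The sketch below assumes Poisson wakeups of the higher-priority tasks; the rate, region length, and conflict probability are made-up placeholder values, not anything measured in -rt. It is only an illustration of the proposed calculation, not kernel behaviour.]

```python
import math

def boost_event_probability(arrival_rate_hz, region_len_s, p_conflict):
    """Probability that at least one higher-priority task wakes while a
    low-priority task sits in a *-disable region of length region_len_s
    AND that wakeup needs a conflicting *-disable region.

    Assumes Poisson arrivals at arrival_rate_hz; p_conflict is the
    chance a given arrival wants the conflicting region.  Both numbers
    are illustrative assumptions.
    """
    # Conflicting arrivals thin the Poisson stream, so the effective
    # rate is arrival_rate_hz * p_conflict, and
    # P(no event during region) = exp(-rate * T).
    return 1.0 - math.exp(-arrival_rate_hz * p_conflict * region_len_s)

# 1000 wakeups/s, a 100us region, and 10% of wakeups conflicting
# (all placeholder values):
p = boost_event_probability(1000.0, 100e-6, 0.1)
```

For short regions this probability is roughly rate * p_conflict * T, which is the "added boosting load" measure the paragraph above asks for, per region entered.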
> > >
> > > Ah, good point -- if each migration-disable region is associated
> > > with a lock, then you -could- allow migration and gain better
> > > utilization at the expense of worse caching behavior. Is that the
> > > concern?
> >
> > I'm not seeing how that would be true; suppose you have this stack of
> > 4 migrate_disable() sections and 3 idle cpus. No amount of boosting
> > will make the already running task at the top of the stack go any
> > faster, and it needs to complete the migrate_disable section before
> > it can be migrated, equally so for the rest, so you still need
> > 3*migrate-disable-period of time before all your cpus are busy again.
> >
> > You can move another task to the top of the stack by boosting, but
> > you'll need 3 tasks to complete their respective migrate-disable
> > sections; it doesn't matter which task, so boosting doesn't change
> > anything.
>
> OK, so let me see if I understand what you are looking to model.
>
> o There are no locks.
>
> o There are a finite number of tasks with varying priorities.
>   (I would initially work with a single task per priority level, but
>   IIRC it is not hard to make multiple tasks per priority work. Not a
>   fan of infinite numbers of priorities, though!)
>
> o There are multiple CPUs.
>
> o Once a task enters a migrate-disable region, it must remain on that
>   CPU. (I will initially model the migrate-disable region as consuming
>   a fixed amount of CPU. If I wanted to really wuss out, I would model
>   it as consuming an exponentially distributed amount of CPU.)
>
> o Tasks awakening outside of migrate-disable regions will pick the CPU
>   running the lowest-priority task, whether or not this task is in
>   migrate-disable state. (At least I don't see anything in 3.0-rt3
>   that looks like a scheduling decision based on ->migrate_disable,
>   perhaps due to blindness.)

This might be a simple heuristic to minimize the probability of stacking
in the first place.
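[Editor's aside: the stacking argument above (4 migrate_disable() sections stacked on one CPU, 3 idle CPUs, hence 3 migrate-disable periods of lost utilization no matter who gets boosted) can be shown with a toy calculation. The 100us section length is an arbitrary placeholder; this is a sketch of the argument, not of scheduler code.]

```python
def time_until_cpus_busy(stacked_sections_s, idle_cpus):
    """Tasks stacked in migrate_disable() sections on one CPU can only
    leave after finishing their own section there, one at a time.
    Returns the elapsed time until `idle_cpus` of them have completed
    their sections and migrated away, i.e. until all idle CPUs are busy
    again.  Boosting only reorders who runs first; it cannot shrink this
    total, because some section must complete before each migration.
    """
    elapsed = 0.0
    migrated = 0
    for section in stacked_sections_s:
        if migrated == idle_cpus:
            break                # all idle CPUs already filled
        elapsed += section       # this task finishes its section...
        migrated += 1            # ...and can now move to an idle CPU
    return elapsed

# Four stacked 100us sections and three idle CPUs: the last idle CPU
# only becomes busy after 3 sections complete, i.e. 3 * 100us.
t = time_until_cpus_busy([100e-6] * 4, 3)
```

The result is invariant under any permutation of the stacked tasks, which is exactly the "it doesn't matter which task" point.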
> o For example, if all CPUs except for one are running prio-99 tasks,
>   and the remaining CPU is running a prio-1 task in a migrate-disable
>   region, if a prio-2 task awakens, it will preempt the prio-1 task.

All CPUs are utilized, so there is no utilization loss at all in that
scenario.

>   On the other hand, if at least one of the CPUs was idle, the prio-2
>   task would have instead run on that idle CPU.

So what you need to add to the model is the probability of the
transitional event:

* the prio-2 task preempts the prio-1 task because no CPU is idle
* at least one CPU becomes idle while the prio-1 task is blocked from
  migration due to migrate-disable and preempted by the prio-2 task

Only in this combination does the system suffer a utilization penalty.

> o The transition probabilities depend on the priority of the task
>   currently running on the migrate-disable CPU -- the higher that
>   priority, the greater the chance that any preempting task will find
>   some other CPU instead.
>
>   The recurrence times depend on the number of tasks stacked up in
>   migrate-disable regions on that CPU.
>
> If this all holds, it would be possible to compute the probability of
> a given migrate-disable region being preempted and, if preempted, the
> expected duration of that preemption, given the following quantities
> as input:
>
> o The probability that a given CPU is running a task of priority P,
>   for each priority. The usual way to estimate this is based on
>   per-thread CPU utilizations.
>
> o The expected duration of migrate-disable regions.
>
> o The expected wakeups per second for tasks of each priority.
>
> With the usual disclaimers about cheezy mathematical approximations of
> reality and all that.
>
> Would this be useful, or am I still missing the point?
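[Editor's aside: the proposed inputs above (per-priority occupancy, region durations, wakeup rates) lend themselves to a quick Monte Carlo estimate. The sketch below collapses the per-priority detail into two aggregate knobs - a total higher-priority wakeup rate and the chance a waking task finds a better CPU elsewhere - so every number in it is an illustrative assumption, not a measurement.]

```python
import random

def preemption_probability(wakeup_rates_hz, region_len_s,
                           p_other_cpu_free, trials=100_000, seed=42):
    """Monte Carlo estimate of the probability that one migrate-disable
    region of a low-priority task gets preempted before it completes.

    wakeup_rates_hz: wakeup rates of the higher-priority tasks, modeled
    as independent Poisson sources (a simplifying assumption of this
    sketch).  p_other_cpu_free: chance that a waking task finds a
    lower-priority CPU elsewhere and leaves ours alone.  All values are
    illustrative placeholders.
    """
    rng = random.Random(seed)
    total_rate = sum(wakeup_rates_hz)   # merged Poisson stream
    preempted = 0
    for _ in range(trials):
        t = 0.0
        while True:
            # Time to the next higher-priority wakeup in the system.
            t += rng.expovariate(total_rate)
            if t >= region_len_s:
                break                    # region finished unpreempted
            if rng.random() >= p_other_cpu_free:
                preempted += 1           # the wakeup landed on our CPU
                break
    return preempted / trials
```

With p_other_cpu_free = 0 this converges to the closed form 1 - exp(-rate * T); the Monte Carlo form is only useful once per-priority occupancies and preemption durations are added.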
It would be useful to get an estimate of the latency impact - but to get
an estimate of the impact on system utilization you would need to
include the probability that a different CPU in the system is idle and
would in principle allow running one of the tasks that can't be
migrated.

As I understood it, the initial question was whether migrate_disable has
a relevant impact on system utilization on multicore systems. For this
question I guess two of the key parameters are:

* the probability that migrate-disable stacking occurs
* the probability of an idle-CPU transition while the stacking persists

I guess the probability of an idle transition of a CPU is hard to model
as it is very profile specific.

hofrat
--
To unsubscribe from this list: send the line "unsubscribe
linux-rt-users" in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html