On Tue, Feb 15, 2011 at 11:12 AM, Peter LaDow <petela@xxxxxxxxxxxxxxx> wrote:
> I made an error in my last post. My call tree wasn't accurate, since I
> was looking at unpatched code. After applying the RT patch, the call
> tree changes a bit:
>
> timer_interrupt
> |
> + hrtimer_interrupt
>   |
>   + raise_softirq_irqoff
>     |
>     + wakeup_softirqd
>       |
>       + wake_up_process
>         |
>         + try_to_wake_up
>
> It does indeed offload the timer expirations to the hrtimer softirq,
> and the only task that try_to_wake_up operates on is the softirq
> handler. So this overhead is even less than I thought. Indeed, it is
> quite light.
>
> So it seems that I was on track before. The hrtimer softirq task is
> running at a priority of 50:
>
> # ps | grep irq
>    10 root       0 SW<  [sirq-hrtimer/0]
> # chrt -p 10
> pid 10's current scheduling policy: SCHED_FIFO
> pid 10's current scheduling priority: 50
>
> And I run my program with 'chrt -f 99', so the hrtimer softirq task
> should not interfere with it.
>
> So I'm back to the scenarios you described earlier. I suppose that if
> the timers are close together in time, there would be a flurry of
> frequent interrupts, and each of these could slow things down. To
> prevent this deluge, we tried something: we bumped the minimum
> resolution of the decrementer up to roughly 1ms, so the decrementer
> cannot interrupt us more often than once per millisecond. We modified
> arch/powerpc/kernel/time.c to set the decrementer's min_delta_ns to a
> value large enough to equal about 1ms, rather than the default of 2.
> The jitter disappeared. I know that doing this effectively eliminates
> the timers' use as "high resolution", but it proves the point that the
> flurry of interrupts is what is causing the problems.
>
> So it does seem that the interrupt overhead is the problem. If we want
> high resolution but low overhead, we have to get around the problem of
> lots of tasks using clock_nanosleep. In our real-world system we have
> only one high-priority task that must run every 500us. More than 99%
> of the time it gets to run and completes its work very quickly, but
> less than 1% of the time it doesn't run for 1ms to 2ms, breaking our
> requirements. We have several lower-priority tasks running, each using
> clock_nanosleep or pending on an I/O event. It may be that the
> relatively large number of timers in our system occasionally causes a
> flurry of interrupts that increases the jitter. So how do we get rid
> of it?
>
> I see only two ways: 1) stop using clock_nanosleep, or 2) stop using
> high resolution timers. Implementing either is problematic.
> Eliminating clock_nanosleep would require replacing it with something
> that doesn't resolve to an underlying nanosleep system call, which I
> think is impossible (except for sleep(), but that only gives us 1s
> resolution). And turning off the high resolution timers makes it
> impossible for us to wake every 500us.

You might be able to use range timers to solve your problem:
http://lwn.net/Articles/296578/ (a rough sketch is at the bottom of
this mail).

> Hmmm....I guess this really is a limitation of our platform. We are
> just up against the wall in terms of burden and processing power.
> There just isn't enough horsepower to do everything we want at the
> time we want.
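
The idea behind the range timer (timer slack) work is that timers which
do not need to be exact get a window in which to fire, so the kernel can
coalesce nearby expirations into one decrementer interrupt instead of
programming an interrupt per timer. Since 2.6.28 a task can set its own
slack with prctl(PR_SET_TIMERSLACK); nanosleep/clock_nanosleep, poll and
select then honour it. The kernel forces the slack to zero for
SCHED_FIFO/SCHED_RR tasks, so this would only affect your SCHED_OTHER
background tasks -- the 500us task is untouched. Something like the
untested sketch below; the 200us slack and 10ms period are made-up
numbers purely for illustration:

#include <stdio.h>
#include <time.h>
#include <sys/prctl.h>
#include <linux/prctl.h>	/* PR_SET_TIMERSLACK, for older toolchains */

int main(void)
{
	struct timespec period = { 0, 10 * 1000 * 1000 };	/* 10ms */

	/* Allow this task's timers to fire up to 200us late so the
	 * kernel can merge them with neighbouring expirations. */
	if (prctl(PR_SET_TIMERSLACK, 200000UL, 0, 0, 0) == -1)
		perror("PR_SET_TIMERSLACK");

	for (;;) {
		/* ... low-priority periodic work ... */
		clock_nanosleep(CLOCK_MONOTONIC, 0, &period, NULL);
	}
	return 0;
}

(Link with -lrt on older glibc for clock_nanosleep.) I believe the slack
value is inherited across fork(), so if you would rather not touch each
task, a small wrapper that sets it before launching the background
processes should work as well.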
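
For contrast, the 500us task itself stays as you have it: SCHED_FIFO
with clock_nanosleep, ideally against an absolute deadline so the period
does not drift with execution time. RT tasks never get slack applied, so
it keeps full resolution. A minimal sketch of that pattern -- priority 99
and the 500us period are your numbers, the rest is illustrative:

#include <sched.h>
#include <stdio.h>
#include <time.h>

#define PERIOD_NS	(500 * 1000L)	/* 500us */
#define NSEC_PER_SEC	1000000000L

int main(void)
{
	struct sched_param sp = { .sched_priority = 99 };
	struct timespec next;

	if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
		perror("sched_setscheduler");

	clock_gettime(CLOCK_MONOTONIC, &next);
	for (;;) {
		/* Advance the absolute deadline by one period. */
		next.tv_nsec += PERIOD_NS;
		if (next.tv_nsec >= NSEC_PER_SEC) {
			next.tv_nsec -= NSEC_PER_SEC;
			next.tv_sec++;
		}
		/* Sleep until the deadline; TIMER_ABSTIME keeps the
		 * period from drifting, and no slack is applied to an
		 * RT task. */
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
		/* ... the 500us work ... */
	}
	return 0;
}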