On Tue, Feb 15, 2011 at 11:12 AM, Peter LaDow <petela@xxxxxxxxxxxxxxx> wrote:
> I made an error in my last post. My call tree wasn't accurate, since I
> was looking at unpatched code. After applying the RT patch, the call
> tree changes a bit:
>
> timer_interrupt
> |
> + hrtimer_interrupt
>   |
>   + raise_softirq_irqoff
>     |
>     + wakeup_softirqd
>       |
>       + wake_up_process
>         |
>         + try_to_wake_up
>
> It does indeed offload the timer expirations to the hrtimer softirq,
> and the only task that try_to_wake_up operates on is the softirq
> handler. So this overhead is even less than I thought. Indeed, it is
> quite light.
>
> So it seems that I was on track before. The hrtimer softirq task is
> running at a priority of 50:
>
> # ps | grep irq
>    10 root       0 SW<  [sirq-hrtimer/0]
> # chrt -p 10
> pid 10's current scheduling policy: SCHED_FIFO
> pid 10's current scheduling priority: 50
>
> And I run my program with 'chrt -f 99', so the hrtimer softirq task
> should not interfere with it.
>
> So I'm back to the scenarios you described earlier. I suppose that if
> the timers are close together in time, there would be a flurry of
> frequent interrupts, and each of these could slow things down. To
> prevent this deluge, we tried something: we bumped the minimum
> resolution of the decrementer up to roughly 1ms, so the decrementer
> cannot interrupt us more often than once per millisecond. We modified
> arch/powerpc/kernel/time.c to set the decrementer's min_delta_ns to a
> value large enough to equal about 1ms, rather than the default of 2.
> The jitter disappeared. I know that doing this effectively eliminates
> the timers' use as "high resolution", but it proves the point that the
> flurry of interrupts is what is causing the problems.
>
> So it does seem that the interrupt overhead is the problem. If we want
> high resolution but low overhead, we have to get around the problem of
> lots of tasks using clock_nanosleep. In our real-world system we have
> only one high-priority task that must run every 500us. More than 99%
> of the time it gets to run and completes its work very quickly, but
> less than 1% of the time it doesn't run for 1ms to 2ms, breaking our
> requirements. We have several lower-priority tasks running, each using
> clock_nanosleep or pending on an I/O event. It may be that the
> relatively large number of timers in our system occasionally causes a
> flurry of interrupts that increases the jitter. So how do we get rid
> of it?
>
> I see only two ways: 1) stop using clock_nanosleep, or 2) stop using
> high resolution timers. Implementing either is problematic.
> Eliminating clock_nanosleep would require replacing it with something
> that doesn't resolve to an underlying nanosleep system call, which I
> think is impossible (except for sleep(), but that only gives us 1s
> resolution). And turning off the high resolution timers makes it
> impossible for us to wake every 500us.

You might be able to use range timers to solve your problem:
http://lwn.net/Articles/296578/ (a rough sketch is at the bottom of
this mail).

> Hmmm....I guess this really is a limitation of our platform. We are
> just up against the wall in terms of burden and processing power.
> There just isn't enough horsepower to do everything we want at the
> time we want.
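
The idea behind the range timer (timer slack) work is that timers which
do not need to be exact get a window in which to fire, so the kernel can
coalesce nearby expirations into one decrementer interrupt instead of
programming an interrupt per timer. Since 2.6.28 a task can set its own
slack with prctl(PR_SET_TIMERSLACK); nanosleep/clock_nanosleep, poll and
select then honour it. The kernel forces the slack to zero for
SCHED_FIFO/SCHED_RR tasks, so this would only affect your SCHED_OTHER
background tasks -- the 500us task is untouched. Something like the
untested sketch below; the 200us slack and 10ms period are made-up
numbers purely for illustration:

#include <stdio.h>
#include <time.h>
#include <sys/prctl.h>
#include <linux/prctl.h>	/* PR_SET_TIMERSLACK, for older toolchains */

int main(void)
{
	struct timespec period = { 0, 10 * 1000 * 1000 };	/* 10ms */

	/* Allow this task's timers to fire up to 200us late so the
	 * kernel can merge them with neighbouring expirations. */
	if (prctl(PR_SET_TIMERSLACK, 200000UL, 0, 0, 0) == -1)
		perror("PR_SET_TIMERSLACK");

	for (;;) {
		/* ... low-priority periodic work ... */
		clock_nanosleep(CLOCK_MONOTONIC, 0, &period, NULL);
	}
	return 0;
}

(Link with -lrt on older glibc for clock_nanosleep.) I believe the slack
value is inherited across fork(), so if you would rather not touch each
task, a small wrapper that sets it before launching the background
processes should work as well.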
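
For contrast, the 500us task itself stays as you have it: SCHED_FIFO
with clock_nanosleep, ideally against an absolute deadline so the period
does not drift with execution time. RT tasks never get slack applied, so
it keeps full resolution. A minimal sketch of that pattern -- priority 99
and the 500us period are your numbers, the rest is illustrative:

#include <sched.h>
#include <stdio.h>
#include <time.h>

#define PERIOD_NS	(500 * 1000L)	/* 500us */
#define NSEC_PER_SEC	1000000000L

int main(void)
{
	struct sched_param sp = { .sched_priority = 99 };
	struct timespec next;

	if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
		perror("sched_setscheduler");

	clock_gettime(CLOCK_MONOTONIC, &next);
	for (;;) {
		/* Advance the absolute deadline by one period. */
		next.tv_nsec += PERIOD_NS;
		if (next.tv_nsec >= NSEC_PER_SEC) {
			next.tv_nsec -= NSEC_PER_SEC;
			next.tv_sec++;
		}
		/* Sleep until the deadline; TIMER_ABSTIME keeps the
		 * period from drifting, and no slack is applied to an
		 * RT task. */
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
		/* ... the 500us work ... */
	}
	return 0;
}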