Thomas,
Thomas Gleixner wrote:
Iratxo,
please do not top post.
Thanks for you answer and sorry for my top-posting.
2.6.29-rt16, right ?
I have also tried rt18, but as I had thought seeing change-logs, the
behavior is exactly the same.
Hmm, so something is marking the timer softirq pending, but does not
wake the timer softirq.
[42949422.940000] NOHZ: local_softirq_pending 02
[42949422.950000] NOHZ: local_softirq_pending 02
[42949422.960000] ISR HIHG LATENCY 30073
[42949596.920000] NOHZ: local_softirq_pending 02
[42949596.930000] NOHZ: local_softirq_pending 02
[42949596.940000] NOHZ: local_softirq_pending 02
[42949596.950000] NOHZ: local_softirq_pending 02
So this repeats itself roughly every 3min and stays that way for a
couple of jiffies. Does your code fiddle with timers ?
There are some timers in the system, but I believe that those are quite
out of the main path, but there could be some problem with delayed
works, as they use also timers there.
And, well, I have managed to avoid this 180s dependency, and make it
happen only under very heavy load. To do do this I have changed HZ from
100 to 1000. Once I do this the problem arises only when the load, for
example heavy pinging the ethernet interface.
[42949596.960000] ISR HIHG LATENCY 40066
The latency is exaclty the number of NOHZ messages * 10ms. I assume
you have HZ=100. Looks like your code is waiting for a timer, which
does not fire due to the local_softirq_pending 02 problem.
You are right, I hadn't noticed this. This pattern also repeats with HZ
1000, (see dmesg excerpt below), but I only see the first 10 NOHZ
because of the ratelimit in that function. Is this right? At least the
time in the printks summed all-together sum up the aprox 50msec delay.
In this case is not the ISR that gets delayed, but a RT task, it depends
on the moment.
[4294750.246000] NOHZ: local_softirq_pending 06
[4294750.246000] NOHZ: local_softirq_pending 06
[4294750.247000] NOHZ: local_softirq_pending 06
[4294750.248000] NOHZ: local_softirq_pending 06
[4294750.249000] NOHZ: local_softirq_pending 06
[4294750.250000] NOHZ: local_softirq_pending 06
[4294750.251000] NOHZ: local_softirq_pending 06
[4294750.252000] NOHZ: local_softirq_pending 06
[4294750.253000] NOHZ: local_softirq_pending 06
[4294750.254000] NOHZ: local_softirq_pending 06
[4294750.296000] SCHED Max time reached 51368
[4294750.296000] SCHED Hi time reached 51368
Do you know what could be causing this issue. I have managed to repeat
this traces (NOHZ...) without using my code, using a workqueue and in
the work just by doing something like:
work_func() {
mdelay(10);
msleep(10);
queue_work(myqueue, mywork);
}
And then by heavy loading the box from the outside.
Does it work when you disable CONFIG_NOHZ ?
I will try this and let the list know.
Thanks,
tglx
Thanks a really lot.
BR,
Iratxo.
--
Iratxo Pichel Ortiz
Software Development Manager
Albentia Systems S.A.
http://www.albentia.com
Tel: +34 914400567
Cel: +34 663808405
Fax: +34 914400569
C\Margarita Salas 22
Parque Tecnológico de Leganés
Leganés (28918)
Madrid
Spain
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html