Re: NOHZ: local_softirq_pending

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thomas,

Thomas Gleixner wrote:
Iratxo,

please do not top post.
Thanks for you answer and sorry for my top-posting.
  2.6.29-rt16, right ?
I have also tried rt18, but as I had thought seeing change-logs, the behavior is exactly the same.

Hmm, so something is marking the timer softirq pending, but does not
wake the timer softirq.

[42949422.940000] NOHZ: local_softirq_pending 02
[42949422.950000] NOHZ: local_softirq_pending 02
[42949422.960000] ISR HIHG LATENCY 30073
[42949596.920000] NOHZ: local_softirq_pending 02
[42949596.930000] NOHZ: local_softirq_pending 02
[42949596.940000] NOHZ: local_softirq_pending 02
[42949596.950000] NOHZ: local_softirq_pending 02

So this repeats itself roughly every 3min and stays that way for a
couple of jiffies. Does your code fiddle with timers ?


There are some timers in the system, but I believe that those are quite out of the main path, but there could be some problem with delayed works, as they use also timers there.

And, well, I have managed to avoid this 180s dependency, and make it happen only under very heavy load. To do do this I have changed HZ from 100 to 1000. Once I do this the problem arises only when the load, for example heavy pinging the ethernet interface.
[42949596.960000] ISR HIHG LATENCY 40066

The latency is exaclty the number of NOHZ messages * 10ms. I assume
you have HZ=100. Looks like your code is waiting for a timer, which
does not fire due to the local_softirq_pending 02 problem.
You are right, I hadn't noticed this. This pattern also repeats with HZ 1000, (see dmesg excerpt below), but I only see the first 10 NOHZ because of the ratelimit in that function. Is this right? At least the time in the printks summed all-together sum up the aprox 50msec delay. In this case is not the ISR that gets delayed, but a RT task, it depends on the moment.

[4294750.246000] NOHZ: local_softirq_pending 06
[4294750.246000] NOHZ: local_softirq_pending 06
[4294750.247000] NOHZ: local_softirq_pending 06
[4294750.248000] NOHZ: local_softirq_pending 06
[4294750.249000] NOHZ: local_softirq_pending 06
[4294750.250000] NOHZ: local_softirq_pending 06
[4294750.251000] NOHZ: local_softirq_pending 06
[4294750.252000] NOHZ: local_softirq_pending 06
[4294750.253000] NOHZ: local_softirq_pending 06
[4294750.254000] NOHZ: local_softirq_pending 06
[4294750.296000] SCHED Max time reached 51368
[4294750.296000] SCHED Hi time reached 51368

Do you know what could be causing this issue. I have managed to repeat this traces (NOHZ...) without using my code, using a workqueue and in the work just by doing something like:

work_func() {
   mdelay(10);
   msleep(10);

  queue_work(myqueue, mywork);
}

And then by heavy loading the box from the outside.
Does it work when you disable CONFIG_NOHZ ?
I will try this and let the list know.
Thanks,

	tglx

Thanks a really lot.

BR,

   Iratxo.


--
Iratxo Pichel Ortiz
Software Development Manager

Albentia Systems S.A.
http://www.albentia.com

Tel: +34 914400567
Cel: +34 663808405
Fax: +34 914400569

C\Margarita Salas 22
Parque Tecnológico de Leganés
Leganés (28918)
Madrid
Spain

--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [RT Stable]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]

  Powered by Linux