Re: preempt-rt: Real Time hogging task

Raz <raziebe@xxxxxxxxx> · Sun, 5 Feb 2012 14:49:51 +0200

I think it would be nice if load_module in ftrace mode will report that
the driver should be recompiled. same as where it screams about version
or license.
So, i managed to use ftrace, problem is that i cannot get the trace
out of the machine,
neither by scp nor by sz. i cannot cp it since it will flush to drive.
( even though i set
flush to RT 99).
Here is part of the trace  file.

      tEthernInp-1173  [000]   133.002614: sub_preempt_count <-unmask_8259A_irq
      tEthernInp-1173  [000]   133.002614: sub_preempt_count <-handle_level_irq
      tEthernInp-1173  [000]   133.002614: irq_exit <-do_IRQ
      tEthernInp-1173  [000]   133.002615: sub_preempt_count <-irq_exit
      tEthernInp-1173  [000]   133.002615: wakeup_softirqd <-irq_exit
      tEthernInp-1173  [000]   133.002615: wake_up_process <-wakeup_softirqd
      tEthernInp-1173  [000]   133.002616: try_to_wake_up <-wake_up_process
      tEthernInp-1173  [000]   133.002616: add_preempt_count <-try_to_wake_up
      tEthernInp-1173  [000]   133.002616: add_preempt_count <-try_to_wake_up
      tEthernInp-1173  [000]   133.002617: activate_task <-try_to_wake_up
      tEthernInp-1173  [000]   133.002617: enqueue_task <-activate_task
      tEthernInp-1173  [000]   133.002617: update_rq_clock <-enqueue_task
      tEthernInp-1173  [000]   133.002618: enqueue_task_rt <-enqueue_task
      tEthernInp-1173  [000]   133.002618: dequeue_rt_stack <-enqueue_task_rt
      tEthernInp-1173  [000]   133.002618: ttwu_do_wakeup.clone.139 <-try_to_wa

On Sun, Feb 5, 2012 at 12:02 PM, Raz <raziebe@xxxxxxxxx> wrote:
> also, this is x86-32 machine with a single Celeron processor 1.5Ghz.
>
> On Sun, Feb 5, 2012 at 11:57 AM, Raz <raziebe@xxxxxxxxx> wrote:
>> Hey Steven
>>
>> This is the first time I try frace on this kernel. it turned out that
>> when enabling
>> ftrace kernel oops whenever one loads a module - see bellow.
>> I debugged it a little and the null pointer is in   trace_module_add_events
>> in the loop , the call pointer is NULL.
>>
>> for_each_event(call, start, end) {
>>   call is null.
>> }
>>
>> I will continue debugging it a bit more.
>>
>>
>>    4.137324] e100 0000:02:03.0: eth0: NIC Link is Up 100 Mbps Full Duplex
>> [    5.180354] Disabling lock debugging due to kernel taint
>> [    5.180792] BUG: unable to handle kernel NULL pointer dereference at   (null)
>> [    5.180799] IP: [<c1084f7c>] trace_module_notify+0x9c/0x160
>> [    5.180810] *pde = 00000000
>> [    5.180813] Oops: 0000 [#1] PREEMPT
>> [    5.180818] Modules linked in: sercos_driver(O+)
>> [    5.180822]
>> [    5.180826] Pid: 1079, comm: modprobe Tainted: G           O
>> 3.2.0-rc5-rt8 #8 N/A  N/A /MODB
>> [    5.180832] EIP: 0060:[<c1084f7c>] EFLAGS: 00010286 CPU: 0
>> [    5.180836] EIP is at trace_module_notify+0x9c/0x160
>> [    5.180839] EAX: de513f44 EBX: dfeecc60 ECX: de513e0c EDX: de513e74
>> [    5.180842] ESI: 00000000 EDI: de513edc EBP: de60df3c ESP: de60df14
>> [    5.180846]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
>> [    5.180849] Process modprobe (pid: 1079, ti=de60c000 task=de519030
>> task.ti=de60c000)
>> [    5.180852] Stack:
>> [    5.180854]  de513e74 de513f44 de513edc de513f44 de513e0c de513e74
>> 7fbb1b30 c14578b0
>> [    5.180859]  00000000 c14563a0 de60df58 c1055ba5 dfeecc60 00000001
>> c1455d50 00000001
>> [    5.180865]  dfeecc60 de60df78 c1055ee3 fffffffd 00000000 00000000
>> dfeecc60 0000223f
>> [    5.180870] Call Trace:
>> [    5.180878]  [<c1055ba5>] notifier_call_chain+0x45/0x60
>> [    5.180883]  [<c1055ee3>] __blocking_notifier_call_chain+0x43/0x70
>> [    5.180888]  [<c1055f2f>] blocking_notifier_call_chain+0x1f/0x30
>> [    5.180893]  [<c1069682>] sys_init_module+0x62/0x1c0
>> [    5.180900]  [<c10c32c1>] ? sys_close+0x71/0xc0
>> [    5.180906]  [<c13441e1>] syscall_call+0x7/0xb
>> [    5.180909] Code: 00 8d 90 44 01 00 00 83 c0 74 89 55 e4 89 45 ec
>> 8d b4 26 00 00 00 00 8b 45 e4 8b 55 ec 89 7c 24 08 8b 4d e8 89 44 24
>> 04 89 14 24
>>
>>
>> On Thu, Feb 2, 2012 at 4:12 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>>> On Thu, 2012-02-02 at 12:52 +0200, Raz wrote:
>>>> hey
>>>>
>>>> I am trying to understand why a user space MT process behaves in an
>>>> unexpected manner.
>>>> I have real time process, which executes most of its threads in RT priority,
>>>> and from time to time a task ( medium priority ) is executing without
>>>> trying to stop.
>>>>
>>>> There are times, that this task is hogging the entire process and no
>>>> other **higher*** rt priority
>>>> task is getting any cpu time.
>>>>
>>>> Linux is **not** getting hogged. when setting the serial console (
>>>> irq/serial ) and its shell ( /bin/sh )
>>>> to a higher priority linux is responsive.
>>>>
>>>> Observing the task list  through ps command i can see that this task's
>>>> threads when
>>>> to un-interruptible sleep.
>>>>
>>>> Any idea why is it happening ?
>>>
>>> Some ideas, but I need a lot more information.
>>>
>>>> kernel is 3.2.0-rc5-rt8.
>>>
>>> Is this a x86 box?
>>>
>>> Can you run a trace on the task that is going berserk?
>>>
>>> Enable CONFIG_FUNCTION_TRACER in your kernel and do the following:
>>>
>>> # mount -t debugfs nodev /sys/kernel/debug
>>> # cd /sys/kernel/debug/tracing
>>> # echo <pid-of-berserk-task> > set_ftrace_pid
>>> # echo function > current_tracer
>>> # cat trace
>>>
>>> Thanks!
>>>
>>> -- Steve
>>>
>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html