Hi,
On 4/12/19 9:52 AM, Michael Schmitz wrote:
Am 12.04.2019 um 11:03 schrieb Eero Tamminen:
[...]
* Stack is always shown, but call trace following it is always empty.
Is call trace explicitly disabled for m68k task list?
No, must be a 030 thing. The output on 060 does show a call trace (at
least for normal processes).
Ok, that's one more bug.
* Following threads didn't fault:
----------------------------------------------------------------
[31197.540000] task PC stack pid father
[31197.550000] init S 0 1 0 0x00000000
[31197.620000] kthreadd S 0 2 0 0x00000000
[31198.020000] ksoftirqd/0 R running task 0 7 2
[31198.080000] kdevtmpfs S 0 8 2 0x00000000
[31198.280000] oom_reaper S 0 12 2 0x00000000
[31198.760000] kswapd0 S 0 200 2 0x00000000
[31198.950000] jbd2/hda3-8 S 0 794 2 0x00000000
[...]
User space processes.
kthreadd and its 5 children above are kernel threads, not user
space processes. The only difference from the faulting tasks that is
visible in this output is the task state.
----------------------------------------------------------------
* Following threads did fault:
----------------------------------------------------------------
[31197.680000] kworker/0:0 I 0 3 2 0x00000000
[31197.750000] Workqueue: (null) (events)
[...]
[31200.390000] kworker/u2:2 I 0 1310 2 0x00000000
[31200.460000] Workqueue: (null) (events_unbound)
----------------------------------------------------------------
Kernel tasks.
=> *All* of them are kernel threads (kthreadd children) in 'I' state
('I' = interrupt context?)
Unlikely - may be interruptible sleep.
Looking at sched_show_task() -> task_state_to_char() -> sched.h, "I"
means TASK_IDLE, i.e. those kernel threads are uninterruptible
(same as "D"), but do not contribute to the load average.
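To spell out the sched.h lookup, here is a small userspace model of how the state letter gets picked. It is only a sketch: the constant values are as I remember them from include/linux/sched.h in recent kernels and should be checked against the actual tree, and fls_model() / state_to_char() are my own stand-ins, not the kernel functions.

```c
/* Userspace model of the task state -> letter mapping.
 * ASSUMPTION: the values below mirror include/linux/sched.h as I recall
 * it; verify them against your kernel tree before relying on them. */
#define TASK_RUNNING         0x0000
#define TASK_INTERRUPTIBLE   0x0001
#define TASK_UNINTERRUPTIBLE 0x0002
#define __TASK_STOPPED       0x0004
#define __TASK_TRACED        0x0008
#define EXIT_DEAD            0x0010
#define EXIT_ZOMBIE          0x0020
#define TASK_PARKED          0x0040
#define TASK_NOLOAD          0x0400

/* "I": uninterruptible sleep that is not counted in the load average. */
#define TASK_IDLE        (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

#define TASK_REPORT      (TASK_RUNNING | TASK_INTERRUPTIBLE | \
                          TASK_UNINTERRUPTIBLE | __TASK_STOPPED | \
                          __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
                          TASK_PARKED)
#define TASK_REPORT_IDLE (TASK_REPORT + 1)

/* Hypothetical stand-in for fls(): 1-based index of the highest set bit,
 * 0 when no bit is set. */
int fls_model(unsigned int x)
{
    int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Hypothetical stand-in for the task_state_to_char() logic. */
char state_to_char(unsigned int state, unsigned int exit_state)
{
    static const char state_char[] = "RSDTtXZPI";
    unsigned int report = (state | exit_state) & TASK_REPORT;

    /* TASK_IDLE gets its own letter instead of being reported as "D". */
    if (state == TASK_IDLE)
        report = TASK_REPORT_IDLE;

    return state_char[fls_model(report)];
}
```

With these values, state_to_char(TASK_IDLE, 0) gives 'I' while plain TASK_UNINTERRUPTIBLE gives 'D', which matches the kworker lines above.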
[...]
=> I think the problem is that 'I' kthreads have NULL "current_pwq".
Confirmed by the patch you attached so your analysis seems right.
Ones with workqueues just have "current_func" set, others don't.
Why would that fault only on 030?
The 040/060 bus error trap may not force a bus error bypassing
do_page_fault() in the same way the 030 handler does. I haven't yet
looked at the 040/060 handler. Did I mention I really don't do memory
management stuff?
The real question is - why are these fields NULL in the first place?
And are they NULL only on 030?
I'm very interested in this too.
Attached patch fixes the Oops for me.
I guess __probe_kernel_read() was meant to make checking for NULL
pointers obsolete in these functions (where fields may well be NULL
depending on context). I don't think your patch would be accepted, when
a fix in the 030 fault handler does the job just as well.
*If* those fields are NULL on other arches too, going through the fault
handler for nearly half of the tasks is pretty suboptimal. I.e. that one
extra "if" can also be considered an optimization for the common
case.
The task list is a debugging feature, and having it cause page faults
won't help with debugging.
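To illustrate what I mean by the extra "if", here is a userspace sketch. It is not the attached patch and not kernel code: worker_model, pwq_model, probe_read_model() and fault_path_hits are all made-up names, and the probe helper only counts how often a read would have ended up in the fault handler instead of really faulting.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the worker fields discussed above. */
struct pwq_model {
    const char *wq_name;
};

struct worker_model {
    void (*current_func)(void);
    struct pwq_model *current_pwq;
};

/* Counts reads that would have gone through the fault handler. */
int fault_path_hits;

/* Model of a probe_kernel_read()-style helper: a bad source pointer is
 * caught via the (here only simulated) fault path and reported as an
 * error, otherwise the data is copied out. */
int probe_read_model(void *dst, const void *src, size_t len)
{
    if (src == NULL) {
        fault_path_hits++;
        return -1;
    }
    memcpy(dst, src, len);
    return 0;
}

/* Returns the workqueue name, or NULL if the worker has none.
 * With check_null set, idle workers are skipped before any probing. */
const char *worker_wq_name(const struct worker_model *w, int check_null)
{
    struct pwq_model pwq;

    if (check_null && w->current_pwq == NULL)
        return NULL;    /* common case: no fault path taken */

    if (probe_read_model(&pwq, w->current_pwq, sizeof(pwq)) != 0)
        return NULL;

    return pwq.wq_name;
}
```

Both variants return the same result for an idle worker. The difference is only whether fault_path_hits gets incremented on the way, i.e. whether listing the task costs a trip through the fault handler.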
- Eero