Hi Eero,
Am 12.04.2019 um 11:03 schrieb Eero Tamminen:
Hi,
On 4/11/19 5:10 AM, Michael Schmitz wrote:
[...]
OK, I decided to bite the bullet and modify bus_error030() to allow
falling through to do_page_fault if an invalid page read happens while
page faults are disabled.
[...]
Resulting syslog:
[...]
Summary:
* Stack is always shown, but call trace following it is always empty.
Is call trace explicitly disabled for m68k task list?
No, must be a 030 thing. The output on 060 does show a call trace (at
least for normal processes).
* Following threads didn't fault:
----------------------------------------------------------------
[31197.540000] task PC stack pid father
[31197.550000] init S 0 1 0 0x00000000
[31197.620000] kthreadd S 0 2 0 0x00000000
[31198.020000] ksoftirqd/0 R running task 0 7 2
[31198.080000] kdevtmpfs S 0 8 2 0x00000000
[31198.280000] oom_reaper S 0 12 2 0x00000000
[31198.760000] kswapd0 S 0 200 2 0x00000000
[31198.950000] jbd2/hda3-8 S 0 794 2 0x00000000
[31199.120000] portmap S 0 982 1 0x00000000
[31199.180000] syslogd S 0 1070 1 0x00000000
[31199.220000] klogd R running task 0 1076 1
[31199.280000] gpm S 0 1086 1 0x00000000
[31199.350000] inetd S 0 1091 1 0x00000000
[31199.410000] lpd S 0 1095 1 0x00000000
[31199.470000] sshd S 0 1101 1 0x00000000
[31199.520000] rpc.statd S 0 1106 1 0x00000000
[31199.560000] atd S 0 1111 1 0x00000000
[31199.610000] cron S 0 1114 1 0x00000000
[31199.670000] getty S 0 1120 1 0x00000000
[31199.720000] getty S 0 1121 1 0x00000000
[31199.790000] getty S 0 1122 1 0x00000000
[31199.850000] getty S 0 1123 1 0x00000000
[31199.920000] getty S 0 1124 1 0x00000000
[31199.980000] getty S 0 1125 1 0x00000000
[31200.160000] sshd S 0 1304 1101 0x00000000
[31200.220000] sshd S 0 1306 1304 0x00000000
[31200.270000] bash S 0 1307 1306 0x00000000
[31200.330000] bash R running task 0 1308 1307
User space processes.
----------------------------------------------------------------
* Following threads did fault:
----------------------------------------------------------------
[31197.680000] kworker/0:0 I 0 3 2 0x00000000
[31197.750000] Workqueue: (null) (events)
[31197.800000] kworker/0:0H I 0 4 2 0x00000000
[31197.930000] mm_percpu_wq I 0 6 2 0x00000000
[31198.160000] kworker/u2:1 I 0 9 2 0x00000000
[31198.230000] Workqueue: (null) (events_unbound)
[31198.330000] kworker/0:1 I 0 13 2 0x00000000
[31198.450000] writeback I 0 95 2 0x00000000
[31198.510000] Workqueue: (null) (flush-3:0)
[31198.570000] crypto I 0 97 2 0x00000000
[31198.660000] kblockd I 0 99 2 0x00000000
[31198.720000] Workqueue: (null) (kblockd)
[31198.820000] kworker/0:1H I 0 761 2 0x00000000
[31198.890000] Workqueue: (null) (kblockd)
[31199.010000] ext4-rsv-conver I 0 795 2 0x00000000
[31200.050000] kworker/u2:0 I 0 1272 2 0x00000000
[31200.390000] kworker/u2:2 I 0 1310 2 0x00000000
[31200.460000] Workqueue: (null) (events_unbound)
----------------------------------------------------------------
Kernel tasks.
=> *All* of them are kernel threads (kthreadd children) in 'I' state
('I' = interrupt context?)
Unlikely - may be interruptible sleep.
* *There are always two faults*
Looking at the kthread_probe_data() code:
----------------------------------------------------------------
void *kthread_probe_data(struct task_struct *task)
{
struct kthread *kthread = to_kthread(task);
void *data = NULL;
probe_kernel_read(&data, &kthread->data, sizeof(data));
return data;
}
----------------------------------------------------------------
And void print_worker_info() code:
----------------------------------------------------------------
void print_worker_info(const char *log_lvl, struct task_struct *task)
{
work_func_t *fn = NULL;
char name[WQ_NAME_LEN] = { };
char desc[WORKER_DESC_LEN] = { };
struct pool_workqueue *pwq = NULL;
struct workqueue_struct *wq = NULL;
struct worker *worker;
...
worker = kthread_probe_data(task);
...
probe_kernel_read(&fn, &worker->current_func, sizeof(fn));
probe_kernel_read(&pwq, &worker->current_pwq, sizeof(pwq));
probe_kernel_read(&wq, &pwq->wq, sizeof(wq));
probe_kernel_read(name, wq->name, sizeof(name) - 1);
probe_kernel_read(desc, worker->desc, sizeof(desc) - 1);
...
if (fn || name[0] || desc[0]) {
printk("%sWorkqueue: %s %pf", log_lvl, name, fn);
if (strcmp(name, desc))
pr_cont(" (%s)", desc);
pr_cont("\n");
}
}
----------------------------------------------------------------
From the task output, we know that faulting items with workqueue
have empty "name" & "desc", but non-NULL "current_func".
Everything is initialized to NULLs, so if data fetch fails,
values are NULLs.
From backtraces, we know that at least one of the 2 faults is
from probe_kernel_read().
If variable would be NULL:
* task/kthread -> lots of faults
* worker -> 3 faults, two in probe, and one in above function
* current_func / fn -> no issues
* current_pwq / pwq -> 2 faults, one from probe
* wq -> 1 fault in above function
* name/desc -> can't be NULL
=> I think the problem is that 'I' kthreads have NULL "current_pwq".
Confirmed by the patch you attached so your analysis seems right.
Ones with workqueues just have "current_func" set, others don't.
Why that would affect / fault only on 030?
The 040/060 bus error trap may not force a bus error bypassing
do_page_fault() in the same way the 030 handler does. I haven't yet
looked at the 040/060 handler. Did I mention I really don't do memory
management stuff?
The real question is - why are these fields NULL in the first place? And
are they NULL only on 030?
Attached patch fixes the Oops for me.
I guess __probe_kernel_read() was meant to make checking for NULL
pointers obsolete in these functions (where fields may well be NULL
depending on context). I don't think your patch would be accepted, when
a fix in the 030 fault handler does the job just as well.
Cheers,
Michael Schmitz
- Eero