Re: Oopses and invalid addresses under Hatari

Eero Tamminen <oak@xxxxxxxxxxxxxx> · Fri, 12 Apr 2019 02:03:33 +0300

Hi,

On 4/11/19 5:10 AM, Michael Schmitz wrote:
[...]
OK, I decided to bite the bullet and modify bus_error030() to allow 
falling through to do_page_fault if an invalid page read happens while 
page faults are disabled.
[...]
Resulting syslog:
[...]

Summary:

* The stack is always shown, but the call trace following it is always
  empty. Is the call trace explicitly disabled for the m68k task list?

* The following threads didn't fault:
----------------------------------------------------------------
[31197.540000]   task                PC stack   pid father
[31197.550000] init            S    0     1      0 0x00000000
[31197.620000] kthreadd        S    0     2      0 0x00000000
[31198.020000] ksoftirqd/0     R  running task        0 7      2
[31198.080000] kdevtmpfs       S    0     8      2 0x00000000
[31198.280000] oom_reaper      S    0    12      2 0x00000000
[31198.760000] kswapd0         S    0   200      2 0x00000000
[31198.950000] jbd2/hda3-8     S    0   794      2 0x00000000
[31199.120000] portmap         S    0   982      1 0x00000000
[31199.180000] syslogd         S    0  1070      1 0x00000000
[31199.220000] klogd           R  running task        0 1076      1
[31199.280000] gpm             S    0  1086      1 0x00000000
[31199.350000] inetd           S    0  1091      1 0x00000000
[31199.410000] lpd             S    0  1095      1 0x00000000
[31199.470000] sshd            S    0  1101      1 0x00000000
[31199.520000] rpc.statd       S    0  1106      1 0x00000000
[31199.560000] atd             S    0  1111      1 0x00000000
[31199.610000] cron            S    0  1114      1 0x00000000
[31199.670000] getty           S    0  1120      1 0x00000000
[31199.720000] getty           S    0  1121      1 0x00000000
[31199.790000] getty           S    0  1122      1 0x00000000
[31199.850000] getty           S    0  1123      1 0x00000000
[31199.920000] getty           S    0  1124      1 0x00000000
[31199.980000] getty           S    0  1125      1 0x00000000
[31200.160000] sshd            S    0  1304   1101 0x00000000
[31200.220000] sshd            S    0  1306   1304 0x00000000
[31200.270000] bash            S    0  1307   1306 0x00000000
[31200.330000] bash            R  running task        0  1308 1307
----------------------------------------------------------------

* The following threads did fault:
----------------------------------------------------------------
[31197.680000] kworker/0:0     I    0     3      2 0x00000000
[31197.750000] Workqueue:    (null) (events)
[31197.800000] kworker/0:0H    I    0     4      2 0x00000000
[31197.930000] mm_percpu_wq    I    0     6      2 0x00000000
[31198.160000] kworker/u2:1    I    0     9      2 0x00000000
[31198.230000] Workqueue:    (null) (events_unbound)
[31198.330000] kworker/0:1     I    0    13      2 0x00000000
[31198.450000] writeback       I    0    95      2 0x00000000
[31198.510000] Workqueue:    (null) (flush-3:0)
[31198.570000] crypto          I    0    97      2 0x00000000
[31198.660000] kblockd         I    0    99      2 0x00000000
[31198.720000] Workqueue:    (null) (kblockd)
[31198.820000] kworker/0:1H    I    0   761      2 0x00000000
[31198.890000] Workqueue:    (null) (kblockd)
[31199.010000] ext4-rsv-conver I    0   795      2 0x00000000
[31200.050000] kworker/u2:0    I    0  1272      2 0x00000000
[31200.390000] kworker/u2:2    I    0  1310      2 0x00000000
[31200.460000] Workqueue:    (null) (events_unbound)
----------------------------------------------------------------

=> *All* of them are kernel threads (kthreadd children) in the 'I'
   state ('I' = idle, i.e. TASK_IDLE, not interrupt context)
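For reference, the state letter in this dump comes from a fixed table in the scheduler headers. A minimal userspace sketch, modelled on the v5.0-era task_index_to_char() and TASK_REPORT_IDLE (0x80) in include/linux/sched.h as I read them (state_letter() and fls_u32() are hypothetical helpers, not kernel functions), shows that 'I' is the idle state:

```c
#include <assert.h>

/*
 * Modelled on v5.0-era include/linux/sched.h (sketch, not kernel code):
 * the reported state is a single bit, fls() of that bit indexes
 * "RSDTtXZPI", and TASK_IDLE kthreads are reported as
 * TASK_REPORT_IDLE (0x80), which maps to 'I'.
 */
static const char state_chars[] = "RSDTtXZPI";

/* 1-based index of the most significant set bit, 0 for x == 0 (like fls()). */
static unsigned int fls_u32(unsigned int x)
{
    unsigned int r = 0;

    while (x) {
        r++;
        x >>= 1;
    }
    return r;
}

/* Hypothetical helper: reported state bit (0 = running) -> letter, '?' if unknown. */
static char state_letter(unsigned int report_bit)
{
    unsigned int idx = fls_u32(report_bit);

    if (idx >= sizeof(state_chars) - 1)
        return '?';
    return state_chars[idx];
}
```

With this table, state_letter(0x80) gives 'I', while state_letter(0x01) gives the 'S' seen for the ordinary sleeping tasks in the first list.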


* *There are always two faults*

Looking at the kthread_probe_data() code:
----------------------------------------------------------------
void *kthread_probe_data(struct task_struct *task)
{
        struct kthread *kthread = to_kthread(task);
        void *data = NULL;

        probe_kernel_read(&data, &kthread->data, sizeof(data));
        return data;
}
----------------------------------------------------------------

And at the print_worker_info() code:
----------------------------------------------------------------
void print_worker_info(const char *log_lvl, struct task_struct *task)
{
        work_func_t *fn = NULL;
        char name[WQ_NAME_LEN] = { };
        char desc[WORKER_DESC_LEN] = { };
        struct pool_workqueue *pwq = NULL;
        struct workqueue_struct *wq = NULL;
        struct worker *worker;
...
        worker = kthread_probe_data(task);
...
        probe_kernel_read(&fn, &worker->current_func, sizeof(fn));
        probe_kernel_read(&pwq, &worker->current_pwq, sizeof(pwq));
        probe_kernel_read(&wq, &pwq->wq, sizeof(wq));
        probe_kernel_read(name, wq->name, sizeof(name) - 1);
        probe_kernel_read(desc, worker->desc, sizeof(desc) - 1);
...
        if (fn || name[0] || desc[0]) {
                printk("%sWorkqueue: %s %pf", log_lvl, name, fn);
                if (strcmp(name, desc))
                        pr_cont(" (%s)", desc);
                pr_cont("\n");
        }
}
----------------------------------------------------------------

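From the task output, we know that the faulting items with a Workqueue
line have an empty "name" and a NULL "current_func" (the "(null)" is what
%pf prints for fn), but a non-empty "desc" (printed in parentheses because
it differs from "name").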
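To check which fields are empty, the Workqueue lines can be rebuilt from the printk() format string in print_worker_info(). This is a minimal userspace sketch; format_worker_line() and its fn_sym parameter are hypothetical, and it assumes that this kernel's %pf prints a NULL pointer as "(null)" right-aligned to pointer width (8 characters on 32-bit m68k), which matches the spacing in the log:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace re-creation of the "Workqueue:" line built by
 * print_worker_info() (log level prefix omitted). fn_sym stands in for
 * what %pf prints: a symbol name, or NULL when current_func is NULL.
 * ASSUMPTION: %pf prints a NULL pointer as "(null)" padded to 8 chars.
 * Returns 1 if a line would be printed, 0 otherwise.
 */
static int format_worker_line(char *buf, size_t len, const char *name,
                              const char *fn_sym, const char *desc)
{
    int n;

    buf[0] = '\0';
    /* Same condition as the kernel: if (fn || name[0] || desc[0]) */
    if (!(fn_sym || name[0] || desc[0]))
        return 0;

    if (fn_sym)
        n = snprintf(buf, len, "Workqueue: %s %s", name, fn_sym);
    else
        n = snprintf(buf, len, "Workqueue: %s %8s", name, "(null)");
    if (n < 0 || (size_t)n >= len)
        return 1; /* output truncated; good enough for a sketch */

    /* desc is appended only when it differs from the workqueue name */
    if (strcmp(name, desc) != 0)
        snprintf(buf + n, len - (size_t)n, " (%s)", desc);
    return 1;
}
```

With name = "", fn = NULL and desc = "events" this produces "Workqueue:    (null) (events)", the logged text including the four spaces after the colon. A non-NULL fn would have printed something other than "(null)", and an empty desc would have suppressed the " (events)" suffix.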

Everything is initialized to NULL (or to empty strings), so if a fetch
fails, the value stays NULL.

From the backtraces, we know that at least one of the two faults comes
from probe_kernel_read().

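If a variable were NULL:
* task/kthread -> lots of faults
* worker -> 5 faults, all inside probe_kernel_read() (current_func,
  current_pwq, pwq->wq, wq->name, desc)
* current_func / fn -> no issues
* current_pwq / pwq -> 2 faults, both inside probe_kernel_read()
  (&pwq->wq, then wq->name with wq still NULL)
* wq -> 1 fault, inside probe_kernel_read() (wq->name)
* name/desc -> can't be NULL; they are arrays, so wq->name and
  worker->desc only compute an address and the access happens inside
  the probe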
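The fault counts can be checked with a small userspace model of the read sequence in print_worker_info(). The structs, probe_read() and count_probe_faults() below are hypothetical simplifications, not kernel code; the model assumes that any source address within the first 4 KiB (NULL plus a small member offset) faults, and that a failed probe leaves its NULL-initialized destination untouched:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the kernel structs (hypothetical layouts). */
#define WQ_NAME_LEN 24
#define WORKER_DESC_LEN 24
#define NULL_GUARD 4096u /* ASSUMPTION: reads below this address fault */

struct workqueue_struct { int flags; char name[WQ_NAME_LEN]; };
struct pool_workqueue { int refcnt; struct workqueue_struct *wq; };
struct worker {
    void (*current_func)(void);
    struct pool_workqueue *current_pwq;
    char desc[WORKER_DESC_LEN];
};

static int faults; /* number of probe reads that "faulted" */

/* Hypothetical probe: copy len bytes, or count a fault and leave dst untouched. */
static int probe_read(void *dst, uintptr_t src, size_t len)
{
    if (src < NULL_GUARD) {
        faults++;
        return -1;
    }
    memcpy(dst, (const void *)src, len);
    return 0;
}

/* Same read order as print_worker_info(); guard_pwq enables the patch's check. */
static int count_probe_faults(const struct worker *worker, int guard_pwq)
{
    void (*fn)(void) = NULL;
    struct pool_workqueue *pwq = NULL;
    struct workqueue_struct *wq = NULL;
    char name[WQ_NAME_LEN] = { 0 };
    char desc[WORKER_DESC_LEN] = { 0 };
    uintptr_t w = (uintptr_t)worker;

    faults = 0;
    probe_read(&fn, w + offsetof(struct worker, current_func), sizeof(fn));
    probe_read(&pwq, w + offsetof(struct worker, current_pwq), sizeof(pwq));
    if (!guard_pwq || pwq) {
        probe_read(&wq, (uintptr_t)pwq + offsetof(struct pool_workqueue, wq),
                   sizeof(wq));
        /* wq->name is an array: only an address is computed, no dereference */
        probe_read(name, (uintptr_t)wq + offsetof(struct workqueue_struct, name),
                   sizeof(name) - 1);
    }
    probe_read(desc, w + offsetof(struct worker, desc), sizeof(desc) - 1);
    return faults;
}
```

In this model a worker with current_pwq == NULL (the idle case) gives 2 faults, both from probes, and 0 once the pwq check from the patch is enabled; a fully populated worker gives 0 and a NULL worker gives 5.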

=> I think the problem is that 'I' kthreads have NULL "current_pwq".

The ones with a Workqueue line just have "desc" set; the others don't.

Why would that fault only on the 030, though?


The attached patch fixes the Oops for me.


	- Eero

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ddee541ea97a..ec4127c0f3da 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4582,8 +4582,11 @@ void print_worker_info(const char *log_lvl, struct task_struct *task)
 	 */
 	probe_kernel_read(&fn, &worker->current_func, sizeof(fn));
 	probe_kernel_read(&pwq, &worker->current_pwq, sizeof(pwq));
-	probe_kernel_read(&wq, &pwq->wq, sizeof(wq));
-	probe_kernel_read(name, wq->name, sizeof(name) - 1);
+	/* current_pwq is NULL for 030 'I' tasks, and this would fault 2x */
+	if (pwq) {
+		probe_kernel_read(&wq, &pwq->wq, sizeof(wq));
+		probe_kernel_read(name, wq->name, sizeof(name) - 1);
+	}
 	probe_kernel_read(desc, worker->desc, sizeof(desc) - 1);
 
 	if (fn || name[0] || desc[0]) {