On Fri, 12 Jun 2020 13:01:22 +0200 Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote: > + ACPI in case the ACPI folks see something obvious. [...] >> The "acpi_os_execute_deferred" messages were repeated many times in the >> above line, then every 20-30 seconds again for several minutes. Then >> suddenly a call trace appeared which was similar but not identical to >> the one I posted upthread, and each line of the trace was followed by >> the line ", acpi_os_execute_deferred". This went by quite quickly even >> with the printk_delay I added, and I was unable to photograph the start >> of it and couldn't get all of the subsequent output, but the screenshots >> show some of it. After several minutes of this output I again did a >> hard reboot. > > All good. I though that you will have one worker that is blocking but > you have a lot of them. It appears that one is active and stuck and more > are waiting. > > Could you look at acpi in /proc/interrupts 10 secs apart to see if it > increments? > > grep -E 'acpi|smbus' /proc/interrupts I tried this several times, 10 seconds apart and longer, but saw no change, it was always this: steve [ ~ ]$ grep -E 'acpi|smbus' /proc/interrupts 9: 0 5 0 0 0 0 0 0 0 0 0 0 IO-APIC 9-fasteoi acpi 16: 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC 16-fasteoi i801_smbus > You could also do "echo t > /proc/sysrq-trigger" which gives you a lot > of task state information, but at the end you will also see "howing busy > workqueues and worker pools:" regarding the workqueue state. I am > curious to see if you already have worker stuck in kacpid_notify with > acpi_os_execute_deferred. What am I supposed to do after "echo t > /proc/sysrq-trigger"? Both before and after doing that I get an error trying to open the file: root [ ~ ]# cat /proc/sysrq-trigger cat: /proc/sysrq-trigger: Input/output error > Now that we know that know that acpi_os_execute_deferred() is stuck, > lets shed some light in what it is trying to do. The patch at the end > will dump this information into the console buffer (The `dmesg' command > will print the whole kernel buffer). I have no idea if this starts > printing while the system is running or during shutdown. I would expect > to see the Start line in acpi_os_execute_deferred() but not the End one. > > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c > index 41168c027a5a4..0e983c558bcb5 100644 > --- a/drivers/acpi/osl.c > +++ b/drivers/acpi/osl.c > @@ -840,7 +840,9 @@ static void acpi_os_execute_deferred(struct work_struct *work) > { > struct acpi_os_dpc *dpc = container_of(work, struct acpi_os_dpc, work); > > + pr_err("%s(%d) Start %px %pF(%px)\n", __func__, __LINE__, dpc, dpc->function, dpc->context); > dpc->function(dpc->context); > + pr_err("%s(%d) End %px %pF(%px)\n", __func__, __LINE__, dpc, dpc->function, dpc->context); > kfree(dpc); > } > > @@ -1096,6 +1098,8 @@ acpi_status acpi_os_execute(acpi_execute_type type, > */ > if (type == OSL_NOTIFY_HANDLER) { > queue = kacpi_notify_wq; > + pr_err("%s(%d) Adding %pS %px\n", __func__, __LINE__, function, > + context); > INIT_WORK(&dpc->work, acpi_os_execute_deferred); > } else if (type == OSL_GPE_HANDLER) { > queue = kacpid_wq; I applied this patch to 5.6.4 and recompiled, and on the next boot with that kernel, the kernel buffer (and kernel and system logs) began to get flooded with these messages: Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.943987] acpi_os_execute_deferred(843) Start ffff8fb82c7b6500 000000003edf1e05(ffff8fb82c492990) Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944102] acpi_os_execute(1101) Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8fb82c492990 Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944104] acpi_os_execute_deferred(845) End ffff8fb82c7b6500 000000003edf1e05(ffff8fb82c492990) Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944105] acpi_os_execute_deferred(843) Start ffff8fb82b844800 000000002ba560ea(ffff8fb82c492990) Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944124] acpi_os_execute_deferred(845) End ffff8fb82b844800 000000002ba560ea(ffff8fb82c492990) Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944288] acpi_os_execute_deferred(843) Start ffff8fb82c7b6540 000000003edf1e05(ffff8fb82c492990) Jun 14 10:37:13 strobe-jhalfs kernel: [ 5.944387] acpi_os_execute(1101) Adding acpi_ev_asynch_enable_gpe+0x0/0x2f ffff8fb82c492990 and so on without stopping. I could start X and avoid seeing the messages, but was afraid the logs would fill up the root partition if I let it keep going, so I rebooted with another kernel. Was this message flood because I booted with "ignore_loglevel initcall_debug"? In the logs there are also lots of messages like this: Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237628] initcall i915_init+0x0/0x66 returned 0 after 288657 usecs Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237644] calling cn_proc_init+0x0/0x35 @ 1 Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237646] initcall cn_proc_init+0x0/0x35 returned 0 after 0 usecs Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237648] calling _nvm_misc_init+0x0/0xc @ 1 Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237687] initcall _nvm_misc_init+0x0/0xc returned 0 after 36 usecs Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237690] calling topology_sysfs_init+0x0/0x30 @ 1 Jun 14 10:37:13 strobe-jhalfs kernel: [ 6.237720] initcall topology_sysfs_init+0x0/0x30 returned 0 after 28 usecs and so on. Steve Berman