On Wed, 2024-03-13 at 11:16 +1300, Hamish Martin wrote: > There is an issue with ACPI overlay table removal specifically > related > to I2C multiplexers. > > Consider an ACPI SSDT Overlay that defines a PCA9548 I2C mux on an > existing I2C bus. When this table is loaded we see the creation of a > device for the overall PCA9548 chip and 8 further devices - one > i2c_adapter each for the mux channels. These are all bound to their > ACPI equivalents via an eventual invocation of acpi_bind_one(). > > When we unload the SSDT overlay we run into the problem. The ACPI > devices are deleted as normal via acpi_device_del_work_fn() and the > acpi_device_del_list. > > However, the following warning and stack trace is output as the > deletion does not go smoothly: > ------------[ cut here ]------------ > kernfs: can not remove 'physical_node', no directory > WARNING: CPU: 1 PID: 11 at fs/kernfs/dir.c:1674 > kernfs_remove_by_name_ns+0xb9/0xc0 > Modules linked in: > CPU: 1 PID: 11 Comm: kworker/u128:0 Not tainted 6.8.0-rc6+ #1 > Hardware name: congatec AG conga-B7E3/conga-B7E3, BIOS 5.13 > 05/16/2023 > Workqueue: kacpi_hotplug acpi_device_del_work_fn > RIP: 0010:kernfs_remove_by_name_ns+0xb9/0xc0 > Code: e4 00 48 89 ef e8 07 71 db ff 5b b8 fe ff ff ff 5d 41 5c 41 5d > e9 a7 55 e4 00 0f 0b eb a6 48 c7 c7 f0 38 0d 9d e8 97 0a d5 ff <0f> > 0b eb dc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 > RSP: 0018:ffff9f864008fb28 EFLAGS: 00010286 > RAX: 0000000000000000 RBX: ffff8ef90a8d4940 RCX: 0000000000000000 > RDX: ffff8f000e267d10 RSI: ffff8f000e25c780 RDI: ffff8f000e25c780 > RBP: ffff8ef9186f9870 R08: 0000000000013ffb R09: 00000000ffffbfff > R10: 00000000ffffbfff R11: ffff8f000e0a0000 R12: ffff9f864008fb50 > R13: ffff8ef90c93dd60 R14: ffff8ef9010d0958 R15: ffff8ef9186f98c8 > FS: 0000000000000000(0000) GS:ffff8f000e240000(0000) > knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f48f5253a08 CR3: 00000003cb82e000 CR4: 00000000003506f0 > Call Trace: > <TASK> > ? kernfs_remove_by_name_ns+0xb9/0xc0 > ? __warn+0x7c/0x130 > ? kernfs_remove_by_name_ns+0xb9/0xc0 > ? report_bug+0x171/0x1a0 > ? handle_bug+0x3c/0x70 > ? exc_invalid_op+0x17/0x70 > ? asm_exc_invalid_op+0x1a/0x20 > ? kernfs_remove_by_name_ns+0xb9/0xc0 > ? kernfs_remove_by_name_ns+0xb9/0xc0 > acpi_unbind_one+0x108/0x180 > device_del+0x18b/0x490 > ? srso_return_thunk+0x5/0x5f > ? srso_return_thunk+0x5/0x5f > device_unregister+0xd/0x30 > i2c_del_adapter.part.0+0x1bf/0x250 > i2c_mux_del_adapters+0xa1/0xe0 > i2c_device_remove+0x1e/0x80 > device_release_driver_internal+0x19a/0x200 > bus_remove_device+0xbf/0x100 > device_del+0x157/0x490 > ? __pfx_device_match_fwnode+0x10/0x10 > ? srso_return_thunk+0x5/0x5f > device_unregister+0xd/0x30 > i2c_acpi_notify+0x10f/0x140 > notifier_call_chain+0x58/0xd0 > blocking_notifier_call_chain+0x3a/0x60 > acpi_device_del_work_fn+0x85/0x1d0 > process_one_work+0x134/0x2f0 > worker_thread+0x2f0/0x410 > ? __pfx_worker_thread+0x10/0x10 > kthread+0xe3/0x110 > ? __pfx_kthread+0x10/0x10 > ret_from_fork+0x2f/0x50 > ? __pfx_kthread+0x10/0x10 > ret_from_fork_asm+0x1b/0x30 > </TASK> > ---[ end trace 0000000000000000 ]--- > ... > repeated 7 more times, 1 for each channel of the mux > ... > > The issue is that the binding of the ACPI devices to their peer I2C > adapters is not correctly cleaned up. Digging deeper into the issue > we > see that the deletion order is such that the ACPI devices matching > the > mux channel i2c adapters are deleted first during the SSDT overlay > removal. For each of the channels we see a call to i2c_acpi_notify() > with ACPI_RECONFIG_DEVICE_REMOVE but, because these devices are not > actually i2c_clients, nothing is done for them. > > Later on, after each of the mux channels has been dealt with, we come > to delete the i2c_client representing the PCA9548 device. This is the > call stack we see above, whereby the kernel cleans up the i2c_client > including destruction of the mux and its channel adapters. At this > point we do attempt to unbind from the ACPI peers but those peers no > longer exist and so we hit the kernfs errors. > > The fix is to augment i2c_acpi_notify() to handle i2c_adapters. But, > given that the life cycle of the adapters is linked to the > i2c_client, > instead of deleting the i2c_adapters during the i2c_acpi_notify(), we > just trigger unbinding of the ACPI device from the adapter device, > and > allow the clean up of the adapter to continue in the way it always > has. > > Signed-off-by: Hamish Martin <hamish.martin@xxxxxxxxxxxxxxxxxxx> > Reviewed-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx> > Reviewed-by: Andi Shyti <andi.shyti@xxxxxxxxxx> > Fixes: 525e6fabeae2 ("i2c / ACPI: add support for ACPI reconfigure > notifications") > Cc: <stable@xxxxxxxxxxxxxxx> # v4.8+ > --- > > Notes: > v4: > Resolve Build failure noted by: > Linux Kernel Functional Testing <lkft@xxxxxxxxxx>, and > kernel test robot <lkp@xxxxxxxxx> > These failures led to revert of the v3 version of this patch > that had been accepted earlier. > v3: > Add reviewed by tags (Mika Westerberg and Andi Shyti) and Fixes > tag. > v2: > Moved long problem description from cover letter to commit > description at Mika's suggestion > > drivers/i2c/i2c-core-acpi.c | 19 +++++++++++++++---- > 1 file changed, 15 insertions(+), 4 deletions(-) > > diff --git a/drivers/i2c/i2c-core-acpi.c b/drivers/i2c/i2c-core- > acpi.c > index d6037a328669..14ae0cfc325e 100644 > --- a/drivers/i2c/i2c-core-acpi.c > +++ b/drivers/i2c/i2c-core-acpi.c > @@ -445,6 +445,11 @@ static struct i2c_client > *i2c_acpi_find_client_by_adev(struct acpi_device *adev) > return i2c_find_device_by_fwnode(acpi_fwnode_handle(adev)); > } > > +static struct i2c_adapter *i2c_acpi_find_adapter_by_adev(struct > acpi_device *adev) > +{ > + return i2c_find_adapter_by_fwnode(acpi_fwnode_handle(adev)); > +} > + > static int i2c_acpi_notify(struct notifier_block *nb, unsigned long > value, > void *arg) > { > @@ -471,11 +476,17 @@ static int i2c_acpi_notify(struct > notifier_block *nb, unsigned long value, > break; > > client = i2c_acpi_find_client_by_adev(adev); > - if (!client) > - break; > + if (client) { > + i2c_unregister_device(client); > + put_device(&client->dev); > + } > + > + adapter = i2c_acpi_find_adapter_by_adev(adev); > + if (adapter) { > + acpi_unbind_one(&adapter->dev); > + put_device(&adapter->dev); > + } > > - i2c_unregister_device(client); > - put_device(&client->dev); > break; > } > I wonder if this patch has slipped through the net due to a process error on my part. The v3 version was accepted by Wolfram into for-current on March 4th, but a build issue was detected by a couple of test robots, leading to it being reverted 3 days later. I submitted this corrected v4 version on March 13th. Perhaps I should have not submitted a v4 and that has screwed something up, or perhaps folks are just busy ;-). Let me know what further work is required to get this one accepted. Thanks, Hamish M.