Summon Hans and Sakari for the ideas and for heads up (MIPI case can be affected as well). On Fri, Mar 15, 2024 at 12:39:03AM +0000, Hamish Martin wrote: > On Thu, 2024-03-14 at 15:18 +0200, andriy.shevchenko@xxxxxxxxxxxxxxx > wrote: > > On Thu, Mar 14, 2024 at 01:13:31AM +0000, Hamish Martin wrote: > > > On Wed, 2024-03-13 at 13:26 +0200, Andy Shevchenko wrote: ... > > > Removing the setting of the handle to invalid may be the right fix > > > but > > > I don't feel I know the code well enough to make a decision on > > > that. It > > > certainly doesn't resolve things immediately - I saw ref counting > > > warnings output. > > > > Not removing, but moving to the better place? > > Can you share warnings, though? > > > For context here is the current call chain that leads to > acpi_gpiochip_remove(): > > acpi_gpiochip_remove+0x32/0x1a0 > gpiochip_remove+0x39/0x140 > devres_release_group+0xe6/0x160 > i2c_device_remove+0x2d/0x80 > device_release_driver_internal+0x19a/0x200 > bus_remove_device+0xbf/0x100 > device_del+0x157/0x490 > ? __pfx_device_match_fwnode+0x10/0x10 > ? srso_return_thunk+0x5/0x5f > device_unregister+0xd/0x30 > i2c_acpi_notify+0x10e/0x160 > notifier_call_chain+0x58/0xd0 > blocking_notifier_call_chain+0x3a/0x60 > acpi_device_del_work_fn+0x85/0x1d0 > process_one_work+0x134/0x2f0 > worker_thread+0x2f0/0x410 > ? __pfx_worker_thread+0x10/0x10 > kthread+0xe3/0x110 > ? __pfx_kthread+0x10/0x10 > ret_from_fork+0x2f/0x50 > ? __pfx_kthread+0x10/0x10 > ret_from_fork_asm+0x1b/0x30 Right, and with invalid handle it can't process further. And note, that _pointer_ is valid, the content become unavailable. > I removed the setting of adev->handle = INVALID_ACPI_HANDLE from > acpi_scan_drop_device() and shifted it to just after the call to > blocking_notifier_call_chain() in acpi_device_del_work_fn(). > With that it seems things progress further with the call to > acpi_get_data() in acpi_gpiochip_remove() succeeding now. However, > later in acpi_gpiochip_free_regions() we hit this error: > > pca953x i2c-PRP0001:03: Failed to remove GPIO OpRegion handler > > We also get these errors: > ACPI Warning: Obj 00000000ba6a9600, Reference Count is already zero, > cannot decrement > (20230628/utdelete-422) Right, because of ACPICA (not even ACPI glue layer!) the callbacks are called when namespace node (which is acpi_handle) is being removed. I spend a few hours to understand the history of the invalidation of the handle. TBH, it looks like a hack to me, but its presence seems necessary to avoid racing with the hotplug work. It's a lot of functions that run asynchronously there and the validness of some objects is questionable. Here are the commits in question: d783156ea384 ("ACPI / scan: Define non-empty device removal handler") c27b2c33b621 ("ACPI / hotplug: Introduce common hotplug function acpi_device_hotplug()") (It doesn't mean they are bad, it means that this requires more investigation.) That said, from my perspective what should be done/clarified - the scope of ACPI data (Can we use it outside of ACPICA/ACPI glue layer or not?) - clarify in the documentation the scope / life time of the ACPI objects from the ACPICA perspective (Is it already done? Where?) - remove that invalidation hack and find better solution for the race avoidance > > P.S. > > I'm not an expert in ACPICA and low lever of ACPI glue layer in the Linux > > kernel, perhaps Rafael can suggest something better. > > > OK, thanks for your help. Hopefully Rafael can add something to the > discssion. Added more people to harvest the ideas. -- With Best Regards, Andy Shevchenko