On Wed, Mar 13, 2024 at 01:21:25PM +0200, Andy Shevchenko wrote: > On Wed, Mar 13, 2024 at 04:02:51PM +1300, Hamish Martin wrote: > > A memory leak occurs in a scenario where an ACPI SSDT overlay is removed > > and that is the trigger for the removal of the acpi_gpio_chip. > > This occurs because we use the ACPI_HANDLE of the chip->parent as a > > convenient location to tie the 'struct acpi_gpio_chip' to, using > > acpi_attach_data(). > > This is fine in the usual case of removal of the 'struct acpi_gpio_chip' > > via a call to acpi_gpiochip_remove() because usually the ACPI data is > > still valid. > > But in the case of an SSDT overlay removal, the ACPI data has been > > marked as invalid before the removal code is triggered (see the setting > > of the acpi_device handle to INVALID_ACPI_HANDLE in > > acpi_scan_drop_device()). This means that by the time we execute > > acpi_gpiochip_remove(), the handles are invalid and we are unable to > > retrieve the 'struct acpi_gpio_chip' using acpi_get_data(). Unable to > > get our data, we hit the failure case and see the following warning > > logged: > > Failed to retrieve ACPI GPIO chip > > This means we also fail to kfree() the struct at the end of > > acpi_gpiochip_remove(). > > > > The fix is to no longer tie the acpi_gpio_chip data to the ACPI data, > > but instead hang it directly from the 'struct gpio_chip' with a new > > field. This decouples the lifecycle of the acpi_gpio_chip from the ACPI > > data, and ties it to the gpio_chip. This seems a much better place since > > they share a common lifecycle. > > Maybe in this case it's indeed better. > > > The general concept of attaching data to the ACPI objects may also need > > revisiting in light of the issue this patch addresses. For instance > > i2c_acpi_remove_space_handler() is vulnerable to a similar leak due to > > using acpi_bus_get_private_data(), which just wraps acpi_attach_data(). > > This may need a more widespread change than is addressed in this patch. > > But with this it sounds to me that the root cause is like you said in ACPI > removal device code, i.e. acpi_scan_drop_device. Shouldn't that be fixed first > to be more clever? > Or are you stating that basically acpi_bus_get_private_data() and concept of > the private data is weak to the point that mostly become useless? Another Q is how does the OF case survive similar flow (DT overlay removal)? -- With Best Regards, Andy Shevchenko