On Wed, 2024-03-13 at 13:26 +0200, Andy Shevchenko wrote: > On Wed, Mar 13, 2024 at 01:21:25PM +0200, Andy Shevchenko wrote: > > On Wed, Mar 13, 2024 at 04:02:51PM +1300, Hamish Martin wrote: > > > A memory leak occurs in a scenario where an ACPI SSDT overlay is > > > removed > > > and that is the trigger for the removal of the acpi_gpio_chip. > > > This occurs because we use the ACPI_HANDLE of the chip->parent as > > > a > > > convenient location to tie the 'struct acpi_gpio_chip' to, using > > > acpi_attach_data(). > > > This is fine in the usual case of removal of the 'struct > > > acpi_gpio_chip' > > > via a call to acpi_gpiochip_remove() because usually the ACPI > > > data is > > > still valid. > > > But in the case of an SSDT overlay removal, the ACPI data has > > > been > > > marked as invalid before the removal code is triggered (see the > > > setting > > > of the acpi_device handle to INVALID_ACPI_HANDLE in > > > acpi_scan_drop_device()). This means that by the time we execute > > > acpi_gpiochip_remove(), the handles are invalid and we are unable > > > to > > > retrieve the 'struct acpi_gpio_chip' using acpi_get_data(). > > > Unable to > > > get our data, we hit the failure case and see the following > > > warning > > > logged: > > > Failed to retrieve ACPI GPIO chip > > > This means we also fail to kfree() the struct at the end of > > > acpi_gpiochip_remove(). > > > > > > The fix is to no longer tie the acpi_gpio_chip data to the ACPI > > > data, > > > but instead hang it directly from the 'struct gpio_chip' with a > > > new > > > field. This decouples the lifecycle of the acpi_gpio_chip from > > > the ACPI > > > data, and ties it to the gpio_chip. This seems a much better > > > place since > > > they share a common lifecycle. > > > > Maybe in this case it's indeed better. > > > > > The general concept of attaching data to the ACPI objects may > > > also need > > > revisiting in light of the issue this patch addresses. For > > > instance > > > i2c_acpi_remove_space_handler() is vulnerable to a similar leak > > > due to > > > using acpi_bus_get_private_data(), which just wraps > > > acpi_attach_data(). > > > This may need a more widespread change than is addressed in this > > > patch. > > > > But with this it sounds to me that the root cause is like you said > > in ACPI > > removal device code, i.e. acpi_scan_drop_device. Shouldn't that be > > fixed first > > to be more clever? Potentially, although I think locking becomes an issue. For instance I tried to use the acpi_gpio_chip_dh() function for its correct purpose as it is invoked in acpi_ns_delete_node() which is to allow the attached data to be cleaned up. --------->8----------- /* Invoke the attached data deletion handler if present */ if (obj_desc->data.handler) { obj_desc->data.handler(node, obj_desc- >data.pointer); } --------->8----------- So I tried to do the work done in acpi_gpiochip_remove() at that time but hit locking issues downstream of acpi_gpiochip_free_regions(). I could just kfree the acpi_gpio_chip in acpi_gpio_chip_dh(), and alter acpi_gpiochip_remove() to just silently ignore the failure of acpi_get_data() for my use case, but that seems a little untidy since it would leave two different places where the acpi_gpio_chip is removed. That is safe to do since all the stuff actually removed under acpi_gpiochip_free_regions() gets cleaned up by the deletion of the namespace node - but again, it seems untidy to have two different deletion paths depending on how the removal is triggered. Removing the setting of the handle to invalid may be the right fix but I don't feel I know the code well enough to make a decision on that. It certainly doesn't resolve things immediately - I saw ref counting warnings output. > > Or are you stating that basically acpi_bus_get_private_data() and > > concept of > > the private data is weak to the point that mostly become useless? I think as it is written, any code in driver remove() paths that tries to get any attached data cannot reliable achieve that in SSDT overlay removal scenarios. I think this is mostly failing silently with small leaks. Also, perhaps removing these SSDT overlays is not a common use case. > > Another Q is how does the OF case survive similar flow (DT overlay > removal)? > For DT overlays, in the gpio_chip case there is no data created and attached to the of_node. It simply takes a ref (of_node_get(chip- >of_node) in of_gpiochip_add() and release it on removal (of_node_put(chip->of_node) in of_gpiochip_remove()). In a more general sense, I haven't seen any attaching of data to the of_node objects in any of the drivers I'm familiar with or have seen. In my experience the device tree is used at probe() time, nothing is added to it by the drivers, and it is not used at remove() time. Thanks for your help with this stuff. Any further ideas or requests for info would be appreciated. Thanks, Hamish M