Hi, On 11/30/23 12:02, Ilpo Järvinen wrote: > On Wed, 29 Nov 2023, David E. Box wrote: > >> Commit 936874b77dd0 ("platform/x86/intel/vsec: Add PCI error recovery >> support to Intel PMT") added an xarray to track the list of vsec devices to >> be recovered after a PCI error. But it did not provide cleanup for the list >> leading to a memory leak that was caught by kmemleak. Do xa_alloc() before >> devm_add_action_or_reset() so that the list may be cleaned up with >> xa_erase() in the release function. >> >> Fixes: 936874b77dd0 ("platform/x86/intel/vsec: Add PCI error recovery support to Intel PMT") >> Signed-off-by: David E. Box <david.e.box@xxxxxxxxxxxxxxx> >> --- >> >> V6 - Move xa_alloc() before ida_alloc() to reduce mutex use during error >> recovery. >> - Fix return value after id_alloc() fail >> - Add Fixes tag >> - Add more detail to changelog >> >> V5 - New patch >> >> drivers/platform/x86/intel/vsec.c | 24 ++++++++++++++---------- >> drivers/platform/x86/intel/vsec.h | 1 + >> 2 files changed, 15 insertions(+), 10 deletions(-) >> >> diff --git a/drivers/platform/x86/intel/vsec.c b/drivers/platform/x86/intel/vsec.c >> index c1f9e4471b28..2d568466b4e2 100644 >> --- a/drivers/platform/x86/intel/vsec.c >> +++ b/drivers/platform/x86/intel/vsec.c >> @@ -120,6 +120,8 @@ static void intel_vsec_dev_release(struct device *dev) >> { >> struct intel_vsec_device *intel_vsec_dev = dev_to_ivdev(dev); >> >> + xa_erase(&auxdev_array, intel_vsec_dev->id); >> + >> mutex_lock(&vsec_ida_lock); >> ida_free(intel_vsec_dev->ida, intel_vsec_dev->auxdev.id); >> mutex_unlock(&vsec_ida_lock); >> @@ -135,19 +137,27 @@ int intel_vsec_add_aux(struct pci_dev *pdev, struct device *parent, >> struct auxiliary_device *auxdev = &intel_vsec_dev->auxdev; >> int ret, id; >> >> - mutex_lock(&vsec_ida_lock); >> - ret = ida_alloc(intel_vsec_dev->ida, GFP_KERNEL); >> - mutex_unlock(&vsec_ida_lock); >> + ret = xa_alloc(&auxdev_array, &intel_vsec_dev->id, intel_vsec_dev, >> + PMT_XA_LIMIT, GFP_KERNEL); >> if (ret < 0) { >> kfree(intel_vsec_dev->resource); >> kfree(intel_vsec_dev); >> return ret; >> } >> >> + mutex_lock(&vsec_ida_lock); >> + id = ida_alloc(intel_vsec_dev->ida, GFP_KERNEL); >> + mutex_unlock(&vsec_ida_lock); >> + if (id < 0) { >> + kfree(intel_vsec_dev->resource); >> + kfree(intel_vsec_dev); >> + return id; > > Thanks, this looks much better this way around but it is missing > xa_alloc() rollback now that the order is reversed. > > Once that is fixed, > > Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> I have fixed this up, adding the missing: xa_erase(&auxdev_array, intel_vsec_dev->id); to this error-exit path while merging this. While looking into this I did find one other thing which worries me (different issue, needs a separate fix): intel_vsec_pci_slot_reset() uses devm_release_action(&pdev->dev, intel_vsec_remove_aux, &intel_vsec_dev->auxdev); and seems to assume that after this intel_vsec_remove_aux() has run for the auxdev-s. *But this is not the case* devm_release_action() only removes the action from the list of devres resources tied to the parent PCI device. It does *NOT* call the registered action function, so intel_vsec_remove_aux() is NOT called here. And then on re-probing the device as is done in intel_vsec_pci_slot_reset() a second set of aux devices with the same parent will be created AFAICT. So it seems that this also needs an explicit intel_vsec_remove_aux() call for each auxdev! ### This makes me wonder if the PCI error handling here and specifically the reset code was ever tested ? ### Note that simply forcing a reprobe using device_reprobe() will cause all the aux-devices to also get removed through the action on driver-unbind without ever needing the auxdev_array at all! I guess that you want the removal to happen before the pci_restore_state(pdev) state though, so that simply relying on the removal on driver unbind is not an option ? Regards, Hans