Re: [PATCH V6 01/20] platform/x86/intel/vsec: Fix xa_alloc memory leak

Hi,

On 11/30/23 12:02, Ilpo Järvinen wrote:
> On Wed, 29 Nov 2023, David E. Box wrote:
> 
>> Commit 936874b77dd0 ("platform/x86/intel/vsec: Add PCI error recovery
>> support to Intel PMT") added an xarray to track the list of vsec devices to
>> be recovered after a PCI error. But it did not provide cleanup for the list
>> leading to a memory leak that was caught by kmemleak.  Do xa_alloc() before
>> devm_add_action_or_reset() so that the list may be cleaned up with
>> xa_erase() in the release function.
>>
>> Fixes: 936874b77dd0 ("platform/x86/intel/vsec: Add PCI error recovery support to Intel PMT")
>> Signed-off-by: David E. Box <david.e.box@xxxxxxxxxxxxxxx>
>> ---
>>
>> V6 - Move xa_alloc() before ida_alloc() to reduce mutex use during error
>>      recovery.
>>    - Fix return value after ida_alloc() fail
>>    - Add Fixes tag
>>    - Add more detail to changelog
>>
>> V5 - New patch
>>
>>  drivers/platform/x86/intel/vsec.c | 24 ++++++++++++++----------
>>  drivers/platform/x86/intel/vsec.h |  1 +
>>  2 files changed, 15 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/platform/x86/intel/vsec.c b/drivers/platform/x86/intel/vsec.c
>> index c1f9e4471b28..2d568466b4e2 100644
>> --- a/drivers/platform/x86/intel/vsec.c
>> +++ b/drivers/platform/x86/intel/vsec.c
>> @@ -120,6 +120,8 @@ static void intel_vsec_dev_release(struct device *dev)
>>  {
>>  	struct intel_vsec_device *intel_vsec_dev = dev_to_ivdev(dev);
>>  
>> +	xa_erase(&auxdev_array, intel_vsec_dev->id);
>> +
>>  	mutex_lock(&vsec_ida_lock);
>>  	ida_free(intel_vsec_dev->ida, intel_vsec_dev->auxdev.id);
>>  	mutex_unlock(&vsec_ida_lock);
>> @@ -135,19 +137,27 @@ int intel_vsec_add_aux(struct pci_dev *pdev, struct device *parent,
>>  	struct auxiliary_device *auxdev = &intel_vsec_dev->auxdev;
>>  	int ret, id;
>>  
>> -	mutex_lock(&vsec_ida_lock);
>> -	ret = ida_alloc(intel_vsec_dev->ida, GFP_KERNEL);
>> -	mutex_unlock(&vsec_ida_lock);
>> +	ret = xa_alloc(&auxdev_array, &intel_vsec_dev->id, intel_vsec_dev,
>> +		       PMT_XA_LIMIT, GFP_KERNEL);
>>  	if (ret < 0) {
>>  		kfree(intel_vsec_dev->resource);
>>  		kfree(intel_vsec_dev);
>>  		return ret;
>>  	}
>>  
>> +	mutex_lock(&vsec_ida_lock);
>> +	id = ida_alloc(intel_vsec_dev->ida, GFP_KERNEL);
>> +	mutex_unlock(&vsec_ida_lock);
>> +	if (id < 0) {
>> +		kfree(intel_vsec_dev->resource);
>> +		kfree(intel_vsec_dev);
>> +		return id;
> 
> Thanks, this looks much better this way around but it is missing 
> xa_alloc() rollback now that the order is reversed.
> 
> Once that is fixed,
> 
> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>

I have fixed this up, adding the missing:

	xa_erase(&auxdev_array, intel_vsec_dev->id);

to this error-exit path while merging this.
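
For illustration only, the fixed-up ordering can be modelled in plain
userspace C. This is a sketch, not the merged kernel code: every name
ending in _model is a hypothetical stand-in (a small array instead of
the real xarray, a counter instead of the real IDA), and the kfree()
calls from the real error path are left out because the model does not
own the memory.

```c
/* Userspace model of the allocation order in intel_vsec_add_aux().
 * All *_model names are hypothetical stand-ins, not kernel APIs. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOTS 4

static void *xa_slots[SLOTS];	/* stands in for auxdev_array */
static bool ida_fail;		/* set to force the IDA step to fail */
static int ida_next;		/* stands in for the IDA */

struct vsec_dev_model {
	unsigned int id;	/* index in the xarray model */
	int aux_id;		/* id handed out by the IDA model */
};

/* Store entry in the first free slot; 0 on success, -EBUSY if full. */
static int xa_alloc_model(unsigned int *id, void *entry)
{
	for (unsigned int i = 0; i < SLOTS; i++) {
		if (!xa_slots[i]) {
			xa_slots[i] = entry;
			*id = i;
			return 0;
		}
	}
	return -EBUSY;
}

/* Clear a slot so that it no longer references the device. */
static void xa_erase_model(unsigned int id)
{
	assert(id < SLOTS);
	xa_slots[id] = NULL;
}

/* Hand out the next id, or fail with -ENOMEM when ida_fail is set. */
static int ida_alloc_model(void)
{
	return ida_fail ? -ENOMEM : ida_next++;
}

/* Count occupied slots; used to check that nothing is left behind. */
static int xa_used_model(void)
{
	int n = 0;

	for (int i = 0; i < SLOTS; i++)
		n += xa_slots[i] != NULL;
	return n;
}

/* Same order as the patch: xarray entry first, then the IDA id. */
static int add_aux_model(struct vsec_dev_model *dev)
{
	int ret, id;

	ret = xa_alloc_model(&dev->id, dev);
	if (ret < 0)
		return ret;

	id = ida_alloc_model();
	if (id < 0) {
		/* Roll back the xarray entry taken above. */
		xa_erase_model(dev->id);
		return id;
	}

	dev->aux_id = id;
	return 0;
}
```

With ida_fail set, add_aux_model() returns the error and xa_used_model()
stays at 0. Without the xa_erase_model() call the slot would keep
pointing at a device that the caller is about to free.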

While looking into this I did find one other thing which
worries me (different issue, needs a separate fix):

intel_vsec_pci_slot_reset() uses

                devm_release_action(&pdev->dev, intel_vsec_remove_aux,
                                    &intel_vsec_dev->auxdev);

and seems to assume that after this intel_vsec_remove_aux()
has run for the auxdev-s. *But this is not the case*

devm_release_action() only removes the action from the list
of devres resources tied to the parent PCI device.

It does *NOT* call the registered action function,
so intel_vsec_remove_aux() is NOT called here.

And then on re-probing the device as is done in
intel_vsec_pci_slot_reset() a second set of aux
devices with the same parent will be created AFAICT.

So it seems that this also needs an explicit
intel_vsec_remove_aux() call for each auxdev!

###

This makes me wonder if the PCI error handling here,
and specifically the reset code, was ever tested?

###

Note that simply forcing a reprobe using device_reprobe()
will cause all the aux-devices to also get removed through
the action on driver-unbind without ever needing
the auxdev_array at all!

I guess that you want the removal to happen before
the pci_restore_state(pdev) call though, so that
simply relying on the removal on driver unbind
is not an option?

Regards,

Hans





