Re: [PATCH v5 13/17] platform/x86/amd/pmf: Add PMF-AMDGPU get interface

Shyam Sundar S K <Shyam-sundar.S-k@xxxxxxx> · Mon, 20 Nov 2023 12:12:41 +0530

Hi Hans,

On 11/17/2023 5:01 PM, Hans de Goede wrote:
> Hi,
> 
> On 11/17/23 12:08, Shyam Sundar S K wrote:
>> Adding AMDGPU folks (Alex and Christian) - I had to drop them as the
>> changes from amdgpu were moved to PMF driver.
>>
>> Hi Hans,
>>
>>
>> On 11/17/2023 4:18 PM, Hans de Goede wrote:
>>> Hi Shyam,
>>>
>>> On 11/17/23 09:07, Shyam Sundar S K wrote:
>>>> In order to provide GPU inputs to TA for the Smart PC solution to work, we
>>>> need to have interface between the PMF driver and the AMDGPU driver.
>>>>
>>>> Add the initial code path for get interface from AMDGPU.
>>>>
>>>> Co-developed-by: Mario Limonciello <mario.limonciello@xxxxxxx>
>>>> Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
>>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@xxxxxxx>
> 
> <snip>
> 
>>>> diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/platform/x86/amd/pmf/tee-if.c
>>>> index 81b1bd356e83..82ee2b1c627f 100644
>>>> --- a/drivers/platform/x86/amd/pmf/tee-if.c
>>>> +++ b/drivers/platform/x86/amd/pmf/tee-if.c
>>>> @@ -10,6 +10,7 @@
>>>>  
>>>>  #include <linux/debugfs.h>
>>>>  #include <linux/tee_drv.h>
>>>> +#include <linux/thermal.h>
>>>>  #include <linux/uuid.h>
>>>>  #include "pmf.h"
>>>>  
>>>> @@ -422,6 +423,60 @@ static void amd_pmf_tee_deinit(struct amd_pmf_dev *dev)
>>>>  	tee_client_close_context(dev->tee_ctx);
>>>>  }
>>>>  
>>>> +static int amd_pmf_gpu_get_cur_state(struct thermal_cooling_device *cooling_dev,
>>>> +				     unsigned long *state)
>>>> +{
>>>> +	struct backlight_device *bd;
>>>> +
>>>> +	if (acpi_video_get_backlight_type() != acpi_backlight_native)
>>>> +		return -ENODEV;
>>>> +
>>>> +	bd = backlight_device_get_by_type(BACKLIGHT_RAW);
>>>> +	if (!bd)
>>>> +		return -ENODEV;
>>>> +
>>>> +	*state = backlight_get_brightness(bd);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int amd_pmf_gpu_get_max_state(struct thermal_cooling_device *cooling_dev,
>>>> +				     unsigned long *state)
>>>> +{
>>>> +	struct backlight_device *bd;
>>>> +
>>>> +	if (acpi_video_get_backlight_type() != acpi_backlight_native)
>>>> +		return -ENODEV;
>>>> +
>>>> +	bd = backlight_device_get_by_type(BACKLIGHT_RAW);
>>>> +	if (!bd)
>>>> +		return -ENODEV;
>>>> +
>>>> +	if (backlight_is_blank(bd))
>>>> +		*state = 0;
>>>> +	else
>>>> +		*state = bd->props.max_brightness;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static const struct thermal_cooling_device_ops bd_cooling_ops = {
>>>> +	.get_max_state = amd_pmf_gpu_get_max_state,
>>>> +	.get_cur_state = amd_pmf_gpu_get_cur_state,
>>>> +};
>>>
>>> This is still the wrong thing to do. The new PMF code MUST only
>>> be a *consumer* of the thermal_cooling_device API.
>>>
>>> The thermal-cooling device for the backlight itself MUST be
>>> registered by the amdgpu driver.
>>>
>>> I believe that the correct fix for this is to add a new flag to
>>> the backlight_properties struct;
>>> And then make backlight_device_register() register
>>> a thermal_cooling_device for the backlight when this new flag is set.
>>>
>>> This way we get a nice generic way to use backlight class devices
>>> as thermal cooling devices and you can make the amdgpu driver
>>> register the thermal cooling device by simply adding the new flag
>>> to the backlight_properties struct used in the amdgpu driver.
>>
>> Agreed. I am also of the same opinion.
> 
> So the change to the AMDGPU driver here would just be setting
> this one new flag in the backlight_properties struct.
> 
> Alex, Christian, would that be acceptable to you ?
> 
> 
>>>> +static int amd_pmf_get_gpu_handle(struct pci_dev *pdev, void *data)
>>>> +{
>>>> +	struct amd_pmf_dev *dev = data;
>>>> +
>>>> +	if (pdev->vendor == PCI_VENDOR_ID_ATI && pdev->devfn == 0) {
>>>> +		dev->gfx_data.gpu_dev = pdev;
>>>> +		return 1; /* Stop walking */
>>>> +	}
>>>> +
>>>> +	return 0; /* Continue walking */
>>>> +}
>>>> +
>>>>  int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev)
>>>>  {
>>>>  	int ret;
>>>> @@ -433,10 +488,30 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev)
>>>>  	INIT_DELAYED_WORK(&dev->pb_work, amd_pmf_invoke_cmd);
>>>>  	amd_pmf_set_dram_addr(dev);
>>>>  	amd_pmf_get_bios_buffer(dev);
>>>> +
>>>>  	dev->prev_data = kzalloc(sizeof(*dev->prev_data), GFP_KERNEL);
>>>>  	if (!dev->prev_data)
>>>>  		return -ENOMEM;
>>>>  
>>>> +	pci_walk_bus(dev->root->bus, amd_pmf_get_gpu_handle, dev);
>>>> +	if (dev->gfx_data.gpu_dev) {
>>>> +		dev->drm_dev = pci_get_drvdata(dev->gfx_data.gpu_dev);
>>>> +		if (!dev->drm_dev)
>>>> +			return -EINVAL;
>>>
>>> You cannot just call pci_get_drvdata() on a device for which you
>>> are not the driver. You have no idea of the lifetime of this device,
>>> the driver may unbind and release the memory pci_get_drvdata()
>>> points to, leaving this code with a dangling pointer which will
>>> crash the kernel the first time you try to use it.
>>>
>>> Also since you are not the owner you MUST not assume any specific
>>> type for this memory, you cannot be sure this actually is of
>>> the type drm_device. Basically you MUST not touch another
>>> driver's drvdata *at all*.
>>>
>>> The proper way to fix this would be to add some API to the DRM
>>> subsystem to query the information you are looking for form
>>> the DRM subsystem.
>>>
>>> Poking directly inside other driver's internals is NOT acceptable,
>>> NACK for this patch.
>>>
>>
>> I am inline with your remarks, but I could find a way where the
>> thermal driver registration, handling the backlight without having the
>> changes in the amdgpu driver very tricky.
> 
> As mentioned above I think there should be generic thermal cooling
> device support added to drivers/video/backlight/backlight.c, then
> the amdgpu code just needs to pass a flag when registering
> the backlight to enable this.
> 
>> Like you said, I am also of the same opinion that PMF driver should
>> call the DRM/GPU subsystem APIs (like it does with other subsystems).
>>
>> But Christian had concerns on adding all of these into the GPU driver.
>> So I had to roll back these into the PMF driver, and hence you see a
>> pci_get_drvdata() call.
> 
> Ok, so I can see how this AMD PMF code is all kinda special
> and how the DRM folks don't want to have to add APIs just for
> that. But IMHO calling pci_get_drvdata() on an unowned
> device is completely unacceptable.
> 
> At a minimum we need life-cycle management for the drm_device
> which the PMF code is using, something like:
> 
> struct drm_device *drm_device_find(const void *data,
>    int (*match)(struct drm_device *dev, const void *data));
> 
> which works similar to class_find_device() and returns
> a reference to the drm_device for which match has returned 0
> (which also stops it from looping over devices).
> 
> Combined with a:
> 
> void drm_device_put(struct drm_device *dev);
> 
> for when the PMF code is done with the device.
> 
> This way the PMF code can safely get a reference to drm_device
> and release it when it is done. Rather then just getting
> some random pointer which may or not actually be a drm_device
> and the lifetime of which is not guaranteed in anyway.
> 
> E.g. if the PMF driver loads before amdgpu then
> pci_get_drvdata() will just return NULL.
> 
> And as mentioned if the amdgpu driver gets unbound after
> the PMF code has called  pci_get_drvdata() the PMF driver
> now has a dangling pointer.
> 
> So IMHO adding: drm_device_find() + drm_device_put()
> to the DRM core are minimum which is necessary here.
> 
> If the PMF code then ends up doing things like
> drm_for_each_connector_iter() on the gotten drm_device
> referemce so be it. But we must make sure we have
> a properly lifecycle managed reference first and
> pci_get_drvdata() does not give us that.
> 

Ok I will work on your feedback.

>> For the sake of simplicity, I can drop patches 13/17 and 14/17 from
>> the series and send them separately later.
> 
> Yes I think that dropping the GPU related patches for
> now would be best.
> 

Thank you. Let me wait for feedback from others before I drop the GPU
patches.

Thanks,
Shyam

> Regards,
> 
> Hans
> 
> 
> 
>