Hi Hans, On 11/17/2023 5:01 PM, Hans de Goede wrote: > Hi, > > On 11/17/23 12:08, Shyam Sundar S K wrote: >> Adding AMDGPU folks (Alex and Christian) - I had to drop them as the >> changes from amdgpu were moved to PMF driver. >> >> Hi Hans, >> >> >> On 11/17/2023 4:18 PM, Hans de Goede wrote: >>> Hi Shyam, >>> >>> On 11/17/23 09:07, Shyam Sundar S K wrote: >>>> In order to provide GPU inputs to TA for the Smart PC solution to work, we >>>> need to have interface between the PMF driver and the AMDGPU driver. >>>> >>>> Add the initial code path for get interface from AMDGPU. >>>> >>>> Co-developed-by: Mario Limonciello <mario.limonciello@xxxxxxx> >>>> Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx> >>>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@xxxxxxx> > > <snip> > >>>> diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/platform/x86/amd/pmf/tee-if.c >>>> index 81b1bd356e83..82ee2b1c627f 100644 >>>> --- a/drivers/platform/x86/amd/pmf/tee-if.c >>>> +++ b/drivers/platform/x86/amd/pmf/tee-if.c >>>> @@ -10,6 +10,7 @@ >>>> >>>> #include <linux/debugfs.h> >>>> #include <linux/tee_drv.h> >>>> +#include <linux/thermal.h> >>>> #include <linux/uuid.h> >>>> #include "pmf.h" >>>> >>>> @@ -422,6 +423,60 @@ static void amd_pmf_tee_deinit(struct amd_pmf_dev *dev) >>>> tee_client_close_context(dev->tee_ctx); >>>> } >>>> >>>> +static int amd_pmf_gpu_get_cur_state(struct thermal_cooling_device *cooling_dev, >>>> + unsigned long *state) >>>> +{ >>>> + struct backlight_device *bd; >>>> + >>>> + if (acpi_video_get_backlight_type() != acpi_backlight_native) >>>> + return -ENODEV; >>>> + >>>> + bd = backlight_device_get_by_type(BACKLIGHT_RAW); >>>> + if (!bd) >>>> + return -ENODEV; >>>> + >>>> + *state = backlight_get_brightness(bd); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static int amd_pmf_gpu_get_max_state(struct thermal_cooling_device *cooling_dev, >>>> + unsigned long *state) >>>> +{ >>>> + struct backlight_device *bd; >>>> + >>>> + if (acpi_video_get_backlight_type() != acpi_backlight_native) >>>> + return -ENODEV; >>>> + >>>> + bd = backlight_device_get_by_type(BACKLIGHT_RAW); >>>> + if (!bd) >>>> + return -ENODEV; >>>> + >>>> + if (backlight_is_blank(bd)) >>>> + *state = 0; >>>> + else >>>> + *state = bd->props.max_brightness; >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static const struct thermal_cooling_device_ops bd_cooling_ops = { >>>> + .get_max_state = amd_pmf_gpu_get_max_state, >>>> + .get_cur_state = amd_pmf_gpu_get_cur_state, >>>> +}; >>> >>> This is still the wrong thing to do. The new PMF code MUST only >>> be a *consumer* of the thermal_cooling_device API. >>> >>> The thermal-cooling device for the backlight itself MUST be >>> registered by the amdgpu driver. >>> >>> I believe that the correct fix for this is to add a new flag to >>> the backlight_properties struct; >>> And then make backlight_device_register() register >>> a thermal_cooling_device for the backlight when this new flag is set. >>> >>> This way we get a nice generic way to use backlight class devices >>> as thermal cooling devices and you can make the amdgpu driver >>> register the thermal cooling device by simply adding the new flag >>> to the backlight_properties struct used in the amdgpu driver. >> >> Agreed. I am also of the same opinion. > > So the change to the AMDGPU driver here would just be setting > this one new flag in the backlight_properties struct. > > Alex, Christian, would that be acceptable to you ? > > >>>> +static int amd_pmf_get_gpu_handle(struct pci_dev *pdev, void *data) >>>> +{ >>>> + struct amd_pmf_dev *dev = data; >>>> + >>>> + if (pdev->vendor == PCI_VENDOR_ID_ATI && pdev->devfn == 0) { >>>> + dev->gfx_data.gpu_dev = pdev; >>>> + return 1; /* Stop walking */ >>>> + } >>>> + >>>> + return 0; /* Continue walking */ >>>> +} >>>> + >>>> int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev) >>>> { >>>> int ret; >>>> @@ -433,10 +488,30 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev) >>>> INIT_DELAYED_WORK(&dev->pb_work, amd_pmf_invoke_cmd); >>>> amd_pmf_set_dram_addr(dev); >>>> amd_pmf_get_bios_buffer(dev); >>>> + >>>> dev->prev_data = kzalloc(sizeof(*dev->prev_data), GFP_KERNEL); >>>> if (!dev->prev_data) >>>> return -ENOMEM; >>>> >>>> + pci_walk_bus(dev->root->bus, amd_pmf_get_gpu_handle, dev); >>>> + if (dev->gfx_data.gpu_dev) { >>>> + dev->drm_dev = pci_get_drvdata(dev->gfx_data.gpu_dev); >>>> + if (!dev->drm_dev) >>>> + return -EINVAL; >>> >>> You cannot just call pci_get_drvdata() on a device for which you >>> are not the driver. You have no idea of the lifetime of this device, >>> the driver may unbind and release the memory pci_get_drvdata() >>> points to, leaving this code with a dangling pointer which will >>> crash the kernel the first time you try to use it. >>> >>> Also since you are not the owner you MUST not assume any specific >>> type for this memory, you cannot be sure this actually is of >>> the type drm_device. Basically you MUST not touch another >>> driver's drvdata *at all*. >>> >>> The proper way to fix this would be to add some API to the DRM >>> subsystem to query the information you are looking for form >>> the DRM subsystem. >>> >>> Poking directly inside other driver's internals is NOT acceptable, >>> NACK for this patch. >>> >> >> I am inline with your remarks, but I could find a way where the >> thermal driver registration, handling the backlight without having the >> changes in the amdgpu driver very tricky. > > As mentioned above I think there should be generic thermal cooling > device support added to drivers/video/backlight/backlight.c, then > the amdgpu code just needs to pass a flag when registering > the backlight to enable this. > >> Like you said, I am also of the same opinion that PMF driver should >> call the DRM/GPU subsystem APIs (like it does with other subsystems). >> >> But Christian had concerns on adding all of these into the GPU driver. >> So I had to roll back these into the PMF driver, and hence you see a >> pci_get_drvdata() call. > > Ok, so I can see how this AMD PMF code is all kinda special > and how the DRM folks don't want to have to add APIs just for > that. But IMHO calling pci_get_drvdata() on an unowned > device is completely unacceptable. > > At a minimum we need life-cycle management for the drm_device > which the PMF code is using, something like: > > struct drm_device *drm_device_find(const void *data, > int (*match)(struct drm_device *dev, const void *data)); > > which works similar to class_find_device() and returns > a reference to the drm_device for which match has returned 0 > (which also stops it from looping over devices). > > Combined with a: > > void drm_device_put(struct drm_device *dev); > > for when the PMF code is done with the device. > > This way the PMF code can safely get a reference to drm_device > and release it when it is done. Rather then just getting > some random pointer which may or not actually be a drm_device > and the lifetime of which is not guaranteed in anyway. > > E.g. if the PMF driver loads before amdgpu then > pci_get_drvdata() will just return NULL. > > And as mentioned if the amdgpu driver gets unbound after > the PMF code has called pci_get_drvdata() the PMF driver > now has a dangling pointer. > > So IMHO adding: drm_device_find() + drm_device_put() > to the DRM core are minimum which is necessary here. > > If the PMF code then ends up doing things like > drm_for_each_connector_iter() on the gotten drm_device > referemce so be it. But we must make sure we have > a properly lifecycle managed reference first and > pci_get_drvdata() does not give us that. > Ok I will work on your feedback. >> For the sake of simplicity, I can drop patches 13/17 and 14/17 from >> the series and send them separately later. > > Yes I think that dropping the GPU related patches for > now would be best. > Thank you. Let me wait for feedback from others before I drop the GPU patches. Thanks, Shyam > Regards, > > Hans > > > >