Hi, On 11/17/23 12:08, Shyam Sundar S K wrote: > Adding AMDGPU folks (Alex and Christian) - I had to drop them as the > changes from amdgpu were moved to PMF driver. > > Hi Hans, > > > On 11/17/2023 4:18 PM, Hans de Goede wrote: >> Hi Shyam, >> >> On 11/17/23 09:07, Shyam Sundar S K wrote: >>> In order to provide GPU inputs to TA for the Smart PC solution to work, we >>> need to have interface between the PMF driver and the AMDGPU driver. >>> >>> Add the initial code path for get interface from AMDGPU. >>> >>> Co-developed-by: Mario Limonciello <mario.limonciello@xxxxxxx> >>> Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx> >>> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@xxxxxxx> <snip> >>> diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/platform/x86/amd/pmf/tee-if.c >>> index 81b1bd356e83..82ee2b1c627f 100644 >>> --- a/drivers/platform/x86/amd/pmf/tee-if.c >>> +++ b/drivers/platform/x86/amd/pmf/tee-if.c >>> @@ -10,6 +10,7 @@ >>> >>> #include <linux/debugfs.h> >>> #include <linux/tee_drv.h> >>> +#include <linux/thermal.h> >>> #include <linux/uuid.h> >>> #include "pmf.h" >>> >>> @@ -422,6 +423,60 @@ static void amd_pmf_tee_deinit(struct amd_pmf_dev *dev) >>> tee_client_close_context(dev->tee_ctx); >>> } >>> >>> +static int amd_pmf_gpu_get_cur_state(struct thermal_cooling_device *cooling_dev, >>> + unsigned long *state) >>> +{ >>> + struct backlight_device *bd; >>> + >>> + if (acpi_video_get_backlight_type() != acpi_backlight_native) >>> + return -ENODEV; >>> + >>> + bd = backlight_device_get_by_type(BACKLIGHT_RAW); >>> + if (!bd) >>> + return -ENODEV; >>> + >>> + *state = backlight_get_brightness(bd); >>> + >>> + return 0; >>> +} >>> + >>> +static int amd_pmf_gpu_get_max_state(struct thermal_cooling_device *cooling_dev, >>> + unsigned long *state) >>> +{ >>> + struct backlight_device *bd; >>> + >>> + if (acpi_video_get_backlight_type() != acpi_backlight_native) >>> + return -ENODEV; >>> + >>> + bd = backlight_device_get_by_type(BACKLIGHT_RAW); >>> + if (!bd) >>> + return -ENODEV; >>> + >>> + if (backlight_is_blank(bd)) >>> + *state = 0; >>> + else >>> + *state = bd->props.max_brightness; >>> + >>> + return 0; >>> +} >>> + >>> +static const struct thermal_cooling_device_ops bd_cooling_ops = { >>> + .get_max_state = amd_pmf_gpu_get_max_state, >>> + .get_cur_state = amd_pmf_gpu_get_cur_state, >>> +}; >> >> This is still the wrong thing to do. The new PMF code MUST only >> be a *consumer* of the thermal_cooling_device API. >> >> The thermal-cooling device for the backlight itself MUST be >> registered by the amdgpu driver. >> >> I believe that the correct fix for this is to add a new flag to >> the backlight_properties struct; >> And then make backlight_device_register() register >> a thermal_cooling_device for the backlight when this new flag is set. >> >> This way we get a nice generic way to use backlight class devices >> as thermal cooling devices and you can make the amdgpu driver >> register the thermal cooling device by simply adding the new flag >> to the backlight_properties struct used in the amdgpu driver. > > Agreed. I am also of the same opinion. So the change to the AMDGPU driver here would just be setting this one new flag in the backlight_properties struct. Alex, Christian, would that be acceptable to you ? >>> +static int amd_pmf_get_gpu_handle(struct pci_dev *pdev, void *data) >>> +{ >>> + struct amd_pmf_dev *dev = data; >>> + >>> + if (pdev->vendor == PCI_VENDOR_ID_ATI && pdev->devfn == 0) { >>> + dev->gfx_data.gpu_dev = pdev; >>> + return 1; /* Stop walking */ >>> + } >>> + >>> + return 0; /* Continue walking */ >>> +} >>> + >>> int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev) >>> { >>> int ret; >>> @@ -433,10 +488,30 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev) >>> INIT_DELAYED_WORK(&dev->pb_work, amd_pmf_invoke_cmd); >>> amd_pmf_set_dram_addr(dev); >>> amd_pmf_get_bios_buffer(dev); >>> + >>> dev->prev_data = kzalloc(sizeof(*dev->prev_data), GFP_KERNEL); >>> if (!dev->prev_data) >>> return -ENOMEM; >>> >>> + pci_walk_bus(dev->root->bus, amd_pmf_get_gpu_handle, dev); >>> + if (dev->gfx_data.gpu_dev) { >>> + dev->drm_dev = pci_get_drvdata(dev->gfx_data.gpu_dev); >>> + if (!dev->drm_dev) >>> + return -EINVAL; >> >> You cannot just call pci_get_drvdata() on a device for which you >> are not the driver. You have no idea of the lifetime of this device, >> the driver may unbind and release the memory pci_get_drvdata() >> points to, leaving this code with a dangling pointer which will >> crash the kernel the first time you try to use it. >> >> Also since you are not the owner you MUST not assume any specific >> type for this memory, you cannot be sure this actually is of >> the type drm_device. Basically you MUST not touch another >> driver's drvdata *at all*. >> >> The proper way to fix this would be to add some API to the DRM >> subsystem to query the information you are looking for form >> the DRM subsystem. >> >> Poking directly inside other driver's internals is NOT acceptable, >> NACK for this patch. >> > > I am inline with your remarks, but I could find a way where the > thermal driver registration, handling the backlight without having the > changes in the amdgpu driver very tricky. As mentioned above I think there should be generic thermal cooling device support added to drivers/video/backlight/backlight.c, then the amdgpu code just needs to pass a flag when registering the backlight to enable this. > Like you said, I am also of the same opinion that PMF driver should > call the DRM/GPU subsystem APIs (like it does with other subsystems). > > But Christian had concerns on adding all of these into the GPU driver. > So I had to roll back these into the PMF driver, and hence you see a > pci_get_drvdata() call. Ok, so I can see how this AMD PMF code is all kinda special and how the DRM folks don't want to have to add APIs just for that. But IMHO calling pci_get_drvdata() on an unowned device is completely unacceptable. At a minimum we need life-cycle management for the drm_device which the PMF code is using, something like: struct drm_device *drm_device_find(const void *data, int (*match)(struct drm_device *dev, const void *data)); which works similar to class_find_device() and returns a reference to the drm_device for which match has returned 0 (which also stops it from looping over devices). Combined with a: void drm_device_put(struct drm_device *dev); for when the PMF code is done with the device. This way the PMF code can safely get a reference to drm_device and release it when it is done. Rather then just getting some random pointer which may or not actually be a drm_device and the lifetime of which is not guaranteed in anyway. E.g. if the PMF driver loads before amdgpu then pci_get_drvdata() will just return NULL. And as mentioned if the amdgpu driver gets unbound after the PMF code has called pci_get_drvdata() the PMF driver now has a dangling pointer. So IMHO adding: drm_device_find() + drm_device_put() to the DRM core are minimum which is necessary here. If the PMF code then ends up doing things like drm_for_each_connector_iter() on the gotten drm_device referemce so be it. But we must make sure we have a properly lifecycle managed reference first and pci_get_drvdata() does not give us that. > For the sake of simplicity, I can drop patches 13/17 and 14/17 from > the series and send them separately later. Yes I think that dropping the GPU related patches for now would be best. Regards, Hans