[Public] + Alex D Alex, just FYI this was something that came to an AMD bug tracker and wanted you to be aware there are W/A going into nvidia-wmi-ec-backlight for some firmware problems with the mux. IIRC that was the original suspicion too on the bug reports. Comments inline as well. > -----Original Message----- > From: Daniel Dadap <ddadap@xxxxxxxxxx> > Sent: Wednesday, March 16, 2022 10:11 > To: Barnabás Pőcze <pobrn@xxxxxxxxxxxxxx> > Cc: platform-driver-x86@xxxxxxxxxxxxxxx; Alexandru Dinu > <alex.dinu07@xxxxxxxxx>; Hans de Goede <hdegoede@xxxxxxxxxx>; > markgross@xxxxxxxxxx > Subject: Re: [PATCH] nvidia-wmi-ec-backlight: Add workarounds for > confused firmware > > Thanks for the feedback. I'll send out a v2 shortly: Alex, can you > please retest when I do to make sure there aren't any regressions? None > of these suggestions affect the core flow of how either of the > workarounds work, so I'm not expecting any that wouldn't also reproduce > on my EC backlight system that doesn't have either of these problems, > but I can send you the updated version off-list first if you prefer. > > Detailed replies below: > > On 3/15/22 9:50 PM, Barnabás Pőcze wrote: > > Hi > > > > > > The platform-driver-x86 maintainers should've probably been > > CCd. You may or may not know, but the `scripts/get_maintainers.pl` > > script can be used to determine the appropriate recipients. > > > Indeed. I've copied the pdx86 maintainers on this message and will for > future correspondence regarding this patch. > > > > 2022. március 16., szerda 2:25 keltezéssel, Daniel Dadap írta: > >> Some notebook systems with EC-driven backlight control appear to have a > >> firmware bug which causes the system to use GPU-driven backlight > control > >> upon a fresh boot, but then switches to EC-driven backlight control > >> after completing a suspend/resume cycle. All the while, the firmware > >> reports that the backlight is under EC control, regardless of what is > >> actually controlling the backlight brightness. > >> > >> This leads to the following behavior: > >> > >> * nvidia-wmi-ec-backlight gets probed on a fresh boot, due to the > >> WMI-wrapped ACPI method erroneously reporting EC control. > >> * nvidia-wmi-ec-backlight does not work until after a suspend/resume > >> cycle, due to the backlight control actually being GPU-driven. > >> * GPU drivers also register their own backlight handlers: in the case > >> of the notebook system where this behavior has been observed, both > >> amdgpu and the NVIDIA proprietary driver register backlight handlers. > >> * The GPU which has backlight control upon a fresh boot (amdgpu in the > >> case observed so far) can successfully control the backlight through > >> its backlight driver's sysfs interface, but stops working after the > >> first suspend/resume cycle. > >> * nvidia-wmi-ec-backlight is unable to control the backlight upon a > >> fresh boot, but begins to work after the first suspend/resume cycle. > >> * The GPU which does not have backlight control (NVIDIA in this case) > >> is not able to control the backlight at any point while the system > >> is in operation. On similar hybrid systems with an EC-controlled > >> backlight, and AMD/NVIDIA iGPU/dGPU, the NVIDIA proprietary driver > >> does not register its backlight handler. It has not been determined > >> whether the non-functional handler registered by the NVIDIA driver > >> is due to another firmware bug, or a bug in the NVIDIA driver. > >> > >> Since nvidia-wmi-ec-backlight registers as a BACKLIGHT_FIRMWARE type > >> device, it takes precedence over the BACKLIGHT_RAW devices registered > >> by the GPU drivers. This in turn leads to backlight control appearing > >> to be non-functional until after completing a suspend/resume cycle. > >> However, it is still possible to control the backlight through direct > >> interaction with the working GPU driver's backlight sysfs interface. > >> > >> These systems also appear to have a second firmware bug which resets > >> the EC's brightness level to 100% on resume, but leaves the state in > >> the kernel at the pre-suspend level. This causes attempts to save > >> and restore the backlight level across the suspend/resume cycle to > >> fail, due to the level appearing not to change even though it did. > >> > >> In order to work around these issue, add quirk tables to detect > >> systems that are known to show these behaviors. So far, there is > >> only one known system that requires these workarounds, and both > >> issues are present on that system, but the quirks are tracked in > >> separate tables to make it easier to add them to other systems which > >> may exhibit one of the bugs, but not the other. The original systems > >> that this driver was tested on during development do not exhibit > >> either of these quirks. > >> > >> If a system with the "GPU driver has backlight control" quirk is > >> detected, nvidia-wmi-ec-backlight will grab a reference to the working > >> (when freshly booted) GPU backlight handler and relays any backlight > >> brightness level change requests directed at the EC to also be applied > >> to the GPU backlight interface. This leads to redundant updates > >> directed at the GPU backlight driver after a suspend/resume cycle, but > >> it does allow the EC backlight control to work when the system is > >> freshly booted. > >> > >> If a system with the "backlight level reset to full on resume" quirk > >> is detected, nvidia-wmi-ec-backlight will register a PM notifier to > >> reset the backlight to the previous level upon resume. > >> > >> These workarounds are also plumbed through to kernel module > parameters, > >> to make it easier for users who suspect they may be affected by one or > >> both of these bugs to test whether these workarounds are effective on > >> their systems as well. > >> > >> Signed-off-by: Daniel Dadap <ddadap@xxxxxxxxxx> > >> Tested-by: Alexandru Dinu <alex.dinu07@xxxxxxxxx> > >> --- > >> .../platform/x86/nvidia-wmi-ec-backlight.c | 181 > +++++++++++++++++- > >> 1 file changed, 179 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/platform/x86/nvidia-wmi-ec-backlight.c > b/drivers/platform/x86/nvidia-wmi-ec-backlight.c > >> index 61e37194df70..ccb3b506c12c 100644 > >> --- a/drivers/platform/x86/nvidia-wmi-ec-backlight.c > >> +++ b/drivers/platform/x86/nvidia-wmi-ec-backlight.c > >> @@ -3,8 +3,11 @@ > >> * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. > >> */ > >> > >> +#define pr_fmt(f) "%s: " f "\n", KBUILD_MODNAME > > `KBUILD_MODNAME` is a string literal, so you can do e.g. > > > > #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > > > > > >> + > >> #include <linux/acpi.h> > >> #include <linux/backlight.h> > >> +#include <linux/dmi.h> > >> #include <linux/mod_devicetable.h> > >> #include <linux/module.h> > >> #include <linux/types.h> > >> @@ -75,6 +78,69 @@ struct wmi_brightness_args { > >> u32 ignored[3]; > >> }; > >> > >> +/** > >> + * struct nvidia_wmi_ec_backlight_priv - driver private data > >> + * @bl_dev: the associated backlight device > >> + * @proxy_target: backlight device which receives relayed brightness > changes > >> + * @notifier: notifier block for resume callback > >> + */ > >> +struct nvidia_wmi_ec_backlight_priv { > >> + struct backlight_device *bl_dev; > >> + struct backlight_device *proxy_target; > >> + struct notifier_block nb; > >> +}; > >> + > >> +static char *backlight_proxy_target; > >> +module_param(backlight_proxy_target, charp, 0); > > It seems these module parameters are neither readable nor writable, > > is that intentional? > > > It was intentional that they not be writable, because I didn't want to > have to plumb everything through to handle changing the values after > probe. However, you are right that it could still be useful to set up > the sysfs entries to allow reading the values, as this could be useful > information for someone who wants to check if either of these quirks are > enabled. > > > > > >> +MODULE_PARM_DESC(backlight_proxy_target, "Relay brightness > change requests to the named backlight driver, on systems which > erroneously report EC backlight control."); > >> + > >> +static int max_reprobe_attempts = 128; > > Can you elaborate how this number was arrived at? > > > > It's just a medium-small round number. I didn't want probe to return > -EPROBE_DEFER forever if e.g. somebody specified a wrong device name or > if the target device name changes and the entry in the quirks table goes > out of date. On the system I tested this on, the amdgpu_bl1 device was > accessible on the 14th probe attempt. If there's some better value to > plug in here, or if it's actually considered more correct to just never > succeed at probe if the workaround is enabled but the target device can > be found, I'd be happy to change it. > > > >> +module_param(max_reprobe_attempts, int, 0); > >> +MODULE_PARM_DESC(max_reprobe_attempts, "Limit of reprobe > attempts when relaying brightness change requests."); > >> + > >> +static bool restore_level_on_resume; > >> +module_param(restore_level_on_resume, bool, 0); > >> +MODULE_PARM_DESC(restore_level_on_resume, "Restore the > backlight level when resuming from suspend, on systems which reset the > EC's backlight level on resume."); > >> + > >> +static int assign_relay_quirk(const struct dmi_system_id *id) > >> +{ > >> + backlight_proxy_target = id->driver_data; > >> + return true; > >> +} > >> + > >> +#define PROXY_QUIRK_ENTRY(vendor, product, quirk_data) { \ > >> + .callback = assign_relay_quirk, \ > >> + .matches = { \ > >> + DMI_MATCH(DMI_SYS_VENDOR, vendor), \ > >> + DMI_MATCH(DMI_PRODUCT_VERSION, product) \ > >> + }, \ > >> + .driver_data = quirk_data \ > >> +} > >> + > >> +static const struct dmi_system_id proxy_quirk_table[] = { > >> + PROXY_QUIRK_ENTRY("LENOVO", "Legion S7 15ACH6", > "amdgpu_bl1"), > >> + { } > >> +}; > >> + > >> +static int assign_restore_quirk(const struct dmi_system_id *id) > >> +{ > >> + restore_level_on_resume = true; > >> + return true; > >> +} > >> + > >> +#define RESTORE_QUIRK_ENTRY(vendor, product) { \ > >> + .callback = assign_restore_quirk, \ > >> + .matches = { \ > >> + DMI_MATCH(DMI_SYS_VENDOR, vendor), \ > >> + DMI_MATCH(DMI_PRODUCT_VERSION, product), \ > >> + } \ > >> +} > >> + > >> +static const struct dmi_system_id restore_quirk_table[] = { > >> + RESTORE_QUIRK_ENTRY("LENOVO", "Legion S7 15ACH6"), > >> + { } > >> +}; > >> + > >> /** > >> * wmi_brightness_notify() - helper function for calling WMI-wrapped > ACPI method > >> * @w: Pointer to the struct wmi_device identified by > %WMI_BRIGHTNESS_GUID > >> @@ -119,9 +185,30 @@ static int wmi_brightness_notify(struct > wmi_device *w, enum wmi_brightness_metho > >> return 0; > >> } > >> > >> +static int scale_backlight_level(struct backlight_device *a, > >> + struct backlight_device *b) > >> +{ > >> + /* because floating point math in the kernel is annoying */ > >> + const int scaling_factor = 65536; > >> + int level = a->props.brightness; > >> + int relative_level = level * scaling_factor / a->props.max_brightness; > >> + > >> + return relative_level * b->props.max_brightness / scaling_factor; > >> +} > > Maybe > > > > fixp_linear_interpolate(0, 0, a->props.max_brightness, b- > >props.max_brightness, a->props.brightness); > > > > ? (from `linux/fixp-arith.h`) > > > Yes, this is exactly what I want; thank you. > > > > > >> + > >> static int nvidia_wmi_ec_backlight_update_status(struct > backlight_device *bd) > >> { > >> struct wmi_device *wdev = bl_get_data(bd); > >> + struct nvidia_wmi_ec_backlight_priv *priv = > dev_get_drvdata(&wdev->dev); > >> + struct backlight_device *proxy_target = priv->proxy_target; > >> + > >> + if (proxy_target) { > >> + int level = scale_backlight_level(bd, proxy_target); > >> + > >> + if (backlight_device_set_brightness(proxy_target, level)) > >> + pr_warn("Failed to relay backlight update to \"%s\"", > >> + backlight_proxy_target); > >> + } > >> > >> return wmi_brightness_notify(wdev, > WMI_BRIGHTNESS_METHOD_LEVEL, > >> WMI_BRIGHTNESS_MODE_SET, > >> @@ -147,13 +234,65 @@ static const struct backlight_ops > nvidia_wmi_ec_backlight_ops = { > >> .get_brightness = nvidia_wmi_ec_backlight_get_brightness, > >> }; > >> > >> +static int nvidia_wmi_ec_backlight_pm_notifier(struct notifier_block > *nb, unsigned long event, void *d) > >> +{ > >> + > >> + /* > >> + * On some systems, the EC backlight level gets reset to 100% when > >> + * resuming from suspend, but the backlight device state still reflects > >> + * the pre-suspend value. Refresh the existing state to sync the EC's > >> + * state back up with the kernel's. > >> + */ > >> + if (event == PM_POST_SUSPEND) { > >> + struct nvidia_wmi_ec_backlight_priv *p; > >> + > >> + p = container_of(nb, struct nvidia_wmi_ec_backlight_priv, > nb); > >> + return backlight_update_status(p->bl_dev); > > `backlight_update_status()` returns a negative errno while the notifier > chain > > expects something else. It would probably be better to return > `NOTIFY_DONE` > > in all cases. Currently a suitable error from `backlight_update_status()` will > > stop the notifier chain. > > > Thanks for catching that: I should have paid more attention to the > notifier callback signature. > > > > > >> + } > >> + > >> + return 0; > >> +} > >> + > >> static int nvidia_wmi_ec_backlight_probe(struct wmi_device *wdev, > const void *ctx) > >> { > >> + struct backlight_device *bdev, *target = NULL; > >> + struct nvidia_wmi_ec_backlight_priv *priv; > >> struct backlight_properties props = {}; > >> - struct backlight_device *bdev; > >> u32 source; > >> int ret; > >> > >> + /* > >> + * Check quirks tables to see if this system needs any of the firmware > >> + * bug workarounds. > >> + */ > >> + > >> + /* User-set quirks from the module parameters take precedence */ > >> + if (!backlight_proxy_target) > >> + dmi_check_system(proxy_quirk_table); > >> + > >> + dmi_check_system(restore_quirk_table); > >> + > >> + if (backlight_proxy_target && backlight_proxy_target[0]) { > >> + static int num_reprobe_attempts; > >> + > >> + target = > backlight_device_get_by_name(backlight_proxy_target); > >> + > >> + if (!target) { > >> + /* > >> + * The target backlight device might not be ready; > >> + * try again and disable backlight proxying if it > >> + * fails too many times. > >> + */ > >> + if (num_reprobe_attempts < > max_reprobe_attempts) { > >> + num_reprobe_attempts++; > >> + return -EPROBE_DEFER; > >> + } > >> + > >> + pr_warn("Unable to acquire %s after %d attempts. > Disabling backlight proxy.", > >> + backlight_proxy_target, > max_reprobe_attempts); > >> + } > >> + } > > I think `target` is not put in case of error. You probably need to add > something like: > > > > if (target) { > > ret = devm_add_action_or_reset(&wdev->dev, put_device_wrapper, > target); > > if (ret < 0) > > return ret; > > } > > > > > >> + > >> ret = wmi_brightness_notify(wdev, > WMI_BRIGHTNESS_METHOD_SOURCE, > >> WMI_BRIGHTNESS_MODE_GET, &source); > >> if (ret) > >> @@ -188,7 +327,44 @@ static int nvidia_wmi_ec_backlight_probe(struct > wmi_device *wdev, const void *ct > >> &wdev->dev, wdev, > >> &nvidia_wmi_ec_backlight_ops, > >> &props); > >> - return PTR_ERR_OR_ZERO(bdev); > >> + > >> + if (IS_ERR(bdev)) > >> + return PTR_ERR(bdev); > >> + > >> + priv = kzalloc(sizeof(*priv), GFP_KERNEL); > > `devm_kzalloc()` would probably be better and you should check if `!priv`. > > > > > >> + priv->bl_dev = bdev; > >> + > >> + dev_set_drvdata(&wdev->dev, priv); > >> + > >> + if (target) { > >> + int level = scale_backlight_level(target, bdev); > >> + > >> + if (backlight_device_set_brightness(bdev, level)) > >> + pr_warn("Unable to import initial brightness level > from %s.", > >> + backlight_proxy_target); > >> + priv->proxy_target = target; > >> + } > >> + > >> + if (restore_level_on_resume) { > >> + priv->nb.notifier_call = > nvidia_wmi_ec_backlight_pm_notifier; > >> + register_pm_notifier(&priv->nb); > >> + } > >> + > >> + return 0; > >> +} > >> + > >> +static void nvidia_wmi_ec_backlight_remove(struct wmi_device *wdev) > >> +{ > >> + struct nvidia_wmi_ec_backlight_priv *priv = > dev_get_drvdata(&wdev->dev); > >> + struct backlight_device *proxy_target = priv->proxy_target; > >> + > >> + if (proxy_target) > >> + put_device(&proxy_target->dev); > > If you switch to `devm_add_action_or_reset()`, this will not be needed. > > > > > >> + > >> + if (priv->nb.notifier_call) > >> + unregister_pm_notifier(&priv->nb); > >> + > >> + kfree(priv); > > If you switch to `devm_kzalloc()`, this won't be needed. > > > Thank you, the devm_*() variants are indeed useful. > > > > > >> } > >> > >> #define WMI_BRIGHTNESS_GUID "603E9613-EF25-4338-A3D0- > C46177516DB7" > >> @@ -204,6 +380,7 @@ static struct wmi_driver > nvidia_wmi_ec_backlight_driver = { > >> .name = "nvidia-wmi-ec-backlight", > >> }, > >> .probe = nvidia_wmi_ec_backlight_probe, > >> + .remove = nvidia_wmi_ec_backlight_remove, > >> .id_table = nvidia_wmi_ec_backlight_id_table, > >> }; > >> module_wmi_driver(nvidia_wmi_ec_backlight_driver); > >> -- > >> 2.27.0 > >> > > Lastly, is it expected that these bugs will be properly fixed? > > > Possibly, but I wouldn't hold out hope for it for an issue at this scale > on an already shipping system. This question I'm assuming was aimed at narrowing the quirk to only match certain FW versions or so. If there is no certainty of when/if it will be fixed I agree with current direction. However I think it's still worth at least noting near the quirk in a comment what firmware version it was identified. If later there is confirmation that a particular firmware version had fixed it the quirk can be adjusted to be dropped. > > > > > > Regards, > > Barnabás Pőcze