RE: [PATCH] nvidia-wmi-ec-backlight: Add workarounds for confused firmware

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[Public]

+ Alex D

Alex, just FYI this was something that came to an AMD bug tracker and wanted you to be aware there are W/A going into nvidia-wmi-ec-backlight for some firmware problems with the mux.
IIRC that was the original suspicion too on the bug reports.

Comments inline as well.

> -----Original Message-----
> From: Daniel Dadap <ddadap@xxxxxxxxxx>
> Sent: Wednesday, March 16, 2022 10:11
> To: Barnabás Pőcze <pobrn@xxxxxxxxxxxxxx>
> Cc: platform-driver-x86@xxxxxxxxxxxxxxx; Alexandru Dinu
> <alex.dinu07@xxxxxxxxx>; Hans de Goede <hdegoede@xxxxxxxxxx>;
> markgross@xxxxxxxxxx
> Subject: Re: [PATCH] nvidia-wmi-ec-backlight: Add workarounds for
> confused firmware
> 
> Thanks for the feedback. I'll send out a v2 shortly: Alex, can you
> please retest when I do to make sure there aren't any regressions? None
> of these suggestions affect the core flow of how either of the
> workarounds work, so I'm not expecting any that wouldn't also reproduce
> on my EC backlight system that doesn't have either of these problems,
> but I can send you the updated version off-list first if you prefer.
> 
> Detailed replies below:
> 
> On 3/15/22 9:50 PM, Barnabás Pőcze wrote:
> > Hi
> >
> >
> > The platform-driver-x86 maintainers should've probably been
> > CCd. You may or may not know, but the `scripts/get_maintainers.pl`
> > script can be used to determine the appropriate recipients.
> 
> 
> Indeed. I've copied the pdx86 maintainers on this message and will for
> future correspondence regarding this patch.
> 
> 
> > 2022. március 16., szerda 2:25 keltezéssel, Daniel Dadap írta:
> >> Some notebook systems with EC-driven backlight control appear to have a
> >> firmware bug which causes the system to use GPU-driven backlight
> control
> >> upon a fresh boot, but then switches to EC-driven backlight control
> >> after completing a suspend/resume cycle. All the while, the firmware
> >> reports that the backlight is under EC control, regardless of what is
> >> actually controlling the backlight brightness.
> >>
> >> This leads to the following behavior:
> >>
> >> * nvidia-wmi-ec-backlight gets probed on a fresh boot, due to the
> >>    WMI-wrapped ACPI method erroneously reporting EC control.
> >> * nvidia-wmi-ec-backlight does not work until after a suspend/resume
> >>    cycle, due to the backlight control actually being GPU-driven.
> >> * GPU drivers also register their own backlight handlers: in the case
> >>    of the notebook system where this behavior has been observed, both
> >>    amdgpu and the NVIDIA proprietary driver register backlight handlers.
> >> * The GPU which has backlight control upon a fresh boot (amdgpu in the
> >>    case observed so far) can successfully control the backlight through
> >>    its backlight driver's sysfs interface, but stops working after the
> >>    first suspend/resume cycle.
> >> * nvidia-wmi-ec-backlight is unable to control the backlight upon a
> >>    fresh boot, but begins to work after the first suspend/resume cycle.
> >> * The GPU which does not have backlight control (NVIDIA in this case)
> >>    is not able to control the backlight at any point while the system
> >>    is in operation. On similar hybrid systems with an EC-controlled
> >>    backlight, and AMD/NVIDIA iGPU/dGPU, the NVIDIA proprietary driver
> >>    does not register its backlight handler. It has not been determined
> >>    whether the non-functional handler registered by the NVIDIA driver
> >>    is due to another firmware bug, or a bug in the NVIDIA driver.
> >>
> >> Since nvidia-wmi-ec-backlight registers as a BACKLIGHT_FIRMWARE type
> >> device, it takes precedence over the BACKLIGHT_RAW devices registered
> >> by the GPU drivers. This in turn leads to backlight control appearing
> >> to be non-functional until after completing a suspend/resume cycle.
> >> However, it is still possible to control the backlight through direct
> >> interaction with the working GPU driver's backlight sysfs interface.
> >>
> >> These systems also appear to have a second firmware bug which resets
> >> the EC's brightness level to 100% on resume, but leaves the state in
> >> the kernel at the pre-suspend level. This causes attempts to save
> >> and restore the backlight level across the suspend/resume cycle to
> >> fail, due to the level appearing not to change even though it did.
> >>
> >> In order to work around these issue, add quirk tables to detect
> >> systems that are known to show these behaviors. So far, there is
> >> only one known system that requires these workarounds, and both
> >> issues are present on that system, but the quirks are tracked in
> >> separate tables to make it easier to add them to other systems which
> >> may exhibit one of the bugs, but not the other. The original systems
> >> that this driver was tested on during development do not exhibit
> >> either of these quirks.
> >>
> >> If a system with the "GPU driver has backlight control" quirk is
> >> detected, nvidia-wmi-ec-backlight will grab a reference to the working
> >> (when freshly booted) GPU backlight handler and relays any backlight
> >> brightness level change requests directed at the EC to also be applied
> >> to the GPU backlight interface. This leads to redundant updates
> >> directed at the GPU backlight driver after a suspend/resume cycle, but
> >> it does allow the EC backlight control to work when the system is
> >> freshly booted.
> >>
> >> If a system with the "backlight level reset to full on resume" quirk
> >> is detected, nvidia-wmi-ec-backlight will register a PM notifier to
> >> reset the backlight to the previous level upon resume.
> >>
> >> These workarounds are also plumbed through to kernel module
> parameters,
> >> to make it easier for users who suspect they may be affected by one or
> >> both of these bugs to test whether these workarounds are effective on
> >> their systems as well.
> >>
> >> Signed-off-by: Daniel Dadap <ddadap@xxxxxxxxxx>
> >> Tested-by: Alexandru Dinu <alex.dinu07@xxxxxxxxx>
> >> ---
> >>   .../platform/x86/nvidia-wmi-ec-backlight.c    | 181
> +++++++++++++++++-
> >>   1 file changed, 179 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/platform/x86/nvidia-wmi-ec-backlight.c
> b/drivers/platform/x86/nvidia-wmi-ec-backlight.c
> >> index 61e37194df70..ccb3b506c12c 100644
> >> --- a/drivers/platform/x86/nvidia-wmi-ec-backlight.c
> >> +++ b/drivers/platform/x86/nvidia-wmi-ec-backlight.c
> >> @@ -3,8 +3,11 @@
> >>    * Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
> >>    */
> >>
> >> +#define pr_fmt(f) "%s: " f "\n", KBUILD_MODNAME
> > `KBUILD_MODNAME` is a string literal, so you can do e.g.
> >
> >    #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >
> >
> >> +
> >>   #include <linux/acpi.h>
> >>   #include <linux/backlight.h>
> >> +#include <linux/dmi.h>
> >>   #include <linux/mod_devicetable.h>
> >>   #include <linux/module.h>
> >>   #include <linux/types.h>
> >> @@ -75,6 +78,69 @@ struct wmi_brightness_args {
> >>   	u32 ignored[3];
> >>   };
> >>
> >> +/**
> >> + * struct nvidia_wmi_ec_backlight_priv - driver private data
> >> + * @bl_dev:       the associated backlight device
> >> + * @proxy_target: backlight device which receives relayed brightness
> changes
> >> + * @notifier:     notifier block for resume callback
> >> + */
> >> +struct nvidia_wmi_ec_backlight_priv {
> >> +	struct backlight_device *bl_dev;
> >> +	struct backlight_device *proxy_target;
> >> +	struct notifier_block nb;
> >> +};
> >> +
> >> +static char *backlight_proxy_target;
> >> +module_param(backlight_proxy_target, charp, 0);
> > It seems these module parameters are neither readable nor writable,
> > is that intentional?
> 
> 
> It was intentional that they not be writable, because I didn't want to
> have to plumb everything through to handle changing the values after
> probe. However, you are right that it could still be useful to set up
> the sysfs entries to allow reading the values, as this could be useful
> information for someone who wants to check if either of these quirks are
> enabled.
> 
> 
> >
> >> +MODULE_PARM_DESC(backlight_proxy_target, "Relay brightness
> change requests to the named backlight driver, on systems which
> erroneously report EC backlight control.");
> >> +
> >> +static int max_reprobe_attempts = 128;
> > Can you elaborate how this number was arrived at?
> >
> 
> It's just a medium-small round number. I didn't want probe to return
> -EPROBE_DEFER forever if e.g. somebody specified a wrong device name or
> if the target device name changes and the entry in the quirks table goes
> out of date. On the system I tested this on, the amdgpu_bl1 device was
> accessible on the 14th probe attempt. If there's some better value to
> plug in here, or if it's actually considered more correct to just never
> succeed at probe if the workaround is enabled but the target device can
> be found, I'd be happy to change it.
> 
> 
> >> +module_param(max_reprobe_attempts, int, 0);
> >> +MODULE_PARM_DESC(max_reprobe_attempts, "Limit of reprobe
> attempts when relaying brightness change requests.");
> >> +
> >> +static bool restore_level_on_resume;
> >> +module_param(restore_level_on_resume, bool, 0);
> >> +MODULE_PARM_DESC(restore_level_on_resume, "Restore the
> backlight level when resuming from suspend, on systems which reset the
> EC's backlight level on resume.");
> >> +
> >> +static int assign_relay_quirk(const struct dmi_system_id *id)
> >> +{
> >> +	backlight_proxy_target = id->driver_data;
> >> +	return true;
> >> +}
> >> +
> >> +#define PROXY_QUIRK_ENTRY(vendor, product, quirk_data) { \
> >> +	.callback = assign_relay_quirk,                  \
> >> +	.matches = {                                     \
> >> +		DMI_MATCH(DMI_SYS_VENDOR, vendor),       \
> >> +		DMI_MATCH(DMI_PRODUCT_VERSION, product)  \
> >> +	},                                               \
> >> +	.driver_data = quirk_data                        \
> >> +}
> >> +
> >> +static const struct dmi_system_id proxy_quirk_table[] = {
> >> +	PROXY_QUIRK_ENTRY("LENOVO", "Legion S7 15ACH6",
> "amdgpu_bl1"),
> >> +	{ }
> >> +};
> >> +
> >> +static int assign_restore_quirk(const struct dmi_system_id *id)
> >> +{
> >> +	restore_level_on_resume = true;
> >> +	return true;
> >> +}
> >> +
> >> +#define RESTORE_QUIRK_ENTRY(vendor, product) {           \
> >> +	.callback = assign_restore_quirk,                \
> >> +	.matches = {                                     \
> >> +		DMI_MATCH(DMI_SYS_VENDOR, vendor),       \
> >> +		DMI_MATCH(DMI_PRODUCT_VERSION, product), \
> >> +	}                                                \
> >> +}
> >> +
> >> +static const struct dmi_system_id restore_quirk_table[] = {
> >> +	RESTORE_QUIRK_ENTRY("LENOVO", "Legion S7 15ACH6"),
> >> +	{ }
> >> +};
> >> +
> >>   /**
> >>    * wmi_brightness_notify() - helper function for calling WMI-wrapped
> ACPI method
> >>    * @w:    Pointer to the struct wmi_device identified by
> %WMI_BRIGHTNESS_GUID
> >> @@ -119,9 +185,30 @@ static int wmi_brightness_notify(struct
> wmi_device *w, enum wmi_brightness_metho
> >>   	return 0;
> >>   }
> >>
> >> +static int scale_backlight_level(struct backlight_device *a,
> >> +				 struct backlight_device *b)
> >> +{
> >> +	/* because floating point math in the kernel is annoying */
> >> +	const int scaling_factor = 65536;
> >> +	int level = a->props.brightness;
> >> +	int relative_level = level * scaling_factor / a->props.max_brightness;
> >> +
> >> +	return relative_level * b->props.max_brightness / scaling_factor;
> >> +}
> > Maybe
> >
> >    fixp_linear_interpolate(0, 0, a->props.max_brightness, b-
> >props.max_brightness, a->props.brightness);
> >
> > ? (from `linux/fixp-arith.h`)
> 
> 
> Yes, this is exactly what I want; thank you.
> 
> 
> >
> >> +
> >>   static int nvidia_wmi_ec_backlight_update_status(struct
> backlight_device *bd)
> >>   {
> >>   	struct wmi_device *wdev = bl_get_data(bd);
> >> +	struct nvidia_wmi_ec_backlight_priv *priv =
> dev_get_drvdata(&wdev->dev);
> >> +	struct backlight_device *proxy_target = priv->proxy_target;
> >> +
> >> +	if (proxy_target) {
> >> +		int level = scale_backlight_level(bd, proxy_target);
> >> +
> >> +		if (backlight_device_set_brightness(proxy_target, level))
> >> +			pr_warn("Failed to relay backlight update to \"%s\"",
> >> +				backlight_proxy_target);
> >> +	}
> >>
> >>   	return wmi_brightness_notify(wdev,
> WMI_BRIGHTNESS_METHOD_LEVEL,
> >>   	                             WMI_BRIGHTNESS_MODE_SET,
> >> @@ -147,13 +234,65 @@ static const struct backlight_ops
> nvidia_wmi_ec_backlight_ops = {
> >>   	.get_brightness = nvidia_wmi_ec_backlight_get_brightness,
> >>   };
> >>
> >> +static int nvidia_wmi_ec_backlight_pm_notifier(struct notifier_block
> *nb, unsigned long event, void *d)
> >> +{
> >> +
> >> +	/*
> >> +	 * On some systems, the EC backlight level gets reset to 100% when
> >> +	 * resuming from suspend, but the backlight device state still reflects
> >> +	 * the pre-suspend value. Refresh the existing state to sync the EC's
> >> +	 * state back up with the kernel's.
> >> +	 */
> >> +	if (event == PM_POST_SUSPEND) {
> >> +		struct nvidia_wmi_ec_backlight_priv *p;
> >> +
> >> +		p = container_of(nb, struct nvidia_wmi_ec_backlight_priv,
> nb);
> >> +		return backlight_update_status(p->bl_dev);
> > `backlight_update_status()` returns a negative errno while the notifier
> chain
> > expects something else. It would probably be better to return
> `NOTIFY_DONE`
> > in all cases. Currently a suitable error from `backlight_update_status()` will
> > stop the notifier chain.
> 
> 
> Thanks for catching that: I should have paid more attention to the
> notifier callback signature.
> 
> 
> >
> >> +	}
> >> +
> >> +	return 0;
> >> +}
> >> +
> >>   static int nvidia_wmi_ec_backlight_probe(struct wmi_device *wdev,
> const void *ctx)
> >>   {
> >> +	struct backlight_device *bdev, *target = NULL;
> >> +	struct nvidia_wmi_ec_backlight_priv *priv;
> >>   	struct backlight_properties props = {};
> >> -	struct backlight_device *bdev;
> >>   	u32 source;
> >>   	int ret;
> >>
> >> +	/*
> >> +	 * Check quirks tables to see if this system needs any of the firmware
> >> +	 * bug workarounds.
> >> +	 */
> >> +
> >> +	/* User-set quirks from the module parameters take precedence */
> >> +	if (!backlight_proxy_target)
> >> +		dmi_check_system(proxy_quirk_table);
> >> +
> >> +	dmi_check_system(restore_quirk_table);
> >> +
> >> +	if (backlight_proxy_target && backlight_proxy_target[0]) {
> >> +		static int num_reprobe_attempts;
> >> +
> >> +		target =
> backlight_device_get_by_name(backlight_proxy_target);
> >> +
> >> +		if (!target) {
> >> +			/*
> >> +			 * The target backlight device might not be ready;
> >> +			 * try again and disable backlight proxying if it
> >> +			 * fails too many times.
> >> +			 */
> >> +			if (num_reprobe_attempts <
> max_reprobe_attempts) {
> >> +				num_reprobe_attempts++;
> >> +				return -EPROBE_DEFER;
> >> +			}
> >> +
> >> +			pr_warn("Unable to acquire %s after %d attempts.
> Disabling backlight proxy.",
> >> +				backlight_proxy_target,
> max_reprobe_attempts);
> >> +		}
> >> +	}
> > I think `target` is not put in case of error. You probably need to add
> something like:
> >
> >    if (target) {
> >      ret = devm_add_action_or_reset(&wdev->dev, put_device_wrapper,
> target);
> >      if (ret < 0)
> >        return ret;
> >    }
> >
> >
> >> +
> >>   	ret = wmi_brightness_notify(wdev,
> WMI_BRIGHTNESS_METHOD_SOURCE,
> >>   	                           WMI_BRIGHTNESS_MODE_GET, &source);
> >>   	if (ret)
> >> @@ -188,7 +327,44 @@ static int nvidia_wmi_ec_backlight_probe(struct
> wmi_device *wdev, const void *ct
> >>   					      &wdev->dev, wdev,
> >>   					      &nvidia_wmi_ec_backlight_ops,
> >>   					      &props);
> >> -	return PTR_ERR_OR_ZERO(bdev);
> >> +
> >> +	if (IS_ERR(bdev))
> >> +		return PTR_ERR(bdev);
> >> +
> >> +	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> > `devm_kzalloc()` would probably be better and you should check if `!priv`.
> >
> >
> >> +	priv->bl_dev = bdev;
> >> +
> >> +	dev_set_drvdata(&wdev->dev, priv);
> >> +
> >> +	if (target) {
> >> +		int level = scale_backlight_level(target, bdev);
> >> +
> >> +		if (backlight_device_set_brightness(bdev, level))
> >> +			pr_warn("Unable to import initial brightness level
> from %s.",
> >> +				backlight_proxy_target);
> >> +		priv->proxy_target = target;
> >> +	}
> >> +
> >> +	if (restore_level_on_resume) {
> >> +		priv->nb.notifier_call =
> nvidia_wmi_ec_backlight_pm_notifier;
> >> +		register_pm_notifier(&priv->nb);
> >> +	}
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +static void nvidia_wmi_ec_backlight_remove(struct wmi_device *wdev)
> >> +{
> >> +	struct nvidia_wmi_ec_backlight_priv *priv =
> dev_get_drvdata(&wdev->dev);
> >> +	struct backlight_device *proxy_target = priv->proxy_target;
> >> +
> >> +	if (proxy_target)
> >> +		put_device(&proxy_target->dev);
> > If you switch to `devm_add_action_or_reset()`, this will not be needed.
> >
> >
> >> +
> >> +	if (priv->nb.notifier_call)
> >> +		unregister_pm_notifier(&priv->nb);
> >> +
> >> +	kfree(priv);
> > If you switch to `devm_kzalloc()`, this won't be needed.
> 
> 
> Thank you, the devm_*() variants are indeed useful.
> 
> 
> >
> >>   }
> >>
> >>   #define WMI_BRIGHTNESS_GUID "603E9613-EF25-4338-A3D0-
> C46177516DB7"
> >> @@ -204,6 +380,7 @@ static struct wmi_driver
> nvidia_wmi_ec_backlight_driver = {
> >>   		.name = "nvidia-wmi-ec-backlight",
> >>   	},
> >>   	.probe = nvidia_wmi_ec_backlight_probe,
> >> +	.remove = nvidia_wmi_ec_backlight_remove,
> >>   	.id_table = nvidia_wmi_ec_backlight_id_table,
> >>   };
> >>   module_wmi_driver(nvidia_wmi_ec_backlight_driver);
> >> --
> >> 2.27.0
> >>
> > Lastly, is it expected that these bugs will be properly fixed?
> 
> 
> Possibly, but I wouldn't hold out hope for it for an issue at this scale
> on an already shipping system.

This question I'm assuming was aimed at narrowing the quirk to only
match certain FW versions or so.  If there is no certainty of when/if it
will be fixed I agree with current direction.
However I think it's still worth at least noting near the quirk in a comment
what firmware version it was identified.  If later there is confirmation that
a particular firmware version had fixed it the quirk can be adjusted to be
dropped.

> 
> 
> >
> > Regards,
> > Barnabás Pőcze




[Index of Archives]     [Linux Kernel Development]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux