Re: [PATCH v5 3/3] drm/i915: Fix harmful driver register/unregister asymmetry

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Janusz,

On 2025-03-14 at 21:38:35 GMT, Janusz Krzysztofik wrote:
> Starting with commit ec3e00b4ee27 ("drm/i915: stop registering if
> drm_dev_register() fails"), we return from i915_driver_register()
> immediately if drm_dev_register() fails, skipping remaining registration
> steps, and continue only with remaining probe steps.  However, the
> _unregister() counterpart called at driver remove knows nothing about that
> skip and executes reverts of all those steps.  As a consequence, a number
> of kernel warnings that taint the kernel are triggered:
> 
> <3> [525.823143] i915 0000:00:02.0: [drm] *ERROR* Failed to register driver for userspace access!
> ...
> <4> [525.831069] ------------[ cut here ]------------
> <4> [525.831071] i915 0000:00:02.0: [drm] drm_WARN_ON(power_domains->init_wakeref)
> <4> [525.831095] WARNING: CPU: 6 PID: 3440 at drivers/gpu/drm/i915/display/intel_display_power.c:2074 intel_power_domains_disable+0xc2/0xd0 [i915]
> ...
> <4> [525.831328] CPU: 6 UID: 0 PID: 3440 Comm: i915_module_loa Tainted: G     U             6.14.0-rc1-CI_DRM_16076-g7a632b6798b6+ #1
> ...
> <4> [525.831334] RIP: 0010:intel_power_domains_disable+0xc2/0xd0 [i915]
> ...
> <4> [525.831483] Call Trace:
> <4> [525.831484]  <TASK>
> ...
> <4> [525.831943]  i915_driver_remove+0x4b/0x140 [i915]
> <4> [525.832028]  i915_pci_remove+0x1e/0x40 [i915]
> <4> [525.832099]  pci_device_remove+0x3e/0xb0
> <4> [525.832103]  device_remove+0x40/0x80
> <4> [525.832107]  device_release_driver_internal+0x215/0x280
> ...
> 
> Moreover, that unexpected PM reference is left untouched (not released)
> but overwritten, then that triggers another kernel warning at driver
> release phase:
> 
> <4> [526.685700] ------------[ cut here ]------------
> <4> [526.685706] i915 0000:00:02.0: [drm] i915 raw-wakerefs=1 wakelocks=1 on cleanup
> <4> [526.685734] WARNING: CPU: 1 PID: 3440 at drivers/gpu/drm/i915/intel_runtime_pm.c:443 intel_runtime_pm_driver_release+0x75/0x90 [i915]
> ...
> <4> [526.686090] RIP: 0010:intel_runtime_pm_driver_release+0x75/0x90 [i915]
> ...
> <4> [526.686294] Call Trace:
> <4> [526.686296]  <TASK>
> ...
> <4> [526.687025]  i915_driver_release+0x7e/0xb0 [i915]
> <4> [526.687243]  drm_dev_put.part.0+0x47/0x90
> <4> [526.687250]  devm_drm_dev_init_release+0x13/0x30
> <4> [526.687255]  devm_action_release+0x12/0x30
> <4> [526.687261]  release_nodes+0x3a/0x120
> <4> [526.687268]  devres_release_all+0x97/0xe0
> <4> [526.687277]  device_unbind_cleanup+0x12/0x80
> <4> [526.687282]  device_release_driver_internal+0x23a/0x280
> ...
> 
> A call to intel_power_domains_disable() was already there.  It triggers
> the drm_WARN_ON() when it finds a reference to a wakeref taken on device
> probe and not released after device registration failure.  That wakeref is
> then left held forever once its handle gets lost overwritten with another
> wakeref, hence another WARN() is called from
> intel_runtime_pm_driver_release().
> 
> The WARN() triggered by kernfs_remove_by_name_ns() from
> i915_teardown_sysfs()->i915_gpu_error_sysfs_teardown(), formerly
> i915_teardown_error_capture(), was also there when the return was added.
> 
> A call to intel_gt_sysfs_unregister() that triggers the WARN() from
> kobject_put() was added to intel_gt_driver_unregister() with commit
> 69d6bf5c3754ff ("drm/i915/gt: Fix memory leaks in per-gt sysfs").
> 
> Fix the asymmetry by failing the driver probe on device registration
> failure and going through rewind paths.
> 
> For that to work as expected, we apparently need to start the rewind path
> of i915_driver_register() with drm_dev_unregister(), even if
> drm_dev_register() returned an error.
> 
> v5: Drop unsigned keyword from ret variable declaration (Krzysztof),
>   - keep the "Failed to register driver for userspace access" error
>     message (Krzysztof),
>   - split PXP cleanup addition to rewind path out to a separate patch.
> v4: Switch to taking an error rewind path on device registration failure
>     (Krzysztof, Lucas).
> v3: Based on Andi's commitment on introducing a flag, try to address
>     Jani's "must find another way" by finding a better place and name for
>     the flag (in hope that's what Jani had on mind),
>   - split into a series of patches and limit the scope of the first (this)
>     one to a minimum of omitting conditionally only those unregister
>     (sub)steps that trigger kernel warnings when not registered.
> v2: Check in _unregister whether the drm_dev_register has succeeded and
>     skip some of the _unregister() steps. (Andi)
> 
> Link: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10047
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10131
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10887
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12817
> Cc: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
> Cc: Chris Wilson <chris.p.wilson@xxxxxxxxxxxxxxx>
> Cc: Ashutosh Dixit <ashutosh.dixit@xxxxxxxxx>
> Cc: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>
> Cc: Krzysztof Niemiec <krzysztof.niemiec@xxxxxxxxx>
> Cc: Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx>
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@xxxxxxxxxxxxxxx>

Reviewed-by: Krzysztof Niemiec <krzysztof.niemiec@xxxxxxxxx>

Thanks
Krzysztof



[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux