[PATCH v3 1/4] drm/i915: Skip harmful unregister steps if not registered

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Starting with commit ec3e00b4ee27 ("drm/i915: stop registering if
drm_dev_register() fails"), we return from i915_driver_register()
immediately if drm_dev_register() fails, skipping remaining registration
steps.  However, the _unregister() counterpart called at device remove
knows nothing about that skip and executes reverts of all those steps,
some of those reverts possibly added or modified later.  As a consequence,
a number of kernel warnings that taint the kernel are triggered:

<3> [525.823143] i915 0000:00:02.0: [drm] *ERROR* Failed to register driver for
userspace access!
...
<4> [525.831069] ------------[ cut here ]------------
<4> [525.831071] i915 0000:00:02.0: [drm] drm_WARN_ON(power_domains->init_wakere
f)
<4> [525.831095] WARNING: CPU: 6 PID: 3440 at drivers/gpu/drm/i915/display/intel
_display_power.c:2074 intel_power_domains_disable+0xc2/0xd0 [i915]
...
<4> [525.831328] CPU: 6 UID: 0 PID: 3440 Comm: i915_module_loa Tainted: G     U
            6.14.0-rc1-CI_DRM_16076-g7a632b6798b6+ #1
...
<4> [525.831334] RIP: 0010:intel_power_domains_disable+0xc2/0xd0 [i915]
...
<4> [525.831483] Call Trace:
<4> [525.831484]  <TASK>
...
<4> [525.831943]  i915_driver_remove+0x4b/0x140 [i915]
<4> [525.832028]  i915_pci_remove+0x1e/0x40 [i915]
<4> [525.832099]  pci_device_remove+0x3e/0xb0
<4> [525.832103]  device_remove+0x40/0x80
<4> [525.832107]  device_release_driver_internal+0x215/0x280
...
<4> [525.947666] ------------[ cut here ]------------
<4> [525.947669] kobject: '(null)' (ffff88814f62a218): is not initialized, yet kobject_put() is being called.
<4> [525.947707] WARNING: CPU: 6 PID: 3440 at lib/kobject.c:734 kobject_put+0xe4/0x200
...
<4> [525.947875] RIP: 0010:kobject_put+0xe4/0x200
...
<4> [525.947909] Call Trace:
<4> [525.947911]  <TASK>
...
<4> [525.947963]  intel_gt_sysfs_unregister+0x25/0x40 [i915]
<4> [525.948133]  intel_gt_driver_unregister+0x14/0x80 [i915]
<4> [525.948291]  i915_driver_remove+0x6c/0x140 [i915]
<4> [525.948411]  i915_pci_remove+0x1e/0x40 [i915]
...
<4> [526.441186] ------------[ cut here ]------------
<4> [526.441191] kernfs: can not remove 'error', no directory
<4> [526.441211] WARNING: CPU: 1 PID: 3440 at fs/kernfs/dir.c:1684 kernfs_remove_by_name_ns+0xbc/0xc0
...
<4> [526.441536] RIP: 0010:kernfs_remove_by_name_ns+0xbc/0xc0
...
<4> [526.441578] Call Trace:
<4> [526.441581]  <TASK>
...
<4> [526.441686]  sysfs_remove_bin_file+0x17/0x30
<4> [526.441691]  i915_gpu_error_sysfs_teardown+0x1d/0x30 [i915]
<4> [526.442226]  i915_teardown_sysfs+0x1c/0x60 [i915]
<4> [526.442369]  i915_driver_remove+0x9d/0x140 [i915]
<4> [526.442473]  i915_pci_remove+0x1e/0x40 [i915]
...
<4> [526.685700] ------------[ cut here ]------------
<4> [526.685706] i915 0000:00:02.0: [drm] i915 raw-wakerefs=1 wakelocks=1 on cle
anup
<4> [526.685734] WARNING: CPU: 1 PID: 3440 at drivers/gpu/drm/i915/intel_runtime
_pm.c:443 intel_runtime_pm_driver_release+0x75/0x90 [i915]
...
<4> [526.686090] RIP: 0010:intel_runtime_pm_driver_release+0x75/0x90 [i915]
...
<4> [526.686294] Call Trace:
<4> [526.686296]  <TASK>
...
<4> [526.687025]  i915_driver_release+0x7e/0xb0 [i915]
<4> [526.687243]  drm_dev_put.part.0+0x47/0x90
<4> [526.687250]  devm_drm_dev_init_release+0x13/0x30
<4> [526.687255]  devm_action_release+0x12/0x30
<4> [526.687261]  release_nodes+0x3a/0x120
<4> [526.687268]  devres_release_all+0x97/0xe0
<4> [526.687277]  device_unbind_cleanup+0x12/0x80
<4> [526.687282]  device_release_driver_internal+0x23a/0x280
...

A call to intel_power_domains_disable() that triggeres the drm_WARN_ON()
and takes another wakeref even if the one taken during driver register was
not released after device register error, was already there.  The WARN()
triggered by kernfs_remove_by_name_ns() from i915_teardown_sysfs()->
i915_gpu_error_sysfs_teardown(), formerly i915_teardown_error_capture(),
was also there.  A call to intel_gt_sysfs_unregister() that triggers the
WARN() from kobject_put() was added to intel_gt_driver_unregister() with
commit 69d6bf5c3754ff ("drm/i915/gt: Fix memory leaks in per-gt sysfs").

Introduce a flag that indicates device registration status and raise it on
device registration success.  As a minimum (first step), when that flag is
found not set while unregistering the driver, jump over those reverts of
registration steps omitted after device registration failure that are not
ready for being called unconditionally (and trigger kernel warnings).

In case of i915_gt_driver_unregister() called for each GT, omitting it
proved to introduce new issues.  Skip only its problematic
intel_gt_sysfs_unregister() sub-step.

Other unregister steps seem to be safe but may still occur redundant in
that scenario -- that will be addressed in follow-up patches.

v3: Based on Andi's commitment on introducing a flag, try to address
    Jani's "must find another way" by finding a better place and name for
    the flag (in hope that's what Jani had on mind),
  - split into a series of patches and limit the scope of the first (this)
    one to a minimum of omitting conditionally only those unregister
    (sub)steps that trigger kernel warnings when not registered.
v2: Check in _unregister whether the drm_dev_register has succeeded and
    skip some of the _unregister() steps. (Andi)

Link: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10047
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/9820
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10131
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10887
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12817
Cc: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
Cc: Chris Wilson <chris.p.wilson@xxxxxxxxxxxxxxx>
Cc: Ashutosh Dixit <ashutosh.dixit@xxxxxxxxx>
Cc: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>
Cc: Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@xxxxxxxxxxxxxxx>
---
 drivers/gpu/drm/i915/gt/intel_gt.c | 3 ++-
 drivers/gpu/drm/i915/i915_driver.c | 8 ++++++--
 drivers/gpu/drm/i915/i915_drv.h    | 2 ++
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 3d3b1ba76e2be..77eb9305a1ff8 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -783,7 +783,8 @@ void intel_gt_driver_unregister(struct intel_gt *gt)
 {
 	intel_wakeref_t wakeref;
 
-	intel_gt_sysfs_unregister(gt);
+	if (gt->i915->do_unregister)
+		intel_gt_sysfs_unregister(gt);
 	intel_rps_driver_unregister(&gt->rps);
 	intel_gsc_fini(&gt->gsc);
 
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index 613084fd00979..2caaeeab1f0f6 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -638,6 +638,8 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
 		return;
 	}
 
+	dev_priv->do_unregister = true;
+
 	i915_debugfs_register(dev_priv);
 	i915_setup_sysfs(dev_priv);
 
@@ -673,7 +675,8 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
 	i915_switcheroo_unregister(dev_priv);
 
 	intel_runtime_pm_disable(&dev_priv->runtime_pm);
-	intel_power_domains_disable(display);
+	if (dev_priv->do_unregister)
+		intel_power_domains_disable(display);
 
 	intel_display_driver_unregister(display);
 
@@ -687,7 +690,8 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
 	i915_perf_unregister(dev_priv);
 	i915_pmu_unregister(dev_priv);
 
-	i915_teardown_sysfs(dev_priv);
+	if (dev_priv->do_unregister)
+		i915_teardown_sysfs(dev_priv);
 	drm_dev_unplug(&dev_priv->drm);
 
 	i915_gem_driver_unregister(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ffc346379cc2c..0c6cbedaa71fe 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -184,6 +184,8 @@ struct drm_i915_private {
 	/* FIXME: Device release actions should all be moved to drmm_ */
 	bool do_release;
 
+	bool do_unregister;
+
 	/* i915 device parameters */
 	struct i915_params params;
 
-- 
2.48.1




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux