On 04/08/2022 00:03, Stuart Summers wrote:
In the driver teardown, we are unregistering the gt prior
to unregistering the PMU. This means there is a small window
of time in which the application can request metrics from the
PMU, some of which are calling into the uapi engines list,
while the engines are not available. In this case we can
see null pointer dereferences.
Fix this ordering in both the driver load and unload sequences.
v1: Actually address the driver load/unload ordering issue
v2: Remove the NULL checks since they shouldn't be necessary
now with the proper ordering
Fixes: 42014f69bb235 ("drm/i915: Hook up GT power management")
What happened in this commit to break it?
Signed-off-by: Stuart Summers <stuart.summers@xxxxxxxxx>
---
drivers/gpu/drm/i915/i915_driver.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index deb8a8b76965a..ee4dcb85d2060 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -717,7 +717,6 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
struct drm_device *dev = &dev_priv->drm;
i915_gem_driver_register(dev_priv);
- i915_pmu_register(dev_priv);
intel_vgpu_register(dev_priv);
@@ -731,11 +730,12 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
i915_debugfs_register(dev_priv);
i915_setup_sysfs(dev_priv);
+ intel_gt_driver_register(to_gt(dev_priv));
+
/* Depends on sysfs having been initialized */
+ i915_pmu_register(dev_priv);
i915_perf_register(dev_priv);
- intel_gt_driver_register(to_gt(dev_priv));
-
There was a bit of a time distance since we originally discussed this so
things kind of evaporated from my head. Or at least it feels like that.
Today when I look at the code I don't understand why is this shuffle
relevant.
The sysfs comment does not really apply to pmu, but also if I look into
intel_gt_driver_(un)register I did not spot what is the relevant part
which interacts with the PMU.
On register it is engine list first then PMU.
On unregister it is PMU first, then engine list:
i915_driver_remove
i915_driver_unregister
i915_pmu_unregister
i915_gem_driver_remove
clears uabi engines list
Help please? :)
Regards,
Tvrtko
intel_display_driver_register(dev_priv);
intel_power_domains_enable(dev_priv);
@@ -762,11 +762,12 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
intel_display_driver_unregister(dev_priv);
- intel_gt_driver_unregister(to_gt(dev_priv));
-
i915_perf_unregister(dev_priv);
+ /* GT should be available until PMU is gone */
i915_pmu_unregister(dev_priv);
+ intel_gt_driver_unregister(to_gt(dev_priv));
+
i915_teardown_sysfs(dev_priv);
drm_dev_unplug(&dev_priv->drm);