Attach an iommu [DMAR] fault handler for our device and try to reset the
GPU upon a fault. At worst this will allow us to recover from a fault
more quickly, rather than waiting 10s for hangcheck to detect a stuck
GPU. At best, it will immediately restart the GPU and paper over the bad
iommu.

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
---
 drivers/gpu/drm/i915/i915_drv.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index f2389ba49c69..f881de6e4583 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -501,6 +501,22 @@ static int i915_set_dma_info(struct drm_i915_private *i915)
 	return ret;
 }
 
+static int fault_handler(struct iommu_fault *f, void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+
+	intel_gt_handle_error(&i915->gt, ALL_ENGINES, 0, "DMAR fault");
+
+	/*
+	 * If we successfully handle the fault, eg mapping a new page,
+	 * we should call iommu_page_response().
+	 *
+	 * We make no attempt to resolve the cause of the fault, as it
+	 * should only be from misconfiguration of the iommu device itself.
+	 */
+	return 0;
+}
+
 /**
  * i915_driver_hw_probe - setup state requiring device access
  * @dev_priv: device private
@@ -621,6 +637,9 @@ static int i915_driver_hw_probe(struct drm_i915_private *dev_priv)
 
 	intel_bw_init_hw(dev_priv);
 
+	iommu_register_device_fault_handler(dev_priv->drm.dev,
+					    fault_handler, dev_priv);
+
 	return 0;
 
 err_msi:
@@ -644,6 +663,8 @@ static void i915_driver_hw_remove(struct drm_i915_private *dev_priv)
 {
 	struct pci_dev *pdev = dev_priv->drm.pdev;
 
+	iommu_unregister_device_fault_handler(dev_priv->drm.dev);
+
 	i915_perf_fini(dev_priv);
 
 	if (pdev->msi_enabled)
-- 
2.20.1
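
The register/handle/unregister flow used by the patch can be modelled in plain userspace C. This is only a minimal sketch, not kernel code: every `mock_*` name is hypothetical, and the one-handler-per-device rule and the error codes returned here are assumptions of the model rather than statements about the real iommu API. The handler mirrors the patch in that it requests a reset on every fault and makes no attempt to resolve the fault itself.

```c
/* Userspace model of a per-device fault-handler slot; NOT kernel code. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the fault record passed to the handler. */
enum mock_fault_type { MOCK_FAULT_DMA_UNRECOV = 1, MOCK_FAULT_PAGE_REQ = 2 };

struct mock_fault {
	enum mock_fault_type type;
};

typedef int (*mock_fault_handler_t)(struct mock_fault *fault, void *data);

/* Hypothetical device: holds at most one handler plus its private data. */
struct mock_device {
	mock_fault_handler_t handler;	/* NULL when nothing is registered */
	void *handler_data;
};

/* Hypothetical GPU state: counts the resets requested by the handler. */
struct mock_gpu {
	unsigned int resets;
};

static int mock_register_fault_handler(struct mock_device *dev,
				       mock_fault_handler_t handler, void *data)
{
	if (!dev || !handler)
		return -EINVAL;
	if (dev->handler)
		return -EBUSY;	/* assumption: one handler per device */

	dev->handler = handler;
	dev->handler_data = data;
	return 0;
}

static int mock_unregister_fault_handler(struct mock_device *dev)
{
	if (!dev)
		return -EINVAL;

	dev->handler = NULL;
	dev->handler_data = NULL;
	return 0;
}

/* Models the iommu layer delivering a fault to the registered handler. */
static int mock_report_fault(struct mock_device *dev, struct mock_fault *fault)
{
	if (!dev || !dev->handler)
		return -EINVAL;

	return dev->handler(fault, dev->handler_data);
}

/* Like the patch: reset on any fault, do not try to fix its cause. */
static int mock_gpu_fault_handler(struct mock_fault *fault, void *data)
{
	struct mock_gpu *gpu = data;

	(void)fault;
	gpu->resets++;
	return 0;
}

/* Probe-time register, two faults, remove-time unregister. */
static unsigned int mock_demo(void)
{
	struct mock_gpu gpu = { 0 };
	struct mock_device dev = { NULL, NULL };
	struct mock_fault fault = { MOCK_FAULT_DMA_UNRECOV };

	assert(mock_register_fault_handler(&dev, mock_gpu_fault_handler, &gpu) == 0);
	assert(mock_report_fault(&dev, &fault) == 0);
	assert(mock_report_fault(&dev, &fault) == 0);
	assert(mock_unregister_fault_handler(&dev) == 0);

	return gpu.resets;
}
```

In this model, a fault reported after `mock_unregister_fault_handler()` is rejected instead of reaching a stale handler. The patch gets the same ordering by unregistering at the top of `i915_driver_hw_remove()`.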