On Fri, Feb 12, 2021 at 01:19:25PM -0800, Matt Roper wrote:
In rare circumstances bugs in PCI programming, broken BIOS, or failing hardware can cause the CPU to lose access to the MMIO BAR on dgfx platforms. This is a pretty catastrophic failure since all register reads come back with values of 0xFFFFFFFF. Let's check for this special case while doing our usual checks for unclaimed registers; the FPGA_DBG register we use for those checks on modern platforms has some unused bits that will always read back as 0 when things are behaving properly; we can use them as canaries to detect when MMIO itself has suddenly broken and try to print a more informative error message in the logs. v2: Let the detection function still return 'true' if we've lost our MMIO access. We'll still get an extra false positive message about an unclaimed register access, but we'll still honor the 'mmio_debug' limit and not spam the log. (Lucas) Cc: Lucas De Marchi <lucas.demarchi@xxxxxxxxx> Signed-off-by: Matt Roper <matthew.d.roper@xxxxxxxxx>
Reviewed-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx> Lucas De Marchi
--- drivers/gpu/drm/i915/intel_uncore.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 5098f95d71b0..661b50191f2b 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -465,6 +465,22 @@ fpga_check_for_unclaimed_mmio(struct intel_uncore *uncore) if (likely(!(dbg & FPGA_DBG_RM_NOCLAIM))) return false; + /* + * Bugs in PCI programming (or failing hardware) can occasionally cause + * us to lose access to the MMIO BAR. When this happens, register + * reads will come back with 0xFFFFFFFF for every register and things + * go bad very quickly. Let's try to detect that special case and at + * least try to print a more informative message about what has + * happened. + * + * During normal operation the FPGA_DBG register has several unused + * bits that will always read back as 0's so we can use them as canaries + * to recognize when MMIO accesses are just busted. + */ + if (unlikely(dbg == ~0)) + drm_err(&uncore->i915->drm, + "Lost access to MMIO BAR; all registers now read back as 0xFFFFFFFF!\n"); + __raw_uncore_write32(uncore, FPGA_DBG, FPGA_DBG_RM_NOCLAIM); return true; -- 2.25.4
_______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx