On Fri, Aug 18, 2023 at 11:32:27AM +0530, Sundaresan, Sujaritha wrote: > > On 8/18/2023 11:30 AM, Gupta, Anshuman wrote: > > > > > -----Original Message----- > > > From: Intel-gfx <intel-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of > > > Sujaritha Sundaresan > > > Sent: Friday, August 18, 2023 8:16 AM > > > To: intel-gfx@xxxxxxxxxxxxxxxxxxxxx > > > Subject: [PATCH] drm/i915: Add intel_pcode_probe > > > > > > Added intel_pcode_probe, promoted wait for lmem init and intel_pcode_init > > > prior to mmio_probe during load, so that GT registers can be accessed only > > > after this, else MCA is observed. > > > > > > Signed-off-by: Sujaritha Sundaresan <sujaritha.sundaresan@xxxxxxxxx> > > Both DG1 and DG2 crashed during i915_pci_probe. > > BAT is failing. > > Thanks, > > Anshuman Gupta. > > Hi Anshuman, > > Yes I'm currently looking into it. > > Thanks, > > Suja > > > > --- > > > drivers/gpu/drm/i915/i915_driver.c | 37 ++++++++++++++++++++++++----- > > > drivers/gpu/drm/i915/intel_uncore.c | 12 ---------- > > > 2 files changed, 31 insertions(+), 18 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_driver.c > > > b/drivers/gpu/drm/i915/i915_driver.c > > > index f8dbee7a5af7..92cafceaf447 100644 > > > --- a/drivers/gpu/drm/i915/i915_driver.c > > > +++ b/drivers/gpu/drm/i915/i915_driver.c > > > @@ -93,6 +93,7 @@ > > > #include "i915_memcpy.h" > > > #include "i915_perf.h" > > > #include "i915_query.h" > > > +#include "i915_reg.h" > > > #include "i915_suspend.h" > > > #include "i915_switcheroo.h" > > > #include "i915_sysfs.h" > > > @@ -436,6 +437,32 @@ static int i915_pcode_init(struct drm_i915_private > > > *i915) > > > return 0; > > > } > > > > > > +static int intel_pcode_probe(struct drm_i915_private *i915) { > > > + struct intel_uncore *uncore; > > > + int ret; > > > + > > > + /* > > > + * The boot firmware initializes local memory and assesses its health. > > > + * If memory training fails, the punit will have been instructed to > > > + * keep the GT powered down; we won't be able to communicate > > > with it > > > + * and we should not continue with driver initialization. > > > + */ > > > + if (IS_DGFX(i915) && > > > + !(__raw_uncore_read32(uncore, GU_CNTL) & LMEM_INIT)) > > > { > > > + drm_err(&i915->drm, "LMEM not initialized by firmware\n"); > > > + return -ENODEV; > > > + } > > > + > > > + /* > > > + * Driver handshakes with pcode via mailbox command to know that > > > SoC > > > + * initialization is complete before proceeding further > > > + */ > > > + ret = i915_pcode_init(i915); > > > + > > > + return ret; > > > +} > > > + > > > /** > > > * i915_driver_hw_probe - setup state requiring device access > > > * @dev_priv: device private > > > @@ -547,10 +574,6 @@ static int i915_driver_hw_probe(struct > > > drm_i915_private *dev_priv) > > > > > > intel_opregion_setup(dev_priv); > > > > > > - ret = i915_pcode_init(dev_priv); > > > - if (ret) > > > - goto err_opregion; > > > - > > > /* > > > * Fill the dram structure to get the system dram info. This will be > > > * used for memory latency calculation. > > > @@ -561,8 +584,6 @@ static int i915_driver_hw_probe(struct > > > drm_i915_private *dev_priv) > > > > > > return 0; > > > > > > -err_opregion: > > > - intel_opregion_cleanup(dev_priv); > > > err_msi: > > > if (pdev->msi_enabled) > > > pci_disable_msi(pdev); > > > @@ -778,6 +799,10 @@ int i915_driver_probe(struct pci_dev *pdev, const > > > struct pci_device_id *ent) > > > if (ret < 0) > > > goto out_runtime_pm_put; > > > > > > + ret = intel_pcode_probe(i915); > > > + if (ret) > > > + goto out_tiles_cleanup; > > > + > > > ret = i915_driver_mmio_probe(i915); chicken-egg problem here?! I don't believe this could ever work. You need the MMIO space to be able to communicate with PCODE mailbox and check the lmem init, no?! I believe the bug is that PCODE check should come before the LMEM_INIT check. LMEM won't be ready before pcode state that everything was ready for the lmem access. And on your code pcode ready check is still after the lmem. Cc: Aravind Iddamsetty <aravind.iddamsetty@xxxxxxxxx> who was recently raising that we had an order problem there. > > > if (ret < 0) > > > goto out_tiles_cleanup; > > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c > > > b/drivers/gpu/drm/i915/intel_uncore.c > > > index dfefad5a5fec..4a353d4adf86 100644 > > > --- a/drivers/gpu/drm/i915/intel_uncore.c > > > +++ b/drivers/gpu/drm/i915/intel_uncore.c > > > @@ -2658,18 +2658,6 @@ int intel_uncore_init_mmio(struct intel_uncore > > > *uncore) > > > if (ret) > > > return ret; > > > > > > - /* > > > - * The boot firmware initializes local memory and assesses its health. > > > - * If memory training fails, the punit will have been instructed to > > > - * keep the GT powered down; we won't be able to communicate > > > with it > > > - * and we should not continue with driver initialization. > > > - */ > > > - if (IS_DGFX(i915) && > > > - !(__raw_uncore_read32(uncore, GU_CNTL) & LMEM_INIT)) { > > > - drm_err(&i915->drm, "LMEM not initialized by firmware\n"); > > > - return -ENODEV; > > > - } > > > - > > > if (GRAPHICS_VER(i915) > 5 && !intel_vgpu_active(i915)) > > > uncore->flags |= UNCORE_HAS_FORCEWAKE; > > > > > > -- > > > 2.41.0