Re: [PATCH v9 6/7] drm/i915/guc: Check the locking status of GuC WOPCM registers

Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> · Fri, 09 Feb 2018 12:47:15 +0200

Quoting Jackie Li (2018-02-09 01:03:54)
> GuC WOPCM registers are write-once registers. Current driver code accesses
> these registers without checking the accessibility to these registers which
> will lead to unpredictable driver behaviors if these registers were touch
> by other components (such as faulty BIOS code).
> 
> This patch moves the GuC WOPCM registers updating code into
> intel_guc_wopcm.c and adds check before and after the update to GuC WOPCM
> registers so that we can make sure the driver is in a known state before
> and after writing to these write-once registers.
> 
> v6:
>  - Made sure module reloading won't bug the kernel while doing
>    locking status checking
> 
> v7:
>  - Fixed patch format issues
> 
> v8:
>  - Fixed coding style issue on register lock bit macro definition (Sagar)
> 
> v9:
>  - Avoided to use redundant !! to cast uint to bool (Chris)
>  - Return error code instead of GEM_BUG_ON for locked with invalid register
>    values case (Sagar)
>  - Updated guc_wopcm_hw_init to use guc_wopcm as first parameter (Michal)
>  - Added code to set and validate the HuC_LOADING_AGENT_GUC bit in GuC
>    WOPCM offset register based on the presence of HuC firmware (Michal)
>  - Use bit fields instead of macros for GuC WOPCM flags (Michal)
> 
> Cc: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx>
> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> Signed-off-by: Jackie Li <yaodong.li@xxxxxxxxx>

<SNIP>

> +++ b/drivers/gpu/drm/i915/intel_guc_wopcm.c
> +static inline bool guc_wopcm_locked(struct intel_guc *guc)
> +{
> +       struct drm_i915_private *i915 = guc_to_i915(guc);
> +       bool size_reg_locked = __reg_locked(i915, GUC_WOPCM_SIZE);
> +       bool offset_reg_locked = __reg_locked(i915, DMA_GUC_WOPCM_OFFSET);
> +
> +       return size_reg_locked && offset_reg_locked;

Hmm, isn't this supposed to be || instead, if either of the registers is
locked, we can't really do much?

> +static inline void guc_wopcm_hw_update(struct intel_guc *guc)
> +{
> +       struct drm_i915_private *dev_priv = guc_to_i915(guc);
> +       u32 offset_reg_val;
> +
> +       /* GuC WOPCM registers should be unlocked at this point. */
> +       GEM_BUG_ON(__reg_locked(dev_priv, GUC_WOPCM_SIZE));
> +       GEM_BUG_ON(__reg_locked(dev_priv, DMA_GUC_WOPCM_OFFSET));
> +
> +       offset_reg_val = guc->wopcm.offset;
> +       if (guc->wopcm.need_load_huc_fw)

"load_huc_fw" would read better here.

> +               offset_reg_val |= HUC_LOADING_AGENT_GUC;
> +
> +       I915_WRITE(GUC_WOPCM_SIZE, guc->wopcm.size);
> +       I915_WRITE(DMA_GUC_WOPCM_OFFSET, offset_reg_val);
> +
> +       GEM_BUG_ON(!__reg_locked(dev_priv, GUC_WOPCM_SIZE));
> +       GEM_BUG_ON(!__reg_locked(dev_priv, DMA_GUC_WOPCM_OFFSET));

This could use static inline write_lockable_reg() helper with a return value.
I think we need to propagate the error all the way up the chain, as it's
not a driver programmer error if these registers are locked, instead it is
likely to be a BIOS problem.

In the helper we could also test if the value is locked but happens to
match what we were intending to write, then we could call it success and
maybe emit an INFO message.

> +}
> +
> +static inline bool guc_wopcm_regs_valid(struct intel_guc *guc)
> +{
> +       struct drm_i915_private *dev_priv = guc_to_i915(guc);
> +       u32 size, offset;
> +       bool guc_loads_huc;
> +       u32 reg_val;
> +
> +       reg_val = I915_READ(GUC_WOPCM_SIZE);
> +       /* GuC WOPCM size should be always multiple of 4K pages. */
> +       size = reg_val & PAGE_MASK;
> +
> +       reg_val = I915_READ(DMA_GUC_WOPCM_OFFSET);
> +       guc_loads_huc = reg_val & HUC_LOADING_AGENT_GUC;
> +       offset = reg_val & GUC_WOPCM_OFFSET_MASK;
> +
> +       if (guc->wopcm.need_load_huc_fw && !guc_loads_huc)
> +               return false;
> +
> +       return (size == guc->wopcm.size) && (offset == guc->wopcm.offset);
> +}

This seems to have much of what I wrote above already. I'd still split
the actual writes in a helper that will report the per-register a
WARN about "mismatch between locked register value and computed value"
or so. Dumping both the computed an actual register value.

> @@ -147,6 +211,8 @@ int intel_guc_wopcm_init(struct intel_guc_wopcm *guc_wopcm, u32 guc_fw_size,
>         guc->wopcm.offset = offset;
>         guc->wopcm.size = size;
>         guc->wopcm.top = top;
> +       /* Use GuC to load HuC firmware if HuC firmware is present. */

Comment is again more on the "what" side, so can be dropped.

> +       guc->wopcm.need_load_huc_fw = huc_fw_size ? 1 : 0;

	... = huc_fw_size > 0;

Will literally read as "set guc load_huc_fw if huc_fw_size is
greater than zero."

> +int intel_guc_wopcm_init_hw(struct intel_guc_wopcm *guc_wopcm)
> +{
> +       struct intel_guc *guc = guc_wopcm_to_guc(guc_wopcm);
> +       bool locked = guc_wopcm_locked(guc);
> +
> +       GEM_BUG_ON(!guc->wopcm.valid);
> +
> +       /*
> +        * Bug if driver hasn't updated the HW Registers and GuC WOPCM has been
> +        * locked. Return directly if WOPCM was locked and we have updated
> +        * the registers.
> +        */
> +       if (locked) {
> +               if (guc->wopcm.hw_updated)
> +                       return 0;
> +
> +               /*
> +                * Mark as updated if registers contained correct values.
> +                * This will happen while reloading the driver module without
> +                * rebooting the system.
> +                */
> +               if (guc_wopcm_regs_valid(guc))
> +                       goto out;
> +
> +               /* Register locked without valid values. Abort HW init. */
> +               return -EINVAL;
> +       }
> +
> +       /* Always update registers when GuC WOPCM is not locked. */
> +       guc_wopcm_hw_update(guc);

This flow will become more clear with the helpers, and we will get a
splat in the kernel message with less code about which register actually
was locked and how the value mismatched.

I'm up for discussion, but to my eyes the code flow would become more clear if
we start with the assumption "we can write the register values to what we have
calculated", and then make it an error that is propagated back up if we can't.

Regards, Joonas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx