On Mon, 20 May 2019 12:44:43 +0200, Chris Wilson
<chris@xxxxxxxxxxxxxxxxxx> wrote:
Quoting Michal Wajdeczko (2019-05-20 11:24:37)
On Mon, 20 May 2019 11:35:26 +0200, Chris Wilson
<chris@xxxxxxxxxxxxxxxxxx> wrote:
> Quoting Michal Wajdeczko (2019-05-19 22:50:43)
>> If we never attempted to load HuC firmware, or we already wedged
>> or reset GuC/HuC, then there is no reason to wake up the device
>> to check one bit in the register that will be for sure cleared.
>>
>> v2: check if HuC was enabled; subtle change in ABI
>> reuse hus_is_load helper
>>
>> Suggested-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx>
>> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
>> Cc: Tony Ye <tony.ye@xxxxxxxxx>
>> ---
>> drivers/gpu/drm/i915/intel_huc.c | 20 ++++++++++++--------
>> drivers/gpu/drm/i915/intel_huc.h | 5 +++++
>> 2 files changed, 17 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_huc.c
>> b/drivers/gpu/drm/i915/intel_huc.c
>> index 1ff1fb015e58..bfdebec1cfc8 100644
>> --- a/drivers/gpu/drm/i915/intel_huc.c
>> +++ b/drivers/gpu/drm/i915/intel_huc.c
>> @@ -113,7 +113,7 @@ int intel_huc_auth(struct intel_huc *huc)
>> u32 status;
>> int ret;
>>
>> - if (huc->fw.load_status != INTEL_UC_FIRMWARE_SUCCESS)
>> + if (!intel_huc_is_loaded(huc))
>> return -ENOEXEC;
>>
>> ret = intel_guc_auth_huc(guc,
>> @@ -150,21 +150,25 @@ int intel_huc_auth(struct intel_huc *huc)
>> * This function reads status register to verify if HuC
>> * firmware was successfully loaded.
>> *
>> - * Returns: 1 if HuC firmware is loaded and verified,
>> - * 0 if HuC firmware is not loaded and -ENODEV if HuC
>> - * is not present on this platform.
>> + * Returns: 1 if HuC firmware is loaded and verified, 0 if HuC
>> firmware is not
>> + * enabled, -ENOPKG if HuC firmware is not loaded and -ENODEV if
HuC
>> is not
>> + * present on this platform.
>> */
>> int intel_huc_check_status(struct intel_huc *huc)
>> {
>> struct drm_i915_private *dev_priv = huc_to_i915(huc);
>> intel_wakeref_t wakeref;
>> - bool status = false;
>> + bool verified = false;
>>
>> if (!HAS_HUC(dev_priv))
>> return -ENODEV;
>>
>> - with_intel_runtime_pm(dev_priv, wakeref)
>> - status = I915_READ(HUC_STATUS2) & HUC_FW_VERIFIED;
>> + if (!USES_HUC(dev_priv))
>> + return 0;
>
> Hmm, EOPNOTSUPP here for the user disabled case?
I'm not sure that user disabled case shall be reported as error,
since it perfectly fits into legacy ABI 0="not loaded".
The only small change is that now 0="not loaded (not enabled by user)"
The only requirement for the intel-media driver seems to "HUC_STATUS >
0".
That's our only user, so I think we have the liberty to redefine <=0 as
befits error reporting.
>
> Then
>
> if (!intel_huc_is_loaded(huc))
> return -ENOPKG;
>
> bool verified = false;
> with_intel_runtime_pm(dev_priv, wakeref)
> verified = I915_READ(HUC_STATUS2) & HUC_FW_VERIFIED;
> return verified.
>
> That keeps it a bit flatter with the special casing separate and
the 0/1
> result as given by the register status. Then if negative, we know
one of
> the preconditions for using HuC is not available, and if 0
something went
> wrong in mmio setup.
>
> Does that make sense?
IMHO, if something went wrong it is better to report it as error
rather
then weak 0 status. Compare:
I disagree, <0 is the weak case. 0 is the HW reports something went
wrong (and only that). 1 is the HW reports all went well.
But I was assuming that we're defining ABI here, not simply exposing
unified
register value. Since we want 1 to report all ok, and we can fail for
many
reasons (no hw, no fw blob, wrong fw, bad signature, ...) we should at
least
try to match all these failures into error code (today just ENOPKG, but
later
we can extend into ENOEXEC=bad fw, ENOSPC=can't fit into WOPCM, ...).
On other hand, at least for me, user decision to disable HuC should not
be treated the same way as other real failures, so 0 seems like perfect
match.
Then we'll have: 1 = enabled, 0 = disabled, <0 = problem
If we have more ways we can report the HW went wrong, sure expand that
into extra error codes, but I don't see that. And if it's in the regs,
why are not exporting them via reg_read_ioctl?
+---------------+-------+-------+-------+
no hardware -ENODEV -ENODEV -ENODEV
fw disabled 0 0 -ENOTSUP
fw not loaded 0 -ENOPKG -ENOPKG
fw not verified* 0 -ENOPKG 0
fw verified* 1 1 1
+---------------+-------+-------+-------+
*) registry access
Note that in your case, fw verification problem can be reported both
as -ENOPKG or as 0 (former when verification fails at load time,
latter
as result of runtime reg read). Very likely we will never see 0 at
all,
since probably that can't change once fw was verified at load time.
I see more distinction for ENODEV/ENOTSUP/ENOPKG/0/1. I am not getting
your point that we have conflation here, as to me it looks more
Note that we are verifying HuC firmware right after it was loaded into
hardware using exactly the same register as here (see intel_huc_auth),
and if this fails we immediately change fw status to FAIL, so we never
go beyond huc_is_loaded point.
Assuming that once fw is verified it can't fail (to be verified later)
then your codes will be only ENODEV/ENOTSUP/ENOPKG/1 (no 0 value)
distinct, and it is clear when we report the value as given by the
register and when we give an interpreted value for a driver error.
Note that your initial idea was to reuse driver maintained fw status
instead of waking up device just to check register again (we don't do
that now as we need someone to confirm/verify that we can really skip
reg read). Use 0/1 for ABI as register values does not fit here, imho.
There is also a risk that we will report some known failures as <0
error codes, while other, captured by hw as just 0.
That might be viewed as undesired ABI change.
grep says it is fine, so I'm happy that no one else cares (as can be
seen by the lack of igt, no one has looked hard enough into how to
distinguish the API failures ;)
it's always hard to come out from gray area into solid spec ;)
-Chris