Re: [PATCH] drm/i915/huc: Fix error reported by I915_PARAM_HUC_STATUS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2/12/2020 5:54 AM, Fosha, Robert M wrote:


On 2/11/20 11:57 AM, Michal Wajdeczko wrote:
On Tue, 11 Feb 2020 18:53:05 +0100, Fosha, Robert M <robert.m.fosha@xxxxxxxxx> wrote:



On 2/4/20 4:43 PM, Ye, Tony wrote:


On 1/27/2020 1:41 AM, Michal Wajdeczko wrote:
On Thu, 23 Jan 2020 16:51:58 +0100, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:

Quoting Michal Wajdeczko (2020-01-23 15:38:52)
On Thu, 23 Jan 2020 16:02:17 +0100, Chris Wilson
<chris@xxxxxxxxxxxxxxxxxx> wrote:

> Quoting Daniele Ceraolo Spurio (2020-01-22 23:52:33)
>>
>>
>> On 1/22/20 11:48 AM, Michal Wajdeczko wrote:
>> >  From commit 84b1ca2f0e68 ("drm/i915/uc: prefer intel_gt over i915 >> > in GuC/HuC paths") we stopped using HUC_STATUS error -ENODEV only >> > to indicate lack of HuC hardware and we started to use this error
>> > also for all other cases when HuC was not in use or supported.
>> >
>> > Fix that by relying again on HAS_GT_UC macro, since currently
>> > used function intel_huc_is_supported() is based on HuC firmware
>> > support which could be unsupported also due to force disabled
>> > GuC firmware.
>> >
>> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx>
>> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
>> > Cc: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx>
>> > Cc: Tony Ye <tony.ye@xxxxxxxxx>
>>
>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
>
> Once upon a time did you (Michal) not argue we should indicate the lack
> of firmware in the error code? Something like
>
> if (!HAS_GT_UC(gt->i915))
>       return -ENODEV;
>
> if (!intel_huc_is_supported(huc))
>       return -ENOEXEC;

Yes, we discussed this here [1] together with [2] but we didn't
conclude our discussion due to different opinions on how represent
some states, in particular "manually disabled" state.

In this patch I just wanted to restore old notation.

But we can start new discussion, here is summary:

------------------+----------+----------+----------
  HuC state        | today*   | option A | option B
------------------+----------+----------+----------
no HuC hardware   | -ENODEV  | -ENODEV  | -ENODEV
GuC fw disabled   |   0      |     0    | -EOPNOTSUPP
HuC fw disabled   |   0      |     0    | -EOPNOTSUPP
HuC fw missing    |   0      | -ENOPKG  | -ENOEXEC
HuC fw error      |   0      | -ENOEXEC | -ENOEXEC
HuC fw fail       |   0      | -EACCES  |    0
HuC authenticated |   1      |     1    |    1
------------------+----------+----------+----------

By fw fail, you mean we loaded the firmware (to our knowledge)
correctly, but HUC_STATUS is not reported as valid?

If so, I support option B. I like the idea of saying
"no HuC" (machine too old)
"no firmware" (user action, or lack thereof)
0 (fw unhappy)
1 (fw reports success)

In between states for failures in fw loading? Not so sure. But I can see
the nicety in distinguishing between lack of firmware and some random
failure in loading the firmware (the former being user action required
to rectify, command line parameter whatever and the latter being the
firmware file is either invalid or a stray neutrino prevented loading).

Imo the error messages should be about why we cannot probe/trust the
HUC_STATUS register. If everything is setup correctly then the returned value should be from reading the register. I dislike only returning 1 if
supported, and converting a valid read of 0 into another error.

So Option B :)

But I'm not sure that option B is consistent in error reporting, as
"fw unhappy" is definitely an serious error but is represented as plain non-error "0" status, while "fw disabled" (user action) is treated as error

------------------+----------
   HuC state       | option B
------------------+----------
no HuC hardware   | -ENODEV
GuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
HuC fw disabled   | -EOPNOTSUPP -> user decision, why error?
HuC fw missing    | -ENOEXEC
HuC fw error      | -ENOEXEC
HuC fw fail       |    0        -> unlikely, but still fw/hw error
HuC authenticated |    1
------------------+----------

On other hand, option A treats all error conditions as errors, leaving
status codes only for normal operations: disabled(0)/authenticated(1):

------------------+----------
   HuC state       | option A
------------------+----------
no HuC hardware   | -ENODEV  -> you shouldn't ask
GuC fw disabled   |     0    -> user decision, not an error
HuC fw disabled   |     0    -> user decision, not an error
HuC fw missing    | -ENOPKG  -> fw not installed correctly
HuC fw error      | -ENOEXEC -> bad/wrong fw
HuC fw fail       | -EACCES  -> fw/hw error
HuC authenticated |     1
------------------+----------

Vote for Option A.

Regards,
Tony


Are we ok to move forward on this? Michal, are you working on updating the patch?

I can update the patch, but I don't know which option to implement:
Tony votes for Option A, Chris for Option B, what's your choice?

Michal


I don't have a preference and would trust both Tony and Chris over myself. However, if Tony is using HuC more I would vote for his prefered option.

-Rob

Option B takes enable_guc=0|1 (user doesn't want to load HuC FW) as error, and takes FW fail (failed to xfer or init/auth the fw) as 0. It would confuse the user.

Regards,
Tony



-Rob


But since I'm not an active HuC user, will leave final decision to others.

/Michal



Note that all above should be compatible with media driver,
which explicitly looks for no error and value 1

Cool.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux