On 6/14/2022 8:30 AM, Ceraolo Spurio, Daniele wrote:
On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
On DG2, HuC loading is performed by the GSC, via a PXP command. The load
operation itself is relatively simple (just send a message to the GSC
with the physical address of the HuC in LMEM), but there are timing
changes that require special attention. In particular, to send a PXP
command we need to first export the GSC driver and then wait for the
mei-gsc and mei-pxp modules to start, which means that HuC load will
complete after i915 load is complete. This means that there is a small
window of time after i915 is registered and before HuC is loaded
during which userspace could submit and/or check the HuC load status,
although this is quite unlikely to happen (HuC is usually loaded before
kernel init/resume completes).
We've consulted with the media team in regards to how to handle this and
they've asked us to do the following:
1) Report HuC as loaded in the getparam IOCTL even if load is still in
progress. The media driver uses the IOCTL as a way to check if HuC is
enabled and then includes a secondary check in the batches to get the
actual status, so doing it this way allows userspace to keep working
without changes.
2) Stall all userspace VCS submission until HuC is loaded. Stalls are
expected to be very rare (if any), due to the fact that HuC is usually
loaded before kernel init/resume is completed.
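
As an illustration only, the two requested behaviours can be modelled as
two small predicates over a load state. This is a minimal sketch, not the
code from the series: the enum, its value names and both helper functions
are hypothetical and exist only to make the rules concrete.

```c
#include <stdbool.h>

/* Hypothetical load states; illustrative names, not the i915 enum. */
enum huc_load_state {
	HUC_NOT_STARTED,
	HUC_LOAD_IN_PROGRESS,	/* GSC has been asked to load, not done yet */
	HUC_LOADED,
	HUC_LOAD_FAILED,
};

/*
 * Behaviour 1: the getparam value reads as "loaded" both when the load
 * has completed and while it is still in progress.
 */
static int huc_getparam_value(enum huc_load_state state)
{
	return state == HUC_LOAD_IN_PROGRESS || state == HUC_LOADED;
}

/*
 * Behaviour 2: userspace VCS submissions are held back only while the
 * load is pending; every other state lets them through.
 */
static bool vcs_submission_must_wait(enum huc_load_state state)
{
	return state == HUC_LOAD_IN_PROGRESS;
}
```

In this model a submission that arrives in the HUC_LOAD_IN_PROGRESS window
sees getparam report 1 and is stalled until the state moves on, which is
why existing userspace would not need to change.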
The motivation to add these complications into i915 is not clear
to me here. I mean, the premise of the series is that there is no
HuC on DG2 _yet_, right? So no backwards compatibility
concerns. In this case why jump through the hoops and not let
userspace handle all of this by just letting the getparam
return the true status?
The main areas impacted by the fact that we can't guarantee
that HuC load is complete when i915 starts accepting
submissions are boot and suspend/resume, with the latter being
the main problem; GT reset is not a concern because HuC now
survives it. A suspend/resume can be transparent to userspace
and therefore the HuC status can temporarily flip from loaded
to not without userspace knowledge, especially if we start
going into deeper suspend states and start causing HuC resets
when we go into runtime suspend. Note that this is different
from what happens during GT reset for older platforms, because
in that scenario we guarantee that HuC reload is complete
before we restart the submission back-end, so userspace
doesn't notice the HuC status change. We had an internal
discussion about this problem with both media and i915 archs
and the conclusion was that the best option is for i915 to
stall media submission while HuC (re-)load is in progress.
Resume is potentially a good reason - I did not pick up on that
from the cover letter. I read the statement about the unlikely
and small window where HuC is not loaded during kernel
init/resume and I guess did not pick up on the resume part.
Waiting for GSC to load HuC from i915 resume is not an option?
GSC is an aux device exported by i915, so AFAIU GSC resume can't
start until i915 resume completes.
I'll dig into this in the next few days since I want to
understand how exactly it works. Or someone can help explain.
If in the end the conclusion is that i915 resume indeed cannot
wait for GSC, then I think auto-blocking of queued up contexts on
media engines indeed sounds unavoidable. Otherwise, as you
explained, user experience post resume wouldn't be good.
Even if we could implement a wait, I'm not sure we should. GSC
resume and HuC reload take ~300ms in most cases, I don't think we
want to block within the i915 resume path for that long.
Yeah maybe not. But entertaining the idea that it is technically
possible to block - we could perhaps add uapi for userspace to mark
contexts which want HuC access. Then track if there are any such
contexts with outstanding submissions and only wait in resume if
there are. Whether that would end up being significantly less code to
maintain on the i915 side is an open question.
What would be the end result from the user's point of view in the case
where the system suspended during video playback? The proposed solution
from this
series sees the video stuck after resume. Maybe compositor blocks
as well since I am not sure how well they handle one window not
providing new data. Probably depends on the compositor.
And then with a simpler solution definitely the whole resume would
be delayed by 300ms.
With my ChromeOS hat on, stalling the media engines does sound like a
better solution. But with the maintainer hat on I'd like all options
evaluated since there is attractiveness if a good enough solution
can be achieved with significantly less kernel code.
You say 300ms is the typical time for HuC load. How long is it on other
platforms? If it is much faster, why is it so slow here?
The GSC itself has to come out of suspend before it can perform the
load, which takes a few tens of ms I believe. AFAIU the GSC is also
slower in processing the HuC load and auth compared to the legacy
path. The GSC FW team gave a 250ms limit for the time the GSC FW
needs from start of the resume flow to HuC load complete, so I
bumped that to ~300ms to account for all other SW interactions, plus
a bit of buffer. Note that a bit of the SW overhead is caused by the
fact that we have 2 mei modules in play here: mei-gsc, which manages
the GSC device itself (including resume), and mei-pxp, which owns
the pxp messaging, including HuC load.
And how long is it on other platforms (not DG2), do you know? Presumably
there the wait is on the i915 resume path?
I don't have "official" expected load times at hand, but looking at
the BAT boot logs for this series for DG1 I see it takes ~10 ms to
load both GuC and HuC:
<7>[ 8.157838] i915 0000:03:00.0: [drm:intel_huc_init [i915]] GSC loads huc=no
<6>[ 8.158632] i915 0000:03:00.0: [drm] GuC firmware i915/dg1_guc_70.1.1.bin version 70.1
<6>[ 8.158634] i915 0000:03:00.0: [drm] HuC firmware i915/dg1_huc_7.9.3.bin version 7.9
<7>[ 8.164255] i915 0000:03:00.0: [drm:guc_enable_communication [i915]] GuC communication enabled
<6>[ 8.166111] i915 0000:03:00.0: [drm] HuC authenticated
Note that we increase the GT frequency all the way to the max before
starting the FW load, which speeds things up.
However, do we really need to lie in the getparam? How about
extending it or adding a new one to separate the loading vs loaded states?
Since userspace does not support DG2 HuC yet this should be doable.
I don't really have a preference here. The media team asked us to
do it this way because they wouldn't have a use for the different
"in progress" and "done" states. If they're ok with having
separate flags that's fine by me.
Tony, any feedback here?
We don't even have any docs in i915_drm.h in terms of what it means:
#define I915_PARAM_HUC_STATUS 42
Seems to be a boolean. Status false vs true? Could you add some docs?
There is documentation above intel_huc_check_status(), which is also
updated in this series. I can move that to i915_drm.h.
That would be great, thanks.
And with such rich return codes already documented and exposed via uapi
- would we really need to add anything new for DG2 apart from
userspace knowing that if zero is returned (not a negative error
value) it should retry? I mean, is there another negative error
missing which would prevent zero transitioning to one?
I think if the auth fails we currently return 0, because the uc state
in that case would be "TRANSFERRED", i.e. DMA complete but not fully
enabled. I don't have anything against changing the FW state to
"ERROR" in this scenario and leaving the 0 to mean "not done yet", but
I'd prefer the media team to comment on their needs for this IOCTL
before committing to anything.
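
To make the retry idea concrete, here is a minimal sketch of what a
userspace poll could look like if zero were defined as "not done yet" and
failures were reported as negative errors. It is an assumption about
semantics still under discussion, not existing behaviour. The query hook,
wait_for_huc() and the fake backend are all hypothetical; only the
I915_PARAM_HUC_STATUS value (42) comes from the thread.

```c
#include <errno.h>

/* The getparam discussed in this thread. */
#define I915_PARAM_HUC_STATUS 42

/*
 * Hypothetical query hook standing in for the getparam IOCTL: returns 0
 * and stores the parameter value, or returns a negative errno.
 */
typedef int (*huc_query_fn)(void *ctx, int param, int *value);

/*
 * Poll until HuC reports loaded. Assumes the proposed semantics:
 * negative = error, 0 = not done yet (retry), > 0 = loaded.
 */
static int wait_for_huc(huc_query_fn query, void *ctx, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		int value = 0;
		int err = query(ctx, I915_PARAM_HUC_STATUS, &value);

		if (err < 0)
			return err;	/* hard failure, retrying won't help */
		if (value > 0)
			return 0;	/* loaded */
		/* value == 0: still loading; a real caller would sleep here */
	}
	return -ETIMEDOUT;
}

/* Fake backend for illustration: loads after a fixed number of polls. */
struct fake_huc {
	int polls_until_loaded;
	int error;		/* negative errno to inject, or 0 */
};

static int fake_query(void *ctx, int param, int *value)
{
	struct fake_huc *huc = ctx;

	if (param != I915_PARAM_HUC_STATUS)
		return -EINVAL;
	if (huc->error)
		return huc->error;
	if (huc->polls_until_loaded > 0) {
		huc->polls_until_loaded--;
		*value = 0;
	} else {
		*value = 1;
	}
	return 0;
}
```

With the fake backend, a HuC that needs three polls succeeds within a
budget of ten, while an injected error is returned immediately instead of
being retried, which is the distinction the "ERROR" state would allow.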
Currently media doesn't differentiate between "delayed loading is in
progress" and "HuC is authenticated and running". If the HuC authentication
eventually fails, the user needs to check the debugfs to know the
reason. IMHO, it's not a big problem as this is what we do even when the
IOCTL returns non-zero values. + Carl to comment.
Thanks,
Tony
Daniele
Regards,
Tvrtko
Daniele
Regards,
Tvrtko
Thanks,
Daniele
Will there be runtime suspend happening on the GSC device
behind i915's back, or will i915 and GSC always be able to
transition the states in tandem?
They're always in sync. The GSC is part of the same HW PCI
device as the rest of the GPU, so they change HW state together.
Okay thanks, I wasn't sure if it is the same or separate device.
Regards,
Tvrtko