On 4/26/2023 9:48 AM, Teres Alexis, Alan Previn wrote:
On Wed, 2023-04-26 at 13:52 +0200, Daniel Vetter wrote:
On Tue, Apr 25, 2023 at 04:41:54PM +0300, Joonas Lahtinen wrote:
(+ Faith and Daniel as they have been involved in previous discussions)
Quoting Jordan Justen (2023-04-24 20:13:00)
On 2023-04-24 02:08:43, Tvrtko Ursulin wrote:
alan:snip
- the more a feature spans drivers/modules, the more it should be
discovered by trying it out, e.g. dma-buf fence import/export was a huge
discussion, luckily mesa devs figured out how to transparently fall back
at runtime so we didn't end up merging the separate feature flag (I
think at least, can't find it). pxp being split across i915/me/fw/who
knows what else is kinda similar so I'd heavily lean towards discovery
by creating a context
- pxp taking 8s to init a ctx sounds very broken, irrespective of anything
else
I think there has been a bit of confusion in regards to this timeout and
to where it applies, so let me try to clarify to make sure we're all on
the same page (Alan has already explained most of it below, but I'm
going to go in a bit more detail and I want to make sure it's all in one
place for reference).
Before we can do any PXP operation, dependencies need to be satisfied,
some of which are outside of i915. For MTL, these are:
- GSC FW needs to be loaded (~250 ms)
- HuC FW needs to be authenticated for PXP ops (~20 ms)
- MEI modules need to be bound (depends on the probe ordering, but usually
  a few secs)
- GSC SW proxy via MEI needs to be established (~500 ms normally, but can
  take a few seconds on the first boot after a firmware update)
Due to the fact that these can take several seconds in total to
complete, to avoid stalling driver load/resume for that long we moved
the i915-side operations to a separate worker and we register i915
before they've completed. This means that we can get a PXP context
creation call before all the dependencies are in place, in which case we
do need to wait and that's where the 8s come from. After all the pieces
are in place, a PXP context creation call is much faster (up to ~150 ms,
which is the time required to start the PXP session if it is not already
running).
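To make the timing argument above concrete, here is a minimal sketch in Python. It is a toy model, not i915 code: the function name and its parameters are hypothetical, and only the 8 s timeout and the ~150 ms session start come from this thread.

```python
# Illustrative model only; not i915 code. All names are hypothetical.
SESSION_START_MS = 150   # upper bound quoted above for starting the PXP session
WAIT_TIMEOUT_MS = 8000   # the 8 s wait discussed in this thread


def create_pxp_context(call_time_ms, deps_ready_ms, timeout_ms=WAIT_TIMEOUT_MS):
    """Return (ok, latency_ms) for a context-creation call made at call_time_ms.

    deps_ready_ms is the time at which the background worker has finished
    the prerequisite steps (GSC FW load, HuC auth, MEI bind, GSC proxy init).
    """
    # A call made after the prerequisites are in place does not wait at all.
    wait_ms = max(0, deps_ready_ms - call_time_ms)
    if wait_ms > timeout_ms:
        # Prerequisites do not arrive in time: the call blocks for the
        # full timeout and then fails.
        return False, timeout_ms
    # Otherwise the caller pays the remaining wait plus the session start.
    return True, wait_ms + SESSION_START_MS
```

In this model, a call made once everything is ready costs only the session start, while a call made right after i915 registers absorbs whatever is left of the dependency setup.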
The reason why we suggested a dedicated getparam was to avoid requiring
early users to wait for all of that to happen just to check the
capability. By the time a user actually wants to use PXP, we're likely
done with the prep steps (or at least we're far along with them) and
therefore the wait will be short.
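As a rough illustration of the getparam idea, the sketch below models a status query that only inspects state and never blocks. The enum, its values and the function are assumptions made up for this example; they are not the actual i915 uAPI.

```python
from enum import Enum


class PxpStatus(Enum):
    # Hypothetical status values, invented for this illustration.
    UNSUPPORTED = 0  # PXP can never be provided on this platform/config
    PENDING = 1      # supported, but prerequisites are still being set up
    READY = 2        # context creation will not wait on prerequisites


def query_pxp_status(now_ms, deps_ready_ms, supported=True):
    """Non-blocking capability check: reads the current state, never waits."""
    if not supported:
        return PxpStatus.UNSUPPORTED
    if now_ms >= deps_ready_ms:
        return PxpStatus.READY
    return PxpStatus.PENDING
```

An early caller would see PENDING right away and could advertise the capability without paying for the dependency setup, deferring any wait to the point where a protected context is really needed.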
Alan: Please be aware that:
1. the wait-timeout was changed to 1 second sometime back.
2. I'm not the one deciding the timeout. I initially wanted to keep it at the same
timeout as ADL (250 milliseconds) and ask the UMD to retry if the user needs it (the
same behavior as on ADL). Daniele requested to move it to 8 seconds, but through the
review process we reduced it to 1 second.
3. In any case, that's just the wait-timeout, and we know it won't succeed until
~6 seconds after i915 (~9 secs after boot). The issue isn't our hardware or i915
- it's the component driver load <-- this is what's broken.
I think the question here is whether the mei driver is taking a long
time to probe or if it is just being probed late. In the latter case, I
wouldn't call it broken.
Details: PXP context is dependent on gsc-fw load, huc-firmware load, mei-gsc-proxy
component driver load + bind, huc-authentication and gsc-proxy-init-handshake.
Most of above steps begin rather quickly during i915 driver load - the delay
seems to come from a very late mei-gsc-proxy component driver load. In fact the
parent mei-me driver is only getting loaded ~6 seconds after i915 init is done. That
blocks the gsc-proxy-init-handshake and huc-authentication and lastly PXP.
That said, what is broken is that it takes so long for the component drivers
to come up. NOTE: PXP isn't really doing anything differently in the context
creation flow (in terms of time-consuming steps compared to ADL) besides the
extra waits on these dependencies.
We can actually go back to the original timeout of 250 milliseconds like we have on ADL,
but context creation will then fail if Mesa calls in too early (and succeed later) ... or...
we can create the GET_PARAMs.
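For the first option, a small Python model of what a UMD retry loop would see with the short ADL-style timeout. This is a sketch under stated assumptions: the function and its parameters are hypothetical, retries are issued back to back, and the session start time is left out.

```python
def create_with_retries(first_call_ms, deps_ready_ms, timeout_ms=250, max_attempts=40):
    """Model a UMD that retries a short-timeout context creation back to back.

    Returns (succeeded, attempts_used, finish_time_ms). Hypothetical model:
    it only tracks when the prerequisites become ready.
    """
    now = first_call_ms
    for attempt in range(1, max_attempts + 1):
        if deps_ready_ms - now <= timeout_ms:
            # Prerequisites land within this attempt's wait window.
            return True, attempt, max(now, deps_ready_ms)
        # This attempt blocked for the full timeout and failed.
        now += timeout_ms
    return False, max_attempts, now
```

With the ~6 seconds quoted above, a caller starting right after i915 init would need about two dozen 250 ms attempts in this model, which is the cost that a status query would let userspace avoid.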
A better idea would be to figure out how to control the driver load order and
force mei driver + components to get called right after i915. I was informed
there is no way to control this and changes here will likely not be accepted
upstream.
We could add a device link to mark i915 as a consumer of mei, but I
believe that wouldn't work for 2 reasons:
1 - on discrete, mei binds to a child device of i915, so the dependency
is reversed
2 - the link might just delay the i915 load to after the mei load, which
I'm not sure it is something we want (and at that point we could also
just wait for mei to bind from within the i915 load).
Daniele
++ Daniele - can you chime in?
Take note that ADL has the same issue, but for whatever reason the dependent
mei component on ADL loaded much sooner, so it was never caught, even though it
already existed when ADL support was merged (it will happen if users customize
the kernel + compositor for fastboot). (I realize I haven't tested ADL with the
new kernel configs that we use to also boot PXP on MTL. I wonder if the new
mei configs are causing the delay, i.e. ADL customers could suddenly see this
6 sec delay too. That's something I have to check now.)