On Tue, 2023-02-14 at 13:38 -0800, Teres Alexis, Alan Previn wrote:
> Add MTL's function for ARB session creation using PXP firmware
> version 4.3 ABI structure format.
>
> Also add MTL's function for ARB session invalidation but this
> reuses PXP firmware version 4.2 ABI structure format.
>
> Before checking the return status, look at the GSC-CS-Mem-Header's
> pending-bit which means the GSC firmware is busy and we should
> resubmit.
>
> Signed-off-by: Alan Previn <alan.previn.teres.alexis@xxxxxxxxx>
> ---

alan:snip

Not part of this patch today but a new modification is required that
would end up going into this patch --->

So from the internal testing we are doing on MTL, I have noticed that
the first time the GSC firmware is requested to init the arb session
(right after a cold-boot or driver-reload-after-flr), it takes much
longer. This has resulted in the observation of the following
problematic event flow:

1. app or igt calls gem-context-create to create a protected context
   (after a fresh boot or driver reload).
2. intel_pxp_start will begin the global teardown and recreation where:
   2-a: the first part (i.e. session teardown) is skipped (since the
        arb session wasn't created before this)
   2-b: the second part (i.e. arb session init commands via the gsc
        firmware) does happen and takes a long time (on first time)
3. step 2 is queued thru a worker while the main call into
   intel_pxp_start continues to wait for the arb session to start and
   finally bails out with a timeout (back up through
   gem-context-create).
4. app retries again and now we get a second call that repeats step 1
   while 2-b is still wrapping up. So depending on the race of this
   step 4 (step-1-recall) vs the completion of step 2-b, we could end
   up getting a 2nd teardown (i.e. step 2-a going in) right after the
   first arb-session-creation completed ... even though in both cases
   the app just wants the creation.

The simplest fix (with minimal code changes) would be to add a
complementary "is_arb_creation_pending" flag alongside the is_arb_valid
flag, with both remaining protected by the arb-mutex. That said, I'll
respin rev6 with this fix along with the other mutex fix on Patch4.
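
To make the idea concrete, here is a rough userspace sketch of the two-flag
check, not the actual i915 change. Everything in it is an assumption made for
illustration: struct pxp_model, pxp_model_start() and
pxp_model_creation_done() are hypothetical names, a pthread mutex stands in
for the arb-mutex, and the counter exists only to show how many
teardown-and-create cycles get queued.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of the PXP ARB state; not the i915 structs. */
struct pxp_model {
	pthread_mutex_t arb_mutex;     /* protects both flags below */
	bool is_arb_valid;             /* ARB session is up and usable */
	bool is_arb_creation_pending;  /* creation queued, firmware not done yet */
	int creations_queued;          /* bookkeeping for illustration only */
};

enum pxp_start_action {
	PXP_START_NONE,    /* session already valid, nothing to do */
	PXP_START_WAIT,    /* creation in flight, just wait for it */
	PXP_START_QUEUED,  /* this call queued the teardown-and-create work */
};

void pxp_model_init(struct pxp_model *pxp)
{
	pthread_mutex_init(&pxp->arb_mutex, NULL);
	pxp->is_arb_valid = false;
	pxp->is_arb_creation_pending = false;
	pxp->creations_queued = 0;
}

/* Models the decision taken on each start request. */
enum pxp_start_action pxp_model_start(struct pxp_model *pxp)
{
	enum pxp_start_action action;

	pthread_mutex_lock(&pxp->arb_mutex);
	if (pxp->is_arb_valid) {
		action = PXP_START_NONE;
	} else if (pxp->is_arb_creation_pending) {
		/* A retry during a slow first init must not queue a second cycle. */
		action = PXP_START_WAIT;
	} else {
		pxp->is_arb_creation_pending = true;
		pxp->creations_queued++;
		action = PXP_START_QUEUED;
	}
	pthread_mutex_unlock(&pxp->arb_mutex);

	return action;
}

/* Models the worker finishing step 2-b, successfully or not. */
void pxp_model_creation_done(struct pxp_model *pxp, bool success)
{
	pthread_mutex_lock(&pxp->arb_mutex);
	assert(pxp->is_arb_creation_pending);
	pxp->is_arb_creation_pending = false;
	pxp->is_arb_valid = success;
	pthread_mutex_unlock(&pxp->arb_mutex);
}
```

In this model, the retry from step 4 lands in the PXP_START_WAIT branch while
2-b is still running, so only one creation is ever queued and no second
teardown follows the first successful creation. How the waiting caller gets
woken or timed out is left out here on purpose.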