Re: [intel-gfx] [PATCH] drm/i915: avoid concurrent writes to aux_inv (rev3)

"Yang, Fei" <fei.yang@xxxxxxxxx> · Thu, 3 Mar 2022 04:00:08 +0000

Hi Chris, for some reason I didn’t receive the review email, so I copied your comments from patchwork and faked this email.

>>  static void execlists_dequeue(struct intel_engine_cs *engine)
>>  {
>>         struct intel_engine_execlists * const execlists = &engine->execlists;
>> @@ -1538,6 +1566,16 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>                         }
>>  
>>                         if (__i915_request_submit(rq)) {
>> +                               /* hsdes: 1809175790 */
>> +                               if ((GRAPHICS_VER(engine->i915) == 12) &&
>> +                                   rq->vd_ve_aux_inv &&
>> +                                   (engine->class == VIDEO_DECODE_CLASS ||
>> +                                    engine->class == VIDEO_ENHANCEMENT_CLASS)) {

> We do not need the extra checks here; we just do as we are told. We only
> tell ourselves to apply the fixup when required.

Without checking GRAPHICS_VER, I’m seeing a lot of regressions on older platforms in the CI result.
This workaround was only implemented for gen12 (gen12_emit_flush_xcs).
Without checking engine->class, I’m seeing boot issues due to GEM_BUG_ON() in aux_inv_reg().
Any rq will go through this code regardless of engine class and gen version, so the checking seems to be
necessary. 

>> +                                       *rq->vd_ve_aux_inv = i915_mmio_reg_offset

> Likewise, vd_ve is overspecific, aux_inv_fixup or aux_inv_wa (or
> wa_aux_iv, fixup_aux_inv).

I wanted to be specific because the workaround was only implemented for vd/ve engines.
But I’m ok with your proposal.

>> +                                               (aux_inv_reg(engine));
>> +                                       rq->vd_ve_aux_inv = NULL;

> Move this to i915_request initialisation so that we only set aux_inv
> when required, which probably explains the extra defence.

The pointer is currently initialized with 0x5a5a. I set it to NULL in gen12_emit_flush_xcs, otherwise the rq will
enter that if-statement with an invalid pointer.
I’m not familiar with the code, there seems to be multiple functions allocating the structure. I agree it’s better
to set it to NULL at initialization, but need some guidance on where is the best place to do so.

>> +                                       rq->execution_mask = engine->mask;
>> +                               }
>>                                 if (!merge) {
>>                                         *port++ = i915_request_get(last);
>>                                         last = NULL;
>> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
>> index 28b1f9db5487..69de32e5e15d 100644
>> --- a/drivers/gpu/drm/i915/i915_request.h
>> +++ b/drivers/gpu/drm/i915/i915_request.h
>> @@ -350,6 +350,8 @@ struct i915_request {
>>                 struct list_head link;
>>                 unsigned long delay;
>>         } mock;)
>> +
>> +       u32 *vd_ve_aux_inv;

> Not at the end of the struct; that's where we put things in the dungeon.
> The selftest struct should be last; I do hope no one has been putting
> things at random places in the struct without considering the layout and
> semantics. From the flow, this is akin to batch, capture_list; before
> emitted_jiffies would be a good spot.

Got it, will change. I thought adding at the end would be safer, thanks for the explanation.

> -Chris