Re: [PATCH] drm/i915/execlists: Poison the CSB after use

Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> · Tue, 30 Oct 2018 11:59:18 +0200

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:

> Quoting Mika Kuoppala (2018-10-30 09:31:56)
>> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
>> 
>> > After reading the event status from the CSB, write back 0 (an invalid
>> > value) so we can detect if the HW should signal a new event without
>> > writing the event in the future.
>> >
>> > References: https://bugs.freedesktop.org/show_bug.cgi?id=108315
>> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
>> > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
>> > ---
>> >  drivers/gpu/drm/i915/intel_lrc.c | 3 +++
>> >  1 file changed, 3 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> > index 22b57b8926fc..126efe20d2d6 100644
>> > --- a/drivers/gpu/drm/i915/intel_lrc.c
>> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> > @@ -910,6 +910,9 @@ static void process_csb(struct intel_engine_cs *engine)
>> >                         execlists->active);
>> >  
>> >               status = buf[2 * head];
>> > +             GEM_BUG_ON(!status);
>> 
>> Assuming we still have a timing issue in here, how about
>> we poll a little until status != 0 and then continue with warning?
>
> If there's any race condition here, we definitely do not want to paper
> over it.
>  
>> We could recover by finding the 'bit late' status, instead of
>> oopsing out.
>
> Oopsing out tells us where the problem is very concisely.

It would deliver the same information, so it is not papering over. The
only benefit is that, with this, the signalling won't be lost.
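
To make the "poll a little, then continue with a warning" idea concrete,
here is a rough userspace sketch. It is not driver code: the helper name,
the retry bound and the use of stderr are all assumptions made for
illustration, and it presumes a zero status means the slot is still
poisoned.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bound on re-reads of one slot; not taken from the patch. */
#define CSB_POLL_RETRIES 100

/*
 * Re-read a poisoned (zero) CSB status slot a bounded number of times.
 * Returns the first non-zero value seen, or 0 if the slot never changed.
 * *late is set to 1 when the first read was still zero, i.e. the event
 * was signalled before the status write became visible to us.
 */
static uint32_t csb_read_status_polled(const volatile uint32_t *slot, int *late)
{
	uint32_t status = *slot;
	int tries;

	*late = 0;
	for (tries = 0; status == 0 && tries < CSB_POLL_RETRIES; tries++) {
		*late = 1;
		status = *slot;
	}

	/* Warn instead of oopsing, so the late event is still reported. */
	if (*late)
		fprintf(stderr, "CSB status late: 0x%08x after %d re-reads\n",
			(unsigned int)status, tries);

	return status;
}
```

The caller would still have to decide what to do when 0 comes back after
the bounded poll; the sketch only shows that the warning carries the same
"status was late" information as the BUG would.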

>  
>> > +             GEM_DEBUG_EXEC(WRITE_ONCE(*(u32 *)(buf + 2 * head), 0));
>> 
>> What I am afraid here is that we change the timing and cache dynamics
>> for our debug builds so that we bury the pesky thing.
>
> That too is a result.

Agreed, so you want to observe behaviour with and without.

>> Perhaps I am wandering too far but lets consider for the csb loop:
>> 
>> read head,tail;
>> rmb();
>> 
>> for_each_csb() {
>>   64 bit read 
>>   64 bit write to zero it, unconditionally 
>>   act_on_it()
>> }
>> 
>> Too heavy?
>
> Too papery - shouts that we don't know what we or the hw is doing. We
> want to pretend that we know what we are doing at least.

Fair enough. My main concern was that the number of reads inside the csb
loop changes between debug and non-debug builds. But that view should be
static to the CPU at this point regardless.
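
For reference, the loop I had in mind would look roughly like the
userspace sketch below. The ring size, the stats struct and the function
names are assumptions for illustration only, the fence is a C11 stand-in
for rmb(), and the head/tail convention (head is the last slot consumed,
tail the last slot written) is assumed rather than copied from the driver.

```c
#include <stdatomic.h>
#include <stdint.h>

#define CSB_ENTRIES 6u /* assumed ring size, for illustration only */

/* Hypothetical bookkeeping so the effect of the loop can be inspected. */
struct csb_stats {
	unsigned int handled; /* non-zero entries passed to act_on_it() */
	unsigned int stale;   /* entries that still held the poison value */
	uint64_t last;        /* last entry acted upon */
};

static void act_on_it(uint64_t entry, struct csb_stats *st)
{
	st->handled++;
	st->last = entry;
}

/*
 * The caller has already read head and tail. Each slot is read once as
 * 64 bits and then unconditionally zeroed, in both debug and non-debug
 * builds, so the access pattern does not depend on the build type.
 */
static void csb_drain(volatile uint64_t *csb, unsigned int head,
		      unsigned int tail, struct csb_stats *st)
{
	/* Stand-in for rmb(): order the pointer reads before entry reads. */
	atomic_thread_fence(memory_order_acquire);

	while (head != tail) {
		uint64_t entry;

		if (++head == CSB_ENTRIES)
			head = 0;

		entry = csb[head]; /* 64 bit read */
		csb[head] = 0;     /* 64 bit write to zero it */

		if (entry == 0) {
			st->stale++; /* event signalled, status not seen */
			continue;
		}
		act_on_it(entry, st);
	}
}
```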

So let's try to find out exactly how the hardware writes
the csb entries.

This patch does give us more details,
Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>