Re: [RFC] tentative fix for drm/i915/gt regression on preempt-rt

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Mon, 3 Jul 2023 16:30:01 +0100

Hi,

On 30/06/2023 14:09, Sebastian Andrzej Siewior wrote:
On 2023-06-22 20:57:50 [-0400], Paul Gortmaker wrote:
[ longer report about what is broken.]

Commit ade8a0f598443 ("drm/i915: Make all GPU resets atomic") introduces
a preempt_disable() section around the invocation of the reset callback.
I can't find an explanation why this is needed. There was a comment
saying
| * We want to perform per-engine reset from atomic context (e.g.
| * softirq), which imposes the constraint that we cannot sleep.

but it does not state the problem with being preempted while waiting for
the reset. The commit itself does not explain why an atomic reset is
needed, it just states that it is a requirement now. On !RT we could
pull packets from a NIC and forward them to other NICs for 2ms.

I've been looking over the reset callbacks and gen8_reset_engines() +
gen6_reset_engines() acquire a spinlock_t. Since this becomes a sleeping
lock on PREEMPT_RT it must not be acquired with disabled preemption.
i915_do_reset() acquires no lock but then has two udelay()s of 50us
each. Not good latency-wise in a preempt-off section.

Could someone please explain why atomic is needed during reset, what
problems are introduced by a possible preemption?

The atomic requirement in that commit text most likely refers to removing
the old big sleeping mutex we had in the reset path. So it looks
plausible that the preempt_disable() section is not strictly needed, and
that the motivation was perhaps simply, given the 20-50us polls on hw
registers involved, to make them happen as fast as possible and so
minimize visual glitching during resets.

Although that reasoning would only apply on some hw generations, where 
the irqsave spinlock is not held across the whole sequence anyway.

And I suspect those same platforms would be the annoying ones, if one 
simply wanted to try without the preempt_disable section, given our 
wait_for_atomic macro will complain loudly if not used from an atomic 
context.
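
To make that concern concrete, here is a minimal userspace sketch of a busy-wait helper that warns when it is called from a preemptible context. It is only a model: the preempt counter, the fake register and every model_*/fake_* name are made up for illustration and are not the actual i915 macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; none of these are kernel or i915 symbols. */
static int model_preempt_count;     /* > 0 means "preemption disabled" */
static int model_warnings;          /* how often the helper complained */
static int fake_polls_until_ready;  /* fake hw: polls left before reset is done */

static void model_preempt_disable(void) { model_preempt_count++; }

static void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* Fake "reset done" condition: false for the first N polls, then true. */
static bool fake_reset_done(void)
{
	if (fake_polls_until_ready > 0) {
		fake_polls_until_ready--;
		return false;
	}
	return true;
}

/*
 * Busy-poll cond() at most max_polls times without ever sleeping.
 * Complains (but still polls) when the caller is preemptible, which is
 * the kind of noise to expect once preempt_disable() is dropped.
 */
static bool model_wait_for_atomic(bool (*cond)(void), int max_polls)
{
	if (model_preempt_count == 0) {
		model_warnings++;
		fprintf(stderr, "warning: atomic wait from preemptible context\n");
	}

	for (int i = 0; i < max_polls; i++) {
		if (cond())
			return true;
	}
	return false;
}
```

In this model, calling model_wait_for_atomic() between model_preempt_disable() and model_preempt_enable() stays silent, while the same call from a preemptible context bumps model_warnings.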

But I think we do have a macro for short register waits which works with
preemption enabled. I will try to cook up a patch and submit it to our CI
during the week, then see what happens.
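
For illustration only, a preemption-friendly wait could look roughly like the userspace sketch below: poll the register, and sleep with a growing backoff between polls instead of spinning. The helper names, the fake register and the backoff values are all assumptions made for this example, not the existing driver macro.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical fake hw: the reset-request bit (bit 0) clears after N reads. */
static int fake_reads_until_done;

static uint32_t fake_read_reset_ctl(void)
{
	if (fake_reads_until_done > 0) {
		fake_reads_until_done--;
		return 0x1;
	}
	return 0x0;
}

static uint64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

static void sleep_us(unsigned int us)
{
	struct timespec ts = {
		.tv_sec = us / 1000000u,
		.tv_nsec = (long)(us % 1000000u) * 1000L,
	};

	nanosleep(&ts, NULL);
}

/*
 * Wait until (read_reg() & mask) == value or timeout_us elapses.
 * Sleeps between polls, so the caller must be allowed to sleep.
 */
static bool model_wait_for_reg(uint32_t (*read_reg)(void), uint32_t mask,
			       uint32_t value, unsigned int timeout_us)
{
	const uint64_t deadline = now_us() + timeout_us;
	unsigned int backoff_us = 10;

	assert(read_reg != NULL);

	for (;;) {
		/* Sample the clock first so a late wakeup still gets a final read. */
		const bool expired = now_us() >= deadline;

		if ((read_reg() & mask) == value)
			return true;
		if (expired)
			return false;

		sleep_us(backoff_us);
		if (backoff_us < 1000)
			backoff_us *= 2;
	}
}
```

The trade-off is the one discussed above: the wait no longer needs an atomic context, but each sleep can stretch the reset by however long the scheduler keeps the task off the CPU.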

Or even move the preempt_disable down so it just encompasses the
register write + wait. That would then be under the spinlock, which is
presumably okay on RT? (Yes, I know it wouldn't solve one half of your
"complaint", but let's just entertain the idea for now.)
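
As a rough model of the two orderings, the userspace sketch below counts how often a lock that may sleep (as spinlock_t does on PREEMPT_RT) is taken with preemption disabled. All names are hypothetical stand-ins, and the register write + poll is reduced to a placeholder.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel symbols. */
static int model_preempt_count;    /* > 0 means "preemption disabled" */
static int model_sleep_in_atomic;  /* sleeping-lock acquisitions while atomic */
static bool model_lock_held;

static void model_preempt_disable(void) { model_preempt_count++; }

static void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* Models spinlock_t on PREEMPT_RT: acquiring it may sleep. */
static void model_rt_lock(void)
{
	if (model_preempt_count > 0)
		model_sleep_in_atomic++;  /* the bug class being discussed */
	assert(!model_lock_held);
	model_lock_held = true;
}

static void model_rt_unlock(void)
{
	assert(model_lock_held);
	model_lock_held = false;
}

/* Placeholder for the register write + short busy-wait. */
static void model_write_and_poll(void)
{
	assert(model_preempt_count > 0);  /* expects the atomic section */
}

/* Wide section: preemption is off around the whole callback. */
static void reset_wide(void)
{
	model_preempt_disable();
	model_rt_lock();
	model_write_and_poll();
	model_rt_unlock();
	model_preempt_enable();
}

/* Narrow section: lock first, preemption off only for write + wait. */
static void reset_narrow(void)
{
	model_rt_lock();
	model_preempt_disable();
	model_write_and_poll();
	model_preempt_enable();
	model_rt_unlock();
}
```

In this model reset_wide() records one violation and reset_narrow() records none; the busy-wait latency inside the narrow section is of course unchanged.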

Regards,

Tvrtko