Re: [PATCH 1/2] drm/i915/display: remove small micro-optimizations in irq handling

Jani Nikula <jani.nikula@xxxxxxxxx> · Tue, 17 Sep 2024 13:58:19 +0300

On Thu, 18 Apr 2024, Tvrtko Ursulin <tursulin@xxxxxxxxxxx> wrote:
> On 18/04/2024 10:49, Jani Nikula wrote:
>> On Wed, 17 Apr 2024, Lucas De Marchi <lucas.demarchi@xxxxxxxxx> wrote:
>>> On Mon, Apr 08, 2024 at 03:54:44PM GMT, Jani Nikula wrote:
>>>> The raw register reads/writes are there as micro-optimizations to avoid
>>>> multiple pointer indirections on uncore->regs. Presumably this is useful
>>>> when there are plenty of register reads/writes in the same
>>>> function. However, the display irq handling only has a few raw
>>>> reads/writes. Remove them for simplification.
>>>
>>> I think that comment didn't age well. Not to say there's something wrong
>>> with this commit, but just to make sure we are aware of the additional
>>> stuff going on and we if we are ok with that.
>>>
>>> using intel_de_read() in place of raw_reg_read() will do (for newer
>>> platforms):
>>>
>>> 	1) Read FPGA_DBG to detect unclaimed access before the actual read
>>> 	2) Find the relevant forcewake for that register, acquire and wait for ack
>>> 	3) readl(reg)
>>> 	4) Read FPGA_DBG to detect unclaimed access after the actual read
>>> 	5) Trace reg rw
>>>
>>> That's much more than a pointer indirection. Are we ok with that in the
>>> irq?  Also, I don't know why but we have variants to skip tracing (step
>>> 5 above), but on my books a disabled tracepoint is order of magnitudes
>>> less overhead than 1, 2 and 4.
>> 
>> Honestly, I don't really know.
>> 
>> The thing is, we have these ad hoc optimizations all over the place. Why
>> do we have the raw access in two places, but not everywhere in irq
>> handling? The pointer indirection thing really only makes sense if you
>> have a lot of access in a function, but that's not the case. You do have
>> a point about everything else.
>
> The "why only two" places is I think simply an artefact of refactoring 
> and code evolution. Initially all IRQ handling was in one function, then 
> later gen11 and display parts got split out as more platforms were 
> added. For example a3265d851e28 ("drm/i915/irq: Refactor gen11 display 
> interrupt handling").
>
> As for the original rationale, it was described in commits like:
>
> 2e4a5b25886c ("drm/i915: Prune gen8_gt_irq_handler")
> c48a798a7447 ("drm/i915: Trim the ironlake+ irq handler")
>
> Obviosuly, once a portion of a handler was/is extracted, pointer caching 
> to avoid uncore->regs reloads may not make full sense any more due 
> function calls potentially overshadowing that cost.
>
> As for unclaimed debug, I would say it is probably okay to not burden 
> the irq handlers with it, but if the display folks think a little bit of 
> extra cost in this sub-handlers is fine that would sound plausible to me 
> given the frequency of display related interrupts is low. So for me 
> patch is fine if it makes the display decoupling easier.
>
>> What would the interface be like if display were its own module? We
>> couldn't just wrap it all in a bunch of macros and static inlines. Is
>> the end result that display irq handling needs to call functions via
>> pointers in another module? Or do we need to move the register level irq
>> handling to xe and i915 cores, and handle the display parts at a higher
>> abstraction level?
>
> AFAIR no trace variants were not for performance but to avoid log spam 
> when debugging stuff. From things like busy/polling loops.

Bumping a forgotten topic.

Ville, Rodrigo, are we okay with the changes here?

BR,
Jani.

>
> Regards,
>
> Tvrtko
>>>
>>> btw, if we drop the raw accesses, then we can probably drop (1) above.
>>>
>>> Lucas De Marchi
>>>
>>>>
>>>> Signed-off-by: Jani Nikula <jani.nikula@xxxxxxxxx>
>>>> ---
>>>> drivers/gpu/drm/i915/display/intel_display_irq.c | 15 +++++++--------
>>>> 1 file changed, 7 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/display/intel_display_irq.c b/drivers/gpu/drm/i915/display/intel_display_irq.c
>>>> index f846c5b108b5..d4ae9139be39 100644
>>>> --- a/drivers/gpu/drm/i915/display/intel_display_irq.c
>>>> +++ b/drivers/gpu/drm/i915/display/intel_display_irq.c
>>>> @@ -1148,15 +1148,14 @@ void gen8_de_irq_handler(struct drm_i915_private *dev_priv, u32 master_ctl)
>>>>
>>>> u32 gen11_gu_misc_irq_ack(struct drm_i915_private *i915, const u32 master_ctl)
>>>> {
>>>> -	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
>>>> 	u32 iir;
>>>>
>>>> 	if (!(master_ctl & GEN11_GU_MISC_IRQ))
>>>> 		return 0;
>>>>
>>>> -	iir = raw_reg_read(regs, GEN11_GU_MISC_IIR);
>>>> +	iir = intel_de_read(i915, GEN11_GU_MISC_IIR);
>>>> 	if (likely(iir))
>>>> -		raw_reg_write(regs, GEN11_GU_MISC_IIR, iir);
>>>> +		intel_de_write(i915, GEN11_GU_MISC_IIR, iir);
>>>>
>>>> 	return iir;
>>>> }
>>>> @@ -1169,18 +1168,18 @@ void gen11_gu_misc_irq_handler(struct drm_i915_private *i915, const u32 iir)
>>>>
>>>> void gen11_display_irq_handler(struct drm_i915_private *i915)
>>>> {
>>>> -	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
>>>> -	const u32 disp_ctl = raw_reg_read(regs, GEN11_DISPLAY_INT_CTL);
>>>> +	u32 disp_ctl;
>>>>
>>>> 	disable_rpm_wakeref_asserts(&i915->runtime_pm);
>>>> 	/*
>>>> 	 * GEN11_DISPLAY_INT_CTL has same format as GEN8_MASTER_IRQ
>>>> 	 * for the display related bits.
>>>> 	 */
>>>> -	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL, 0x0);
>>>> +	disp_ctl = intel_de_read(i915, GEN11_DISPLAY_INT_CTL);
>>>> +
>>>> +	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, 0);
>>>> 	gen8_de_irq_handler(i915, disp_ctl);
>>>> -	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL,
>>>> -		      GEN11_DISPLAY_IRQ_ENABLE);
>>>> +	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, GEN11_DISPLAY_IRQ_ENABLE);
>>>>
>>>> 	enable_rpm_wakeref_asserts(&i915->runtime_pm);
>>>> }
>>>> -- 
>>>> 2.39.2
>>>>
>> 

-- 
Jani Nikula, Intel