Re: [PATCH 1/2] drm/i915/display: remove small micro-optimizations in irq handling

On Tue, Sep 17, 2024 at 01:58:19PM +0300, Jani Nikula wrote:
> On Thu, 18 Apr 2024, Tvrtko Ursulin <tursulin@xxxxxxxxxxx> wrote:
> > On 18/04/2024 10:49, Jani Nikula wrote:
> >> On Wed, 17 Apr 2024, Lucas De Marchi <lucas.demarchi@xxxxxxxxx> wrote:
> >>> On Mon, Apr 08, 2024 at 03:54:44PM GMT, Jani Nikula wrote:
> >>>> The raw register reads/writes are there as micro-optimizations to avoid
> >>>> multiple pointer indirections on uncore->regs. Presumably this is useful
> >>>> when there are plenty of register reads/writes in the same
> >>>> function. However, the display irq handling only has a few raw
> >>>> reads/writes. Remove them for simplification.
> >>>
> >>> I think that comment didn't age well. Not to say there's something wrong
> >>> with this commit, but just to make sure we are aware of the additional
> >>> stuff going on and if we are ok with that.
> >>>
> >>> using intel_de_read() in place of raw_reg_read() will do (for newer
> >>> platforms):
> >>>
> >>> 	1) Read FPGA_DBG to detect unclaimed access before the actual read
> >>> 	2) Find the relevant forcewake for that register, acquire and wait for ack
> >>> 	3) readl(reg)
> >>> 	4) Read FPGA_DBG to detect unclaimed access after the actual read
> >>> 	5) Trace reg rw
> >>>
> >>> That's much more than a pointer indirection. Are we ok with that in the
> >>> irq?  Also, I don't know why but we have variants to skip tracing (step
> >>> 5 above), but in my book a disabled tracepoint is orders of magnitude
> >>> less overhead than 1, 2 and 4.
> >> 
> >> Honestly, I don't really know.
> >> 
> >> The thing is, we have these ad hoc optimizations all over the place. Why
> >> do we have the raw access in two places, but not everywhere in irq
> >> handling? The pointer indirection thing really only makes sense if you
> >> have a lot of access in a function, but that's not the case. You do have
> >> a point about everything else.
> >
> > The "why only two" places is I think simply an artefact of refactoring 
> > and code evolution. Initially all IRQ handling was in one function, then 
> > later gen11 and display parts got split out as more platforms were 
> > added. For example a3265d851e28 ("drm/i915/irq: Refactor gen11 display 
> > interrupt handling").
> >
> > As for the original rationale, it was described in commits like:
> >
> > 2e4a5b25886c ("drm/i915: Prune gen8_gt_irq_handler")

Looking at this one, it looks like the raw usage was introduced to fix
a macro issue that we no longer have anyway.

> > c48a798a7447 ("drm/i915: Trim the ironlake+ irq handler")

Then, looking at this one, it sounds like a good optimization.
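
For reference, here is a rough userspace sketch of the pointer-caching
pattern being discussed. It is not i915 code: fake_device, fake_uncore,
fake_raw_read() and fake_raw_write_w1c() are hypothetical stand-ins, and
the write-1-to-clear behaviour of the IIR register is only modelled in
software.

```c
#include <assert.h>
#include <stdint.h>

enum { FAKE_NUM_REGS = 16 };

/* Hypothetical stand-ins for the device and uncore structures. */
struct fake_uncore {
	volatile uint32_t *regs;	/* base of a simulated register file */
};

struct fake_device {
	struct fake_uncore uncore;
};

/* Raw accessors: the caller supplies a base pointer it already holds. */
static uint32_t fake_raw_read(volatile uint32_t *regs, unsigned int idx)
{
	assert(idx < FAKE_NUM_REGS);
	return regs[idx];
}

/* Models a write-1-to-clear IIR register: set bits in val are cleared. */
static void fake_raw_write_w1c(volatile uint32_t *regs, unsigned int idx,
			       uint32_t val)
{
	assert(idx < FAKE_NUM_REGS);
	regs[idx] &= ~val;
}

/* Cached variant: dev->uncore.regs is loaded once into a local. */
uint32_t fake_irq_ack_cached(struct fake_device *dev, unsigned int idx)
{
	volatile uint32_t * const regs = dev->uncore.regs;
	uint32_t iir = fake_raw_read(regs, idx);

	if (iir)
		fake_raw_write_w1c(regs, idx, iir);

	return iir;
}

/* Uncached variant: each access names dev->uncore.regs again. */
uint32_t fake_irq_ack_uncached(struct fake_device *dev, unsigned int idx)
{
	uint32_t iir = fake_raw_read(dev->uncore.regs, idx);

	if (iir)
		fake_raw_write_w1c(dev->uncore.regs, idx, iir);

	return iir;
}
```

With only one read and one write per function, a compiler may well emit
the same code for both variants. The cached form should only start to
matter when many accesses sit in one function and intervening calls
force the base pointer to be reloaded.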

> >
> > Obviously, once a portion of a handler was/is extracted, pointer caching 
> > to avoid uncore->regs reloads may not make full sense any more due to 
> > function calls potentially overshadowing that cost.
> >
> > As for unclaimed debug, I would say it is probably okay to not burden 
> > the irq handlers with it, but if the display folks think a little bit of 
> > extra cost in this sub-handlers is fine that would sound plausible to me 
> > given the frequency of display related interrupts is low. 

Well, looking at the optimization above, my initial assumption was also that
display interrupts are infrequent, because I was thinking about hotplugs.
But perhaps an optimization for the vblank ones would be desirable?

> > So for me 
> > patch is fine if it makes the display decoupling easier.
> >
> >> What would the interface be like if display were its own module? We
> >> couldn't just wrap it all in a bunch of macros and static inlines. Is
> >> the end result that display irq handling needs to call functions via
> >> pointers in another module? Or do we need to move the register level irq
> >> handling to xe and i915 cores, and handle the display parts at a higher
> >> abstraction level?
> >
> > AFAIR the no-trace variants were not for performance but to avoid log spam 
> > when debugging stuff, from things like busy/polling loops.
> 
> Bumping a forgotten topic.
> 
> Ville, Rodrigo, are we okay with the changes here?

I am in favor of this patch. Let's unify things. But perhaps, as a follow-up,
we should study whether vblank or any other display irq needs some
optimization, and get that done inside the intel_de_ mmio helpers?
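
To put a rough number on the extra work, here is a small userspace model
of the five steps Lucas listed for the non-raw read. It only counts
simulated accesses and follows his list as stated. It is not the
intel_uncore implementation: every sim_* name is hypothetical, and the
forcewake step is reduced to a counter even though acquiring forcewake
and waiting for the ack would presumably involve MMIO of its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { SIM_FPGA_DBG = 0, SIM_NUM_REGS = 8 };

/* Hypothetical simulated register file with bookkeeping counters. */
struct sim_mmio {
	uint32_t regs[SIM_NUM_REGS];
	unsigned int accesses;		/* simulated register accesses */
	unsigned int fw_acquires;	/* simulated forcewake acquisitions */
	unsigned int traces;		/* emitted trace events */
	bool trace_enabled;
};

/* Raw path: a single register access and nothing else. */
uint32_t sim_raw_read(struct sim_mmio *m, unsigned int idx)
{
	assert(idx < SIM_NUM_REGS);
	m->accesses++;
	return m->regs[idx];
}

/* Unclaimed-access detection is modelled as one extra register read. */
static void sim_check_unclaimed(struct sim_mmio *m)
{
	(void)sim_raw_read(m, SIM_FPGA_DBG);
}

/* Forcewake is modelled as a counter only (see the caveat above). */
static void sim_forcewake_get(struct sim_mmio *m)
{
	m->fw_acquires++;
}

/* Checked path: follows steps 1 to 5 from the list quoted above. */
uint32_t sim_checked_read(struct sim_mmio *m, unsigned int idx)
{
	uint32_t val;

	sim_check_unclaimed(m);		/* 1: debug read before */
	sim_forcewake_get(m);		/* 2: forcewake acquire */
	val = sim_raw_read(m, idx);	/* 3: the actual read */
	sim_check_unclaimed(m);		/* 4: debug read after */
	if (m->trace_enabled)		/* 5: trace, skipped when disabled */
		m->traces++;

	return val;
}
```

In this model one checked read costs three register accesses plus a
forcewake acquisition, against a single access for the raw read. That is
the kind of overhead a follow-up would need to measure on the vblank
path before deciding whether the helpers need a fast path.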

> 
> BR,
> Jani.
> 
> >
> > Regards,
> >
> > Tvrtko
> >>>
> >>> btw, if we drop the raw accesses, then we can probably drop (1) above.
> >>>
> >>> Lucas De Marchi
> >>>
> >>>>
> >>>> Signed-off-by: Jani Nikula <jani.nikula@xxxxxxxxx>
> >>>> ---
> >>>> drivers/gpu/drm/i915/display/intel_display_irq.c | 15 +++++++--------
> >>>> 1 file changed, 7 insertions(+), 8 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/display/intel_display_irq.c b/drivers/gpu/drm/i915/display/intel_display_irq.c
> >>>> index f846c5b108b5..d4ae9139be39 100644
> >>>> --- a/drivers/gpu/drm/i915/display/intel_display_irq.c
> >>>> +++ b/drivers/gpu/drm/i915/display/intel_display_irq.c
> >>>> @@ -1148,15 +1148,14 @@ void gen8_de_irq_handler(struct drm_i915_private *dev_priv, u32 master_ctl)
> >>>>
> >>>> u32 gen11_gu_misc_irq_ack(struct drm_i915_private *i915, const u32 master_ctl)
> >>>> {
> >>>> -	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
> >>>> 	u32 iir;
> >>>>
> >>>> 	if (!(master_ctl & GEN11_GU_MISC_IRQ))
> >>>> 		return 0;
> >>>>
> >>>> -	iir = raw_reg_read(regs, GEN11_GU_MISC_IIR);
> >>>> +	iir = intel_de_read(i915, GEN11_GU_MISC_IIR);
> >>>> 	if (likely(iir))
> >>>> -		raw_reg_write(regs, GEN11_GU_MISC_IIR, iir);
> >>>> +		intel_de_write(i915, GEN11_GU_MISC_IIR, iir);
> >>>>
> >>>> 	return iir;
> >>>> }
> >>>> @@ -1169,18 +1168,18 @@ void gen11_gu_misc_irq_handler(struct drm_i915_private *i915, const u32 iir)
> >>>>
> >>>> void gen11_display_irq_handler(struct drm_i915_private *i915)
> >>>> {
> >>>> -	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
> >>>> -	const u32 disp_ctl = raw_reg_read(regs, GEN11_DISPLAY_INT_CTL);
> >>>> +	u32 disp_ctl;
> >>>>
> >>>> 	disable_rpm_wakeref_asserts(&i915->runtime_pm);
> >>>> 	/*
> >>>> 	 * GEN11_DISPLAY_INT_CTL has same format as GEN8_MASTER_IRQ
> >>>> 	 * for the display related bits.
> >>>> 	 */
> >>>> -	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL, 0x0);
> >>>> +	disp_ctl = intel_de_read(i915, GEN11_DISPLAY_INT_CTL);
> >>>> +
> >>>> +	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, 0);
> >>>> 	gen8_de_irq_handler(i915, disp_ctl);
> >>>> -	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL,
> >>>> -		      GEN11_DISPLAY_IRQ_ENABLE);
> >>>> +	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, GEN11_DISPLAY_IRQ_ENABLE);
> >>>>
> >>>> 	enable_rpm_wakeref_asserts(&i915->runtime_pm);
> >>>> }
> >>>> -- 
> >>>> 2.39.2
> >>>>
> >> 
> 
> -- 
> Jani Nikula, Intel


