Re: [PATCH v14 RESEND 5/6] drm/imx: Introduce i.MX8qm/qxp DPU DRM

Maxime Ripard <mripard@xxxxxxxxxx> · Tue, 3 Oct 2023 13:52:54 +0200

On Tue, Sep 26, 2023 at 03:55:35AM +0000, Ying Liu wrote:
> > > > > +	cf->pec_base = devm_ioremap(dpu->dev, pec_base, SZ_16);
> > > > > +	if (!cf->pec_base)
> > > > > +		return -ENOMEM;
> > > > > +
> > > > > +	cf->base = devm_ioremap(dpu->dev, base, SZ_32);
> > > > > +	if (!cf->base)
> > > > > +		return -ENOMEM;
> > > >
> > > > For the same reason, you need to protect any access to a device managed
> > > > resource (so clocks, registers, regulators, etc.) by a call to
> > > > drm_dev_enter/drm_dev_exit and you need to call drm_dev_unplug
> > instead
> > > > of drm_dev_unregister.
> > >
> > > That's a good point. I've tried to do that, but it turns out that the
> > > display controller cannot be enabled again after binding the dpu-core
> > > driver manually again. It seems that the display controller requires a
> > > proper disablement procedure, but the "driver instance overview " kdoc
> > > mentions the shortcoming of no proper disablement if drm_dev_unplug()
> > > is used:
> > >
> > > """
> > > * Drivers that want to support device unplugging (USB, DT overlay unload)
> > should
> > >  * use drm_dev_unplug() instead of drm_dev_unregister(). The driver must
> > protect
> > >  * regions that is accessing device resources to prevent use after they're
> > >  * released. This is done using drm_dev_enter() and drm_dev_exit(). There
> > is one
> > >  * shortcoming however, drm_dev_unplug() marks the drm_device as
> > unplugged before
> > >  * drm_atomic_helper_shutdown() is called. This means that if the disable
> > code
> > >  * paths are protected, they will not run on regular driver module unload,
> > >  * possibly leaving the hardware enabled.
> > > """
> > >
> > > A DPU reset in dpu_core() might be helpful, but I'm not sure if there is any
> > > reset line provided by the embodying system.
> > 
> > Generally speaking, you shouldn't rely on the device being in any
> > particuliar state before your driver loads. So a reset at probe/bind
> > time is a good idea.
> 
> Yes. I'll drop the platform device creations for CRTCs from dpu-core.c 
> and drop the aggregation of CRTC components from different DPU
> instances into one DRM device.  This way, there will be only two CRTCs
> of one DPU in one DRM device.

Ok.

> Then, the driver will be simpler and users cannot unbind the driver of
> one of the two DPU instances,

Uh? They would still be able to do that.

> which means drm_dev_unplug() won't be needed any more(?)

So this would still be needed

> and the reset issue will be gone. The controller will be shutdown
> properly through drm_atomic_helper_shutdown() when the driver module
> is removed.

Again, you shouldn't rely on a particular state at boot. For all you
know, you could have been reset by some watchdog or been kexec'd.

> > > Even if the reset works, the 2nd DPU instance in i.MX8qm would be a
> > > problem, because it won't be reset or properly disabled if the 1st DPU
> > > instance is unbound.
> > 
> > Why it wouldn't be reset?
> 
> Because dpu_core_remove() is not called for the 2nd DPU instance.
> Anyway, with the above new design, this doesn't seem to be a problem
> any more.

Ok.

> > 
> > > Although the two DPU instances could be wrapped by two DRM devices, I
> > > tend not to do that because downstream bridges in future SoCs might be
> > > able to mux to different DPU instances at runtime.
> > >
> > > Due to the disablement issue, can we set drm_dev_enter/exit/unplug
> > > aside first?
> > 
> > I'd rather have that figured out prior to merging.
> 
> I'm assuming that drm_dev_enter/exit/unplug won't be needed with the above
> new design - one DPU instance wrapped by one DRM device.

I'm not sure why you are making that claim. And again, that's good
practice: it does no harm while preventing unsafe behaviour in the
future.

> > > > > +static void dpu_atomic_put_plane_state(struct drm_atomic_state
> > *state,
> > > > > +				       struct drm_plane *plane)
> > > > > +{
> > > > > +	int index = drm_plane_index(plane);
> > > > > +
> > > > > +	plane->funcs->atomic_destroy_state(plane, state-
> > >planes[index].state);
> > > > > +	state->planes[index].ptr = NULL;
> > > > > +	state->planes[index].state = NULL;
> > > > > +	state->planes[index].old_state = NULL;
> > > > > +	state->planes[index].new_state = NULL;
> > > > > +
> > > > > +	drm_modeset_unlock(&plane->mutex);
> > > > > +
> > > > > +	dpu_plane_dbg(plane, "put state\n");
> > > > > +}
> > > > > +
> > > > > +static void dpu_atomic_put_crtc_state(struct drm_atomic_state *state,
> > > > > +				      struct drm_crtc *crtc)
> > > > > +{
> > > > > +	int index = drm_crtc_index(crtc);
> > > > > +
> > > > > +	crtc->funcs->atomic_destroy_state(crtc, state->crtcs[index].state);
> > > > > +	state->crtcs[index].ptr = NULL;
> > > > > +	state->crtcs[index].state = NULL;
> > > > > +	state->crtcs[index].old_state = NULL;
> > > > > +	state->crtcs[index].new_state = NULL;
> > > > > +
> > > > > +	drm_modeset_unlock(&crtc->mutex);
> > > > > +
> > > > > +	dpu_crtc_dbg(crtc, "put state\n");
> > > > > +}
> > > > > +
> > > > > +static void
> > > > > +dpu_atomic_put_possible_states_per_crtc(struct drm_crtc_state
> > > > *crtc_state)
> > > > > +{
> > > > > +	struct drm_atomic_state *state = crtc_state->state;
> > > > > +	struct drm_crtc *crtc = crtc_state->crtc;
> > > > > +	struct drm_plane *plane;
> > > > > +	struct drm_plane_state *old_plane_state, *new_plane_state;
> > > > > +	struct dpu_plane_state *old_dpstate, *new_dpstate;
> > > > > +
> > > > > +	drm_atomic_crtc_state_for_each_plane(plane, crtc_state) {
> > > > > +		old_plane_state = drm_atomic_get_old_plane_state(state,
> > > > plane);
> > > > > +		new_plane_state = drm_atomic_get_new_plane_state(state,
> > > > plane);
> > > > > +
> > > > > +		old_dpstate = to_dpu_plane_state(old_plane_state);
> > > > > +		new_dpstate = to_dpu_plane_state(new_plane_state);
> > > > > +
> > > > > +		/* Should be enough to check the below HW plane resources.
> > > > */
> > > > > +		if (old_dpstate->stage.ptr != new_dpstate->stage.ptr ||
> > > > > +		    old_dpstate->source != new_dpstate->source ||
> > > > > +		    old_dpstate->blend != new_dpstate->blend)
> > > > > +			return;
> > > > > +	}
> > > > > +
> > > > > +	drm_atomic_crtc_state_for_each_plane(plane, crtc_state)
> > > > > +		dpu_atomic_put_plane_state(state, plane);
> > > > > +
> > > > > +	dpu_atomic_put_crtc_state(state, crtc);
> > > > > +}
> > > >
> > > > That's super suspicious too. Are you really going around and dropping
> > > > and destroying plane and crtc states in a global state?
> > >
> > > Yes.
> > 
> > That's really not a good idea. Adding states are fine, dropping ones
> > aren't.
> 
> The idea is to add all active planes on two CRTCs into one commit and
> try to allocate fetchunits for those planes as a whole to best meet user's
> requirements, because ...
> 
> > 
> > > >
> > > > At the very least, you need to describe what this addresses and why you
> > > > think it's a good solution.
> > >
> > > This is the solution to assign HW resources of a plane group to the two
> > CRTCs
> > > in one DPU or one CRTC group _dynamically_ at runtime.  Dpu.h has some
> > > comments which hint this:
> > >
> > > """
> > > /*
> > >  * fetchunit/scaler/layerblend resources of a plane group are
> > >  * shared by the two CRTCs in a CRTC group
> > >  */
> > > """
> > >
> > > I can add a DPU display controller block diagram in dpu_kms.c to tell the
> > HW
> > > architecture and some SW architecture to clarify this more.
> > 
> > It's not so much the diagram that I'm looking for, but an accurate
> > description of the problem. What resource is there, why and how does it
> > need to be shared, so we can understand what you are doing there, and
> > possibly suggest other things.
> 
> ... the problem is that fetchunits(fetchdecode/fetchlayer/fetchwarp) have
> different capabilities/feature sets and they can be built upon either of the
> two CRTCs through layerblends.  The 4 fetchunits for one DPU display
> controller are fetchdecode0, fetchdecode1, fetchlayer0 and fetchwarp2.
> Correspondingly, there are 4 layerblends(0 to 3).  Layerblends blend two
> input sources(primary input and secondary input). The primary input can
> be, say, constframe or another layerblend's output.  The secondary input
> can be, say, one of those fetchunits.  For example, a real use case could
> be:
> - CRTC0:
>   Primary plane:
>     Layerblend0:
>       Primary:  constframe4
>       Secondary: fetchlayer0
>   Overlay plane:
>     Layerblend1:
>       Primary: Layerblend0 output
>       Secondary: fetchdecode1 + vscaler5 + hscaler4
> 
> - CRTC1:
>   Primary plane:
>     Layerblend2:
>       Primary:  constframe5
>       Secondary: fetchwarp2 + fetcheco2
>   Overlay plane:
>     Layerblend3:
>       Primary: Layerblend2 output
>       Secondary: fetchdecode0 + fetcheco0 + vscaler4
> 
> Some fetchunit features:
> - fetchdecode: Horizontoal/vertical downscaling through video
>    processing blocks and YUV fb with two planars(use fetcheco).
> - fetchwarp: YUV fb with two planars(use fetcheco), up to
>   8 sub-layers and warp.
> - fetchlayer: up to 8 sub-layers.
> 
> All fetchunits support RGB fb.
> 
> 
> What I do is:
> - Add all active planes(including primary and overlays) on two CRTCs
>   into one commit even if the user only wants to update the plane(s)
>   on one CRTC.
> - Those manually added planes/CRTCs are prone to put, because
>    the relevant fetchunits and layerblends are likely/eventually
>    unchanged after the allocation.
> - Traverse through fetchunit list to try to find an available one to
>    meet a plane's requirements(say CSC? Scalers?). Those prone-to-put
>    planes are handled first to use the current fetchunits if modeset
>    is not allowed, otherwise planes with lower z-positions go first.
> - Do not allow fetchunit hot-migration between two CRTCs.
> - Assign layerblend with the lowest possible index to planes.  Planes
>    with lower z-positions go first.
> - Put the prone-to-put planes/CRTC if possible to gain some
>   performance .

So, you shouldn't do it the way you did it so far, but if I got you
right, this seems similar to what we have on vc4 where all planes go
through another device (called HVS) that we maintain a global state for.
That way, any plane change will pull that global state in, and you are
getting a global view of what resources are being used in the system.

See vc4_pv_muxing_atomic_check() for an example.

Maxime
Attachment:
signature.asc

Description: PGP signature