RE: [PATCH v14 RESEND 5/6] drm/imx: Introduce i.MX8qm/qxp DPU DRM

Ying Liu <victor.liu@xxxxxxx> · Tue, 26 Sep 2023 03:55:35 +0000

On Tuesday, September 5, 2023 4:37 PM, Maxime Ripard <mripard@xxxxxxxxxx> wrote:
> On Tue, Sep 05, 2023 at 03:32:47AM +0000, Ying Liu wrote:
> > > On Tue, Aug 22, 2023 at 04:59:48PM +0800, Liu Ying wrote:
> > > > +int dpu_cf_init(struct dpu_soc *dpu, unsigned int index,
> > > > +		unsigned int id, enum dpu_unit_type type,
> > > > +		unsigned long pec_base, unsigned long base)
> > > > +{
> > > > +	struct dpu_constframe *cf;
> > > > +
> > > > +	cf = devm_kzalloc(dpu->dev, sizeof(*cf), GFP_KERNEL);
> > > > +	if (!cf)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	dpu->cf_priv[index] = cf;
> > >
> > > You can't store structures related to KMS in a device managed structure.
> > > The DRM KMS device will stick around (and be accessible from userspace)
> > > after the device has been removed until the last application closed its
> > > file descriptor to the device.
> >
> > The DRM device is registered after component_bind_all() is called in
> > dpu_drm_bind().  The CRTC components' platform devices are created
> > in the dpu_core_probe() where the device managed resources are
> > created.   So, it looks those resources are safe because the DRM device
> > will be unregistered before those resources are freed.
> 
> Not, it's not, because the KMS device isn't freed when devices will be
> unbound/removed, but when the last application closes its fd to it, and
> so you'll get dangling pointers.
> 
> The general rule to get it right is to use drmm for anything but device
> resources (like clocks, regulators, memory mapping, etc.). You can
> deviate from the rule, of course, but you'll need a long and clear
> explanation on why it doesn't work, and why your new approach works.
> Your current approach doesn't though.

I'll try to follow the rule in next version. Will call drmm_kzalloc() instead of
devm_kzalloc() here.

> 
> > > This can be checked by enabling KASAN and manually unbinding the
> driver
> > > through sysfs.
> >
> > I enabled KASAN and manually unbound the dpu-core driver with
> command:
> >
> > echo 56180000.dpu > /sys/bus/platform/drivers/dpu-
> core/56180000.dpu/driver/unbind
> >
> > KASAN didin't report memory issue regarding those device managed
> > resources.  However, it did report another issue in dpu_drm_unbind(),
> > where drm_device should be got from drv_data->drm_dev instead of
> > dev_get_drvdata(dev).  I'll fix that in next version.
> >
> > BTW, the dpu-core driver was successfully bound again after unbinding
> with
> > command:
> >
> > echo  56180000.dpu > /sys/bus/platform/drivers/dpu-core/bind
> 
> Guess you're lucky. That doesn't make it safe or right.
> 
> > > > +	cf->pec_base = devm_ioremap(dpu->dev, pec_base, SZ_16);
> > > > +	if (!cf->pec_base)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	cf->base = devm_ioremap(dpu->dev, base, SZ_32);
> > > > +	if (!cf->base)
> > > > +		return -ENOMEM;
> > >
> > > For the same reason, you need to protect any access to a device managed
> > > resource (so clocks, registers, regulators, etc.) by a call to
> > > drm_dev_enter/drm_dev_exit and you need to call drm_dev_unplug
> instead
> > > of drm_dev_unregister.
> >
> > That's a good point. I've tried to do that, but it turns out that the
> > display controller cannot be enabled again after binding the dpu-core
> > driver manually again. It seems that the display controller requires a
> > proper disablement procedure, but the "driver instance overview " kdoc
> > mentions the shortcoming of no proper disablement if drm_dev_unplug()
> > is used:
> >
> > """
> > * Drivers that want to support device unplugging (USB, DT overlay unload)
> should
> >  * use drm_dev_unplug() instead of drm_dev_unregister(). The driver must
> protect
> >  * regions that is accessing device resources to prevent use after they're
> >  * released. This is done using drm_dev_enter() and drm_dev_exit(). There
> is one
> >  * shortcoming however, drm_dev_unplug() marks the drm_device as
> unplugged before
> >  * drm_atomic_helper_shutdown() is called. This means that if the disable
> code
> >  * paths are protected, they will not run on regular driver module unload,
> >  * possibly leaving the hardware enabled.
> > """
> >
> > A DPU reset in dpu_core() might be helpful, but I'm not sure if there is any
> > reset line provided by the embodying system.
> 
> Generally speaking, you shouldn't rely on the device being in any
> particuliar state before your driver loads. So a reset at probe/bind
> time is a good idea.

Yes. I'll drop the platform device creations for CRTCs from dpu-core.c 
and drop the aggregation of CRTC components from different DPU
instances into one DRM device.  This way, there will be only two CRTCs
of one DPU in one DRM device. Then, the driver will be simpler and
users cannot unbind the driver of one of the two DPU instances, which
means drm_dev_unplug() won't be needed any more(?) and the reset
issue will be gone.  The controller will be shutdown properly through
drm_atomic_helper_shutdown() when the driver module is removed.

> 
> > Even if the reset works, the 2nd DPU instance in i.MX8qm would be a
> > problem, because it won't be reset or properly disabled if the 1st DPU
> > instance is unbound.
> 
> Why it wouldn't be reset?

Because dpu_core_remove() is not called for the 2nd DPU instance.
Anyway, with the above new design, this doesn't seem to be a problem
any more.

> 
> > Although the two DPU instances could be wrapped by two DRM devices, I
> > tend not to do that because downstream bridges in future SoCs might be
> > able to mux to different DPU instances at runtime.
> >
> > Due to the disablement issue, can we set drm_dev_enter/exit/unplug
> > aside first?
> 
> I'd rather have that figured out prior to merging.

I'm assuming that drm_dev_enter/exit/unplug won't be needed with the above
new design - one DPU instance wrapped by one DRM device.

> > >
> > > > +static int dpu_crtc_pm_runtime_put(struct dpu_crtc *dpu_crtc)
> > > > +{
> > > > +	int ret;
> > > > +
> > > > +	ret = pm_runtime_put(dpu_crtc->dev->parent);
> > > > +	if (ret < 0)
> > > > +		dpu_crtc_err(&dpu_crtc->base,
> > > > +			     "failed to put parent device RPM: %d\n", ret);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +static void dpu_crtc_mode_set_nofb(struct drm_crtc *crtc)
> > > > +{
> > > > +	struct dpu_crtc *dpu_crtc = to_dpu_crtc(crtc);
> > > > +	struct drm_display_mode *adj = &crtc->state->adjusted_mode;
> > > > +	enum dpu_link_id cf_link;
> > > > +
> > > > +	dpu_crtc_dbg(crtc, "mode " DRM_MODE_FMT "\n",
> > > DRM_MODE_ARG(adj));
> > > > +
> > > > +	/* request power-on when we start to set mode for CRTC */
> > > > +	dpu_crtc_pm_runtime_get_sync(dpu_crtc);
> > >
> > > From the drm_crtc_helper_funcs documentation:
> > >
> > > """
> > > 	 * Note that the display pipe is completely off when this function is
> > > 	 * called. Atomic drivers which need hardware to be running before
> > > they
> > > 	 * program the new display mode (e.g. because they implement
> > > runtime PM)
> > > 	 * should not use this hook. This is because the helper library calls
> > > 	 * this hook only once per mode change and not every time the
> display
> > > 	 * pipeline is suspended using either DPMS or the new "ACTIVE"
> > > property.
> > > 	 * Which means register values set in this callback might get reset
> > > when
> > > 	 * the CRTC is suspended, but not restored.  Such drivers should
> > > instead
> > > 	 * move all their CRTC setup into the @atomic_enable callback.
> > > """
> >
> > I can use drm_atomic_helper_commit_tail() but not
> > drm_atomic_helper_commit_tail_rpm() because the planes need to be
> > updated prior to modeset_enables(where trigger shadow registers in
> > plane's HW resources to take effect).   Using the former one means that
> > RPM needs to be get/put in drm_atomic_helper_commit_planes(), which
> > doesn't seem good because in some cases the power of the display
> controller
> > might be lost after RPM put and I'm not sure if the registers set there will
> > be lost or not.   So, to avoid the issue the documentation mentions,
> > crtc_state->mode_changed is forced to be true in dpu_crtc_atomic_check()
> > if the CRTC is changed to active.  Is this acceptable?
> 
> No, just move the crtc setup to atomic_enable like the doc suggests.

Ok, will do.

> 
> > > > +static void dpu_crtc_atomic_enable(struct drm_crtc *crtc,
> > > > +				   struct drm_atomic_state *state)
> > > > +{
> > > > +	struct dpu_crtc *dpu_crtc = to_dpu_crtc(crtc);
> > > > +	unsigned long flags;
> > > > +
> > > > +	drm_crtc_vblank_on(crtc);
> > > > +
> > > > +	enable_irq(dpu_crtc->dec_shdld_irq);
> > > > +	enable_irq(dpu_crtc->ed_cont_shdld_irq);
> > > > +	enable_irq(dpu_crtc->ed_safe_shdld_irq);
> > > > +
> > > > +	dpu_fg_enable_clock(dpu_crtc->fg);
> > > > +	dpu_ed_pec_sync_trigger(dpu_crtc->ed_cont);
> > > > +	dpu_ed_pec_sync_trigger(dpu_crtc->ed_safe);
> > > > +	if (crtc->state->gamma_lut)
> > > > +		dpu_crtc_set_gammacor(dpu_crtc);
> > > > +	else
> > > > +		dpu_crtc_disable_gammacor(dpu_crtc);
> > > > +	dpu_fg_shdtokgen(dpu_crtc->fg);
> > > > +
> > > > +	/* don't relinquish CPU until TCON is set to operation mode */
> > > > +	local_irq_save(flags);
> > > > +	preempt_disable();
> > > > +	dpu_fg_enable(dpu_crtc->fg);
> > >
> > > That's super fishy. You shouldn't need that, at all. What is going on
> > > there?
> >
> > This aims to fully workaround the below errata recently released by
> > NXP.
> >
> > ERR010910: DC: Display Subsystem First Frame Programming Restriction
> > Link:
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.
> nxp.com%2Fdocs%2Fen%2Ferrata%2FIMX8_1N94W.pdf&data=05%7C01%7C
> victor.liu%40nxp.com%7C1250637791414caf559b08dbadeb597e%7C686ea1d
> 3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C638294998555649537%7CUnkno
> wn%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1h
> aWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=mKAtsgN30jqfUiczAFe3
> dXPUZOXhM%2Fnlq9%2F%2B%2F8%2Bjj24%3D&reserved=0
> >
> > In short, to make sure the display controller can be enabled properly with
> > prefetch engine(DPRC + PRG), the TCON must be switch from bypass mode
> > to operation mode after FrameGen(FG) generates the first frame.
> >
> > Timing is critical here, so local irq and preemption are disabled during the
> > switch procedure.
> >
> > BTW, the driver always use prefetch engines for KMS, although they can
> > be bypassed.
> 
> Ok. So I think it would help your driver getting merged to split that
> workaround into a separate patch. It's hard to tell what are the
> implications of that workaround on your driver when it's lost in the
> middle of, well, the driver :)
> 
> I guess it would be much easier for everyone if you submitted that
> driver without the errata handling, or with prefetch bypassed, at first.
> And then you can submit / rework the hard parts.

Ok, I'll drop the prefetch engine support and the errata handling.
Also, I'll drop add-on features, like gamma, csc and alpha, to achieve
kind of minimal feature set.

> 
> > >
> > > > +
> > > > +	/*
> > > > +	 * TKT320590:
> > >
> > > Those are NXP internal references as far as as I can tell. They
> > > shouldn't be here.
> >
> > Ok, will change it to be ERR010910.
> 
> A link to the errata description would be a good idea too.

Ok, will add a link when errata handling is supported.

> 
> > > > +static void dpu_atomic_put_plane_state(struct drm_atomic_state
> *state,
> > > > +				       struct drm_plane *plane)
> > > > +{
> > > > +	int index = drm_plane_index(plane);
> > > > +
> > > > +	plane->funcs->atomic_destroy_state(plane, state-
> >planes[index].state);
> > > > +	state->planes[index].ptr = NULL;
> > > > +	state->planes[index].state = NULL;
> > > > +	state->planes[index].old_state = NULL;
> > > > +	state->planes[index].new_state = NULL;
> > > > +
> > > > +	drm_modeset_unlock(&plane->mutex);
> > > > +
> > > > +	dpu_plane_dbg(plane, "put state\n");
> > > > +}
> > > > +
> > > > +static void dpu_atomic_put_crtc_state(struct drm_atomic_state *state,
> > > > +				      struct drm_crtc *crtc)
> > > > +{
> > > > +	int index = drm_crtc_index(crtc);
> > > > +
> > > > +	crtc->funcs->atomic_destroy_state(crtc, state->crtcs[index].state);
> > > > +	state->crtcs[index].ptr = NULL;
> > > > +	state->crtcs[index].state = NULL;
> > > > +	state->crtcs[index].old_state = NULL;
> > > > +	state->crtcs[index].new_state = NULL;
> > > > +
> > > > +	drm_modeset_unlock(&crtc->mutex);
> > > > +
> > > > +	dpu_crtc_dbg(crtc, "put state\n");
> > > > +}
> > > > +
> > > > +static void
> > > > +dpu_atomic_put_possible_states_per_crtc(struct drm_crtc_state
> > > *crtc_state)
> > > > +{
> > > > +	struct drm_atomic_state *state = crtc_state->state;
> > > > +	struct drm_crtc *crtc = crtc_state->crtc;
> > > > +	struct drm_plane *plane;
> > > > +	struct drm_plane_state *old_plane_state, *new_plane_state;
> > > > +	struct dpu_plane_state *old_dpstate, *new_dpstate;
> > > > +
> > > > +	drm_atomic_crtc_state_for_each_plane(plane, crtc_state) {
> > > > +		old_plane_state = drm_atomic_get_old_plane_state(state,
> > > plane);
> > > > +		new_plane_state = drm_atomic_get_new_plane_state(state,
> > > plane);
> > > > +
> > > > +		old_dpstate = to_dpu_plane_state(old_plane_state);
> > > > +		new_dpstate = to_dpu_plane_state(new_plane_state);
> > > > +
> > > > +		/* Should be enough to check the below HW plane resources.
> > > */
> > > > +		if (old_dpstate->stage.ptr != new_dpstate->stage.ptr ||
> > > > +		    old_dpstate->source != new_dpstate->source ||
> > > > +		    old_dpstate->blend != new_dpstate->blend)
> > > > +			return;
> > > > +	}
> > > > +
> > > > +	drm_atomic_crtc_state_for_each_plane(plane, crtc_state)
> > > > +		dpu_atomic_put_plane_state(state, plane);
> > > > +
> > > > +	dpu_atomic_put_crtc_state(state, crtc);
> > > > +}
> > >
> > > That's super suspicious too. Are you really going around and dropping
> > > and destroying plane and crtc states in a global state?
> >
> > Yes.
> 
> That's really not a good idea. Adding states are fine, dropping ones
> aren't.

The idea is to add all active planes on two CRTCs into one commit and
try to allocate fetchunits for those planes as a whole to best meet user's
requirements, because ...

> 
> > >
> > > At the very least, you need to describe what this addresses and why you
> > > think it's a good solution.
> >
> > This is the solution to assign HW resources of a plane group to the two
> CRTCs
> > in one DPU or one CRTC group _dynamically_ at runtime.  Dpu.h has some
> > comments which hint this:
> >
> > """
> > /*
> >  * fetchunit/scaler/layerblend resources of a plane group are
> >  * shared by the two CRTCs in a CRTC group
> >  */
> > """
> >
> > I can add a DPU display controller block diagram in dpu_kms.c to tell the
> HW
> > architecture and some SW architecture to clarify this more.
> 
> It's not so much the diagram that I'm looking for, but an accurate
> description of the problem. What resource is there, why and how does it
> need to be shared, so we can understand what you are doing there, and
> possibly suggest other things.

... the problem is that fetchunits(fetchdecode/fetchlayer/fetchwarp) have
different capabilities/feature sets and they can be built upon either of the
two CRTCs through layerblends.  The 4 fetchunits for one DPU display
controller are fetchdecode0, fetchdecode1, fetchlayer0 and fetchwarp2.
Correspondingly, there are 4 layerblends(0 to 3).  Layerblends blend two
input sources(primary input and secondary input). The primary input can
be, say, constframe or another layerblend's output.  The secondary input
can be, say, one of those fetchunits.  For example, a real use case could
be:
- CRTC0:
  Primary plane:
    Layerblend0:
      Primary:  constframe4
      Secondary: fetchlayer0
  Overlay plane:
    Layerblend1:
      Primary: Layerblend0 output
      Secondary: fetchdecode1 + vscaler5 + hscaler4

- CRTC1:
  Primary plane:
    Layerblend2:
      Primary:  constframe5
      Secondary: fetchwarp2 + fetcheco2
  Overlay plane:
    Layerblend3:
      Primary: Layerblend2 output
      Secondary: fetchdecode0 + fetcheco0 + vscaler4

Some fetchunit features:
- fetchdecode: Horizontoal/vertical downscaling through video
   processing blocks and YUV fb with two planars(use fetcheco).
- fetchwarp: YUV fb with two planars(use fetcheco), up to
  8 sub-layers and warp.
- fetchlayer: up to 8 sub-layers.

All fetchunits support RGB fb.

What I do is:
- Add all active planes(including primary and overlays) on two CRTCs
  into one commit even if the user only wants to update the plane(s)
  on one CRTC.
- Those manually added planes/CRTCs are prone to put, because
   the relevant fetchunits and layerblends are likely/eventually
   unchanged after the allocation.
- Traverse through fetchunit list to try to find an available one to
   meet a plane's requirements(say CSC? Scalers?). Those prone-to-put
   planes are handled first to use the current fetchunits if modeset
   is not allowed, otherwise planes with lower z-positions go first.
- Do not allow fetchunit hot-migration between two CRTCs.
- Assign layerblend with the lowest possible index to planes.  Planes
   with lower z-positions go first.
- Put the prone-to-put planes/CRTC if possible to gain some
  performance .

Regards,
Liu Ying

> 
> Maxime