Re: [PATCH] drm: enable render nodes wherever buffer sharing is supported

Peter Collingbourne <pcc@xxxxxxxxxx> · Tue, 28 Apr 2020 09:47:01 -0700

On Tue, Apr 28, 2020 at 4:48 AM Liviu Dudau <Liviu.Dudau@xxxxxxx> wrote:
>
> On Mon, Apr 27, 2020 at 07:22:02PM -0700, Peter Collingbourne wrote:
> > On Mon, Apr 27, 2020 at 1:47 PM Eric Anholt <eric@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Apr 27, 2020 at 1:05 PM Peter Collingbourne <pcc@xxxxxxxxxx> wrote:
> > > >
> > > > Render nodes are not just useful for devices supporting GPU hardware
> > > > acceleration. Even on devices that only support dumb frame buffers,
> > > > they are useful in situations where composition (using software
> > > > rasterization) and KMS are done in different processes with buffer
> > > > sharing being used to send frame buffers between them. This is the
> > > > situation on Android, where surfaceflinger is the compositor and the
> > > > composer HAL uses KMS to display the buffers. Thus it is beneficial
> > > > to expose render nodes on all devices that support buffer sharing.
> > > >
> > > > Because all drivers that currently support render nodes also support
> > > > buffer sharing, the DRIVER_RENDER flag is no longer necessary to mark
> > > > devices as supporting render nodes, so remove it and just rely on
> > > > the presence of a prime_handle_to_fd function pointer to determine
> > > > whether buffer sharing is supported.
> > >
> > > I'm definitely interested in seeing a patch like this land, as I think
> > > the current state is an ugly historical artifact.  We just have to be
> > > careful.
> > >
> > > Were there any instances of drivers with render engines exposing PRIME
> > > but not RENDER?  We should be careful to make sure that we're not
> > > exposing new privileges for those through adding render nodes.
> >
> > These are the drivers that we'd be adding render nodes for with this change:
> >
> > $ git grep -l prime_handle_to_fd (git grep -L DRIVER_RENDER (git grep
> > -l '\.driver_features'))
> > drivers/gpu/drm/arc/arcpgu_drv.c
> > drivers/gpu/drm/arm/display/komeda/komeda_kms.c
> > drivers/gpu/drm/arm/hdlcd_drv.c
> > drivers/gpu/drm/arm/malidp_drv.c
> > drivers/gpu/drm/armada/armada_drv.c
> > drivers/gpu/drm/aspeed/aspeed_gfx_drv.c
> > drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
> > drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c
> > drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c
> > drivers/gpu/drm/imx/imx-drm-core.c
> > drivers/gpu/drm/ingenic/ingenic-drm.c
> > drivers/gpu/drm/mcde/mcde_drv.c
> > drivers/gpu/drm/mediatek/mtk_drm_drv.c
> > drivers/gpu/drm/meson/meson_drv.c
> > drivers/gpu/drm/mxsfb/mxsfb_drv.c
> > drivers/gpu/drm/pl111/pl111_drv.c
> > drivers/gpu/drm/qxl/qxl_drv.c
> > drivers/gpu/drm/rcar-du/rcar_du_drv.c
> > drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> > drivers/gpu/drm/shmobile/shmob_drm_drv.c
> > drivers/gpu/drm/sti/sti_drv.c
> > drivers/gpu/drm/stm/drv.c
> > drivers/gpu/drm/tilcdc/tilcdc_drv.c
> > drivers/gpu/drm/tve200/tve200_drv.c
> > drivers/gpu/drm/xen/xen_drm_front.c
> > drivers/gpu/drm/zte/zx_drm_drv.c
> >
> > Some of the drivers provide custom ioctls but they are already
> > protected from render nodes by not setting DRM_RENDER_ALLOW. Another
> > thing to check for is drivers providing custom fops that might expose
> > something undesirable in the render node:
> >
> > $ git grep -L 'DEFINE_DRM_GEM_CMA_FOPS\|DEFINE_DRM_GEM_FOPS' (git grep
> > -l prime_handle_to_fd (git grep -L DRIVER_RENDER (git grep -l
> > '\.driver_features')))
> > drivers/gpu/drm/mediatek/mtk_drm_drv.c
> > drivers/gpu/drm/rockchip/rockchip_drm_drv.c
> > drivers/gpu/drm/xen/xen_drm_front.c
> >
> > But looking at those drivers in detail, I see that each of them is
> > using the standard DRM fops handlers (which presumably already handle
> > render nodes correctly) with the exception of mmap, which they provide
> > wrappers for that mostly just wrap drm_gem_mmap.
> >
> > So unless I'm missing something significant (which seems likely -- I'm
> > not a DRM expert), I don't see a problem so far.
> >
> > > There's a UAPI risk I see here.  Right now, on a system with a single
> > > renderer GPU, we can just open /dev/dri/renderD128 and get the GPU for
> > > rendering, and various things are relying on that (such as libwaffle,
> > > used in piglit among other things)   Adding render nodes for kms-only
> > > drivers could displace the actual GPU to 129, and the question is
> > > whether this will be disruptive.  For Mesa, I think this works out,
> > > because kmsro should load on the kms device's node and then share
> > > buffers over to the real GPU that it digs around to find at init time.
> > > Just saying, I'm not sure I know all of the userspace well enough to
> > > say "this should be safe despite that"
> > >
> > > (And, maybe, if we decide that it's not safe enough, we could punt
> > > kms-only drivers to a higher starting number?)
> >
> > Android (minigbm) similarly tries to open /dev/dri/renderD$N in a loop
> > with 128 <= N < 192 and assumes that the first non-blacklisted (i.e.
> > not vgem) one that it can open corresponds to the real GPU [1]. I
> > think that the risk of breaking something on Android is low since
> > Android's architecture basically already depends on there being a
> > render node, and it seems unlikely for a device to have more than one
> > GPU, one of which would be non-functional.
>
> Would Juno suffer from this issue? It has 2 HDLCD drivers (both can be
> active at the same time if you want) and you could run panfrost for the
> actual GPU. Depending on the probing order, HDLCD render nodes would be
> registered before the GPU's.

If I'm reading the Mesa code [1] correctly, it will loop through the
render nodes and attempt to match the driver name against its list of
known devices. So if there were two hdlcd render nodes and one
panfrost render node it should find the latter regardless of probing
order, because it doesn't know about hdlcd.

Peter

[1] https://cs.android.com/android/platform/superproject/+/master:external/mesa3d/src/gallium/auxiliary/pipe-loader/pipe_loader_drm.c;l=214

>
> Best regards,
> Liviu
>
> >
> > It's also worth bearing in mind that render nodes were added to vgem
> > in commit 3a6eb795 from 2018. To the extent that exposing additional
> > render nodes would lead to widespread breakage, this would seem to me
> > to be a likely way in which it could have happened (since I would
> > expect that it could cause many machines to go from having one render
> > node to having more than one), so perhaps the argument can be made
> > that if we hadn't seen widespread breakage as a result of that change,
> > we'd be unlikely to see it as a result of this one.
> >
> > This would be conditional on the userspace code not blacklisting the
> > vgem render node like minigbm does -- at a glance I couldn't find such
> > code in Mesa (there does appear to be some code that looks for the
> > vgem driver name, but it seems to only be used on primary nodes, not
> > render nodes) or libwaffle.
> >
> > Peter
> >
> > [1] https://cs.android.com/android/platform/superproject/+/master:external/minigbm/cros_gralloc/cros_gralloc_driver.cc;l=48
>
> --
> ====================
> | I would like to |
> | fix the world,  |
> | but they're not |
> | giving me the   |
>  \ source code!  /
>   ---------------
>     ¯\_(ツ)_/¯
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel