Hi Philipp, On 13.09.2018 01:36, Philipp Zabel wrote: > Hi Stefan, > > thank you for the report. I think what happens is the following: Thanks for looking into it! > > On Wed, 2018-09-12 at 15:26 -0700, Stefan Agner wrote: >> Hi, >> >> While working on Apalis iMX6 with four displays, I encountered the >> following crash: >> > [...] >> [ 3.781163] imx-drm display-subsystem: bound 120000.hdmi (ops dw_hdmi_imx_ops) >> [ 3.788441] imx-drm display-subsystem: binding ldb (ops imx_ldb_ops) >> [ 3.794902] imx_ldb_bind, imx_ldb ec0da010 > > component_bind calls devres_open_group right before component->ops->bind > > The devmem_kzalloc'ed imx_ldb_channel that contains the encoder is > allocated in this devres group. > >> [ 3.799818] drm_encoder_init, encoder ec0da388 >> [ 3.804363] drm_encoder_init, ret 0 > > drm_encoder_init adds the encoder in imx_ldb to the > mode_config.encoder_list. > >> [ 3.807908] imx_ldb_register, encoder ec0da388 >> [ 3.812432] imx_ldb_register, ret 0 >> [ 3.815951] imx_ldb_bind encoder LVDS-46, ec0da388 ->func c0e6371c >> [ 3.822227] imx_ldb_bind, done >> [ 3.825331] imx-drm display-subsystem: bound ldb (ops imx_ldb_ops) >> [ 3.831614] imx-drm display-subsystem: binding parallel-display-controller1 (ops imx_pd_ops) >> [ 3.840285] drm_encoder_init, encoder ec8a2b78 >> [ 3.844931] imx-drm display-subsystem: Linked as a consumer to panel >> [ 3.851472] imx-drm display-subsystem: bound parallel-display-controller1 (ops imx_pd_ops) >> [ 3.859785] imx-drm display-subsystem: binding parallel-display-controller0 (ops imx_pd_ops) >> [ 3.868561] imx-drm display-subsystem: failed to bind parallel-display-controller0 (ops imx_pd_ops): -517 >> [ 3.878225] imx-drm display-subsystem: Dropping the link to panel >> [ 3.884679] imx_ldb_unbind >> [ 3.887421] imx_ldb_unbind, encoder LVDS-46, ec0da388 ->func c0e6371c >> [ 3.893950] imx_ldb_unbind, encoder (null), ec0da838 ->func 0 >> [ 3.899723] imx_ldb_unbind, done > > component_unbind calls devres_release_group after component->ops- >>unbind, freeing imx_ldb and with it the encoder that is still in the > mode_config.encoder_list Oh I see, I did not realize that component_unbind releases devres... > >> [ 3.908345] imx_drm_bind, component_bind_all, ret -517 >> [ 3.913638] drm_mode_config_cleanup, encoder TMDS-44, ec861894 ->funcs c0e63a18 >> [ 3.921260] drm_mode_config_cleanup, calling encoder->funcs->destroy! >> [ 3.927897] drm_encoder_cleanup, encoder ec861894 >> [ 3.932717] drm_mode_config_cleanup, encoder (null), ec0da388 ->funcs 0 > > Use after free. > That all makes sense now... >> [ 3.939356] drm_mode_config_cleanup, calling encoder->funcs->destroy! >> [ 3.946050] Unable to handle kernel NULL pointer dereference at virtual address 00000004 > [...] >> The Apalis iMX6 device tree I work with uses a dumb VGA bridge and a LVDS >> display directly specified in the ldb node. In the case above I did not >> compile in the dumb VGA bridge driver, that is the reason why binding >> parallel-display-controller0 fails. > > I suppose this means that component drivers must not allocate the > encoder in the component devres group during bind. Not only their own > failure, but any other component's failure can cause component_unbind to > be called in the cleanup path of component_bind_all, freeing the memory > before drm_mode_config_cleanup is called. > Indeed, moving allocation into driver bind fixes the crash! I wonder, how does devres knows that imx_ldb got allocated in probe and does not free on unbind? I guess that is the group mechanism which makes sure of that? ldb would not the only driver affected, also drivers/gpu/drm/imx/parallel-display.c allocates encoders in component bind, there are probably more. Actually, when thinking more about it, if we would have a way to call framework cleanup functions (drm_mode_config_cleanup() in this case) between the bind and unbind loops in component_bind_all(), we would not have that problem. [adding Russel, since he introduced the component infrastructure] >> The LVDS encoder (0xec1a0388 in that case) is somehow already cleaned up >> when calling drm_mode_config_cleanup(), which leads to a null pointer >> deference when trying to call destroy. >> >> I was unable to figure out why the DRM encoder in the second >> drm_mode_config_cleanup() call is already zeroed out. The component >> framework calls imx_ldb_unbind() first, but that should not free >> up the encoder afaict. Any idea? >> >> I was also wondering in the deferred probing case, functions like >> imx_ldb_bind() call devm_kzalloc on every try. Since the underlying >> device is not removed, doesn't this leads to multiple active allocations? > > See above. To me it looks like we have to allocate imx_ldb in probe and > deal with possibly partially preinitialzied structure when bind is > called a second time. > Yeah with the fact that component framework does free devres at unbind time, my concern stated here is null... -- Stefan _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel