On 24/11/2024 23:58, Dave Airlie wrote:
> On Mon, 25 Nov 2024 at 02:41, Sasha Levin <sashal@xxxxxxxxxx> wrote:
>>
>> On Thu, Nov 21, 2024 at 10:25:45AM +1000, Dave Airlie wrote:
>>> Hi Linus,
>>>
>>> This is the main drm pull request for 6.13.
>>>
>>> I've done a test merge into your tree, there were two conflicts both
>>> of which seem easy enough to resolve for you.
>>>
>>> There's a lot of rework, the panic helper support is being added to
>>> more drivers, v3d gets support for HW superpages, scheduler
>>> documentation, drm client and video aperture reworks, some new
>>> MAINTAINERS added, amdgpu has the usual lots of IP refactors, Intel
>>> has some Pantherlake enablement and xe is getting some SRIOV bits, but
>>> just lots of stuff everywhere.
>>>
>>> Let me know if there are any issues,
>>
>> Hey Dave,
>>
>> After the PR was merged, I've started seeing boot failures reported by
>> KernelCI:
>
> I'll add the mediatek names I see who touched anything in the area recently.
>
> Dave.
>>
>> [ 4.395400] mediatek-drm mediatek-drm.5.auto: bound 1c014000.merge (ops 0xffffd35fd12975f8)
>> [ 4.396155] mediatek-drm mediatek-drm.5.auto: bound 1c000000.ovl (ops 0xffffd35fd12977b8)
>> [ 4.411951] mediatek-drm mediatek-drm.5.auto: bound 1c002000.rdma (ops 0xffffd35fd12989c0)
>> [ 4.536837] mediatek-drm mediatek-drm.5.auto: bound 1c004000.ccorr (ops 0xffffd35fd1296cf0)
>> [ 4.545181] mediatek-drm mediatek-drm.5.auto: bound 1c005000.aal (ops 0xffffd35fd1296a80)
>> [ 4.553344] mediatek-drm mediatek-drm.5.auto: bound 1c006000.gamma (ops 0xffffd35fd12972b0)
>> [ 4.561680] mediatek-drm mediatek-drm.5.auto: bound 1c014000.merge (ops 0xffffd35fd12975f8)
>> [ 4.570025] ------------[ cut here ]------------
>> [ 4.574630] refcount_t: underflow; use-after-free.
>> [ 4.579416] WARNING: CPU: 6 PID: 81 at lib/refcount.c:28 refcount_warn_saturate+0xf4/0x148
>> [ 4.587670] Modules linked in:
>> [ 4.590714] CPU: 6 UID: 0 PID: 81 Comm: kworker/u32:3 Tainted: G W 6.12.0 #1 cab58e2e59020ebd4be8ada89a65f465a316c742
>> [ 4.602695] Tainted: [W]=WARN
>> [ 4.605649] Hardware name: Acer Tomato (rev2) board (DT)
>> [ 4.610947] Workqueue: events_unbound deferred_probe_work_func
>> [ 4.616768] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [ 4.623715] pc : refcount_warn_saturate+0xf4/0x148
>> [ 4.628493] lr : refcount_warn_saturate+0xf4/0x148
>> [ 4.633270] sp : ffff8000807639c0
>> [ 4.636571] x29: ffff8000807639c0 x28: ffff34ff4116c640 x27: ffff34ff4368e080
>> [ 4.643693] x26: ffffd35fd1299ac8 x25: ffff34ff46c8c410 x24: 0000000000000000
>> [ 4.650814] x23: ffff34ff4368e080 x22: 00000000fffffdfb x21: 0000000000000002
>> [ 4.657934] x20: ffff34ff470c6000 x19: ffff34ff410c7c10 x18: 0000000000000006
>> [ 4.665055] x17: 666678302073706f x16: 2820656772656d2e x15: ffff800080763440
>> [ 4.672176] x14: 0000000000000000 x13: 2e656572662d7265 x12: ffffd35fd2ed14f0
>> [ 4.679297] x11: 0000000000000001 x10: 0000000000000001 x9 : ffffd35fd0342150
>> [ 4.686418] x8 : c0000000ffffdfff x7 : ffffd35fd2e21450 x6 : 00000000000affa8
>> [ 4.693539] x5 : ffffd35fd2ed1498 x4 : 0000000000000000 x3 : 0000000000000000
>> [ 4.700660] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff34ff40932580
>> [ 4.707781] Call trace:
>> [ 4.710216] refcount_warn_saturate+0xf4/0x148 (P)
>> [ 4.714993] refcount_warn_saturate+0xf4/0x148 (L)
>> [ 4.719772] kobject_put+0x110/0x118
>> [ 4.723335] put_device+0x1c/0x38
>> [ 4.726638] mtk_drm_bind+0x294/0x5c0
>> [ 4.730289] try_to_bring_up_aggregate_device+0x16c/0x1e0
>> [ 4.735673] __component_add+0xbc/0x1c0
>> [ 4.739495] component_add+0x1c/0x30
>> [ 4.743058] mtk_disp_rdma_probe+0x140/0x210
>> [ 4.747314] platform_probe+0x70/0xd0
>> [ 4.750964] really_probe+0xc4/0x2a8
>> [ 4.754527] __driver_probe_device+0x80/0x140
>> [ 4.758870] driver_probe_device+0x44/0x120
>> [ 4.763040] __device_attach_driver+0xc0/0x108
>> [ 4.767470] bus_for_each_drv+0x8c/0xf0
>> [ 4.771294] __device_attach+0xa4/0x198
>> [ 4.775117] device_initial_probe+0x1c/0x30
>> [ 4.779286] bus_probe_device+0xb4/0xc0
>> [ 4.783109] deferred_probe_work_func+0xb0/0x100
>> [ 4.787714] process_one_work+0x18c/0x420
>> [ 4.791712] worker_thread+0x30c/0x418
>> [ 4.795449] kthread+0x128/0x138
>> [ 4.798665] ret_from_fork+0x10/0x20
>> [ 4.802229] ---[ end trace 0000000000000000 ]---
>>
>> I don't think that I'll be able to bisect further as I don't have the
>> relevant hardware available.
>>
>> --
>> Thanks,
>> Sasha

Hello,

I am one of those who touched something in the area. To check if my
changes are the cause of the boot failures, please apply this patch:

diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
index 9a8ef8558da9..85be035a209a 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
@@ -373,11 +373,12 @@ static bool mtk_drm_get_all_drm_priv(struct device *dev)
 	struct mtk_drm_private *temp_drm_priv;
 	struct device_node *phandle = dev->parent->of_node;
 	const struct of_device_id *of_id;
+	struct device_node *node;
 	struct device *drm_dev;
 	unsigned int cnt = 0;
 	int i, j;
 
-	for_each_child_of_node_scoped(phandle->parent, node) {
+	for_each_child_of_node(phandle->parent, node) {
 		struct platform_device *pdev;
 
 		of_id = of_match_node(mtk_drm_of_ids, node);
---

This chunk can be found in mtk_drm_get_all_drm_priv(), which is not
listed in the trace, but it is called from mtk_drm_bind().

The loop did not release the child_node if cnt == MAX_CRTC (by means of
a break), which goes against how for_each_child_of_node() should be handled.
If the child_node is indeed required afterwards (it is not referenced
anywhere after the loop), it should be acquired via of_node_get() and
stored somewhere so that it can be put later. In that case there would
be a second, underlying issue, because the reference to the child_node
is not stored in any way.

But if this patch fixes the issue, then I suppose it should be applied
immediately, and the rest should be discussed later on.

By the way, are there any logs with debug/error messages to analyze
further if the issue is something different?

Thanks and best regards,
Javier Carrasco
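
For reference, the iterator rule mentioned above can be modelled in plain
userspace C. This is only a sketch and not the mediatek code: struct node,
node_get(), node_put(), next_child(), for_each_child(), walk_leaky() and
walk_balanced() are hypothetical names made up for illustration. The one
kernel fact it relies on is the documented behaviour of of_get_next_child(),
which takes a reference on the returned child and drops the reference on
the previous one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_node: a refcount plus tree links. */
struct node {
	int refcount;
	struct node *child;   /* first child */
	struct node *sibling; /* next sibling */
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n) {
		/* An unbalanced put is the analogue of the refcount_t underflow. */
		assert(n->refcount > 0);
		n->refcount--;
	}
}

/*
 * Modelled on of_get_next_child(): take a reference on the next child
 * and drop the reference held on the previous one.
 */
static struct node *next_child(struct node *parent, struct node *prev)
{
	struct node *next = prev ? prev->sibling : parent->child;

	node_get(next);
	node_put(prev);
	return next;
}

#define for_each_child(parent, c) \
	for ((c) = next_child((parent), NULL); (c); \
	     (c) = next_child((parent), (c)))

/* Early break without a put: the current child keeps one extra reference. */
static int walk_leaky(struct node *parent, int limit)
{
	struct node *c;
	int cnt = 0;

	for_each_child(parent, c) {
		if (++cnt == limit)
			break;
	}
	return cnt;
}

/* Same walk, but the early exit drops the reference the iterator took. */
static int walk_balanced(struct node *parent, int limit)
{
	struct node *c;
	int cnt = 0;

	for_each_child(parent, c) {
		if (++cnt == limit) {
			node_put(c);
			break;
		}
	}
	return cnt;
}
```

With three children that each start at a refcount of 1, walk_balanced()
stopping at the second child leaves every count at 1, while walk_leaky()
leaves the second child at 2. A loop that runs to completion is balanced
in both variants, because the final next_child() call puts the last child.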