On 06/12/24 09:54, CK Hu (胡俊光) wrote:
Hi, Sasha:
On Mon, 2024-11-25 at 01:35 +0100, Javier Carrasco wrote:
On 24/11/2024 23:58, Dave Airlie wrote:
On Mon, 25 Nov 2024 at 02:41, Sasha Levin <sashal@xxxxxxxxxx> wrote:
On Thu, Nov 21, 2024 at 10:25:45AM +1000, Dave Airlie wrote:
Hi Linus,
This is the main drm pull request for 6.13.
I've done a test merge into your tree, there were two conflicts both
of which seem easy enough to resolve for you.
There's a lot of rework, the panic helper support is being added to
more drivers, v3d gets support for HW superpages, scheduler
documentation, drm client and video aperture reworks, some new
MAINTAINERS added, amdgpu has the usual lots of IP refactors, Intel
has some Pantherlake enablement and xe is getting some SRIOV bits, but
just lots of stuff everywhere.
Let me know if there are any issues,
Hey Dave,
After the PR was merged, I've started seeing boot failures reported by
KernelCI:
I'll add the mediatek names I see who touched anything in the area recently.
Dave.
[ 4.395400] mediatek-drm mediatek-drm.5.auto: bound 1c014000.merge (ops 0xffffd35fd12975f8)
[ 4.396155] mediatek-drm mediatek-drm.5.auto: bound 1c000000.ovl (ops 0xffffd35fd12977b8)
[ 4.411951] mediatek-drm mediatek-drm.5.auto: bound 1c002000.rdma (ops 0xffffd35fd12989c0)
[ 4.536837] mediatek-drm mediatek-drm.5.auto: bound 1c004000.ccorr (ops 0xffffd35fd1296cf0)
[ 4.545181] mediatek-drm mediatek-drm.5.auto: bound 1c005000.aal (ops 0xffffd35fd1296a80)
[ 4.553344] mediatek-drm mediatek-drm.5.auto: bound 1c006000.gamma (ops 0xffffd35fd12972b0)
[ 4.561680] mediatek-drm mediatek-drm.5.auto: bound 1c014000.merge (ops 0xffffd35fd12975f8)
[ 4.570025] ------------[ cut here ]------------
[ 4.574630] refcount_t: underflow; use-after-free.
[ 4.579416] WARNING: CPU: 6 PID: 81 at lib/refcount.c:28 refcount_warn_saturate+0xf4/0x148
[ 4.587670] Modules linked in:
[ 4.590714] CPU: 6 UID: 0 PID: 81 Comm: kworker/u32:3 Tainted: G W 6.12.0 #1 cab58e2e59020ebd4be8ada89a65f465a316c742
[ 4.602695] Tainted: [W]=WARN
[ 4.605649] Hardware name: Acer Tomato (rev2) board (DT)
[ 4.610947] Workqueue: events_unbound deferred_probe_work_func
[ 4.616768] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 4.623715] pc : refcount_warn_saturate+0xf4/0x148
[ 4.628493] lr : refcount_warn_saturate+0xf4/0x148
[ 4.633270] sp : ffff8000807639c0
[ 4.636571] x29: ffff8000807639c0 x28: ffff34ff4116c640 x27: ffff34ff4368e080
[ 4.643693] x26: ffffd35fd1299ac8 x25: ffff34ff46c8c410 x24: 0000000000000000
[ 4.650814] x23: ffff34ff4368e080 x22: 00000000fffffdfb x21: 0000000000000002
[ 4.657934] x20: ffff34ff470c6000 x19: ffff34ff410c7c10 x18: 0000000000000006
[ 4.665055] x17: 666678302073706f x16: 2820656772656d2e x15: ffff800080763440
[ 4.672176] x14: 0000000000000000 x13: 2e656572662d7265 x12: ffffd35fd2ed14f0
[ 4.679297] x11: 0000000000000001 x10: 0000000000000001 x9 : ffffd35fd0342150
[ 4.686418] x8 : c0000000ffffdfff x7 : ffffd35fd2e21450 x6 : 00000000000affa8
[ 4.693539] x5 : ffffd35fd2ed1498 x4 : 0000000000000000 x3 : 0000000000000000
[ 4.700660] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff34ff40932580
[ 4.707781] Call trace:
[ 4.710216] refcount_warn_saturate+0xf4/0x148 (P)
[ 4.714993] refcount_warn_saturate+0xf4/0x148 (L)
[ 4.719772] kobject_put+0x110/0x118
[ 4.723335] put_device+0x1c/0x38
[ 4.726638] mtk_drm_bind+0x294/0x5c0
[ 4.730289] try_to_bring_up_aggregate_device+0x16c/0x1e0
[ 4.735673] __component_add+0xbc/0x1c0
[ 4.739495] component_add+0x1c/0x30
[ 4.743058] mtk_disp_rdma_probe+0x140/0x210
[ 4.747314] platform_probe+0x70/0xd0
[ 4.750964] really_probe+0xc4/0x2a8
[ 4.754527] __driver_probe_device+0x80/0x140
[ 4.758870] driver_probe_device+0x44/0x120
[ 4.763040] __device_attach_driver+0xc0/0x108
[ 4.767470] bus_for_each_drv+0x8c/0xf0
[ 4.771294] __device_attach+0xa4/0x198
[ 4.775117] device_initial_probe+0x1c/0x30
[ 4.779286] bus_probe_device+0xb4/0xc0
[ 4.783109] deferred_probe_work_func+0xb0/0x100
[ 4.787714] process_one_work+0x18c/0x420
[ 4.791712] worker_thread+0x30c/0x418
[ 4.795449] kthread+0x128/0x138
[ 4.798665] ret_from_fork+0x10/0x20
[ 4.802229] ---[ end trace 0000000000000000 ]---
I don't think that I'll be able to bisect further as I don't have the
relevant hardware available.
--
Thanks,
Sasha
Hello, I am one of those who touched something in the area.
To check if my changes are the cause of the boot failures, please apply
this patch:
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
index 9a8ef8558da9..85be035a209a 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
@@ -373,11 +373,12 @@ static bool mtk_drm_get_all_drm_priv(struct device *dev)
struct mtk_drm_private *temp_drm_priv;
struct device_node *phandle = dev->parent->of_node;
const struct of_device_id *of_id;
+ struct device_node *node;
struct device *drm_dev;
unsigned int cnt = 0;
int i, j;
- for_each_child_of_node_scoped(phandle->parent, node) {
+ for_each_child_of_node(phandle->parent, node) {
struct platform_device *pdev;
of_id = of_match_node(mtk_drm_of_ids, node);
Does Javier's patch fix the problem?
CK, to resolve the issue, please revert commit
fd620fc25d88 ("drm/mediatek: Switch to for_each_child_of_node_scoped()")
Thanks,
Angelo
Regards,
CK
---
This chunk can be found in mtk_drm_get_all_drm_priv(), which is not
listed in the trace, but it is called from mtk_drm_bind().
The loop did not release the child_node if cnt == MAX_CRTC (by means of
a break), which goes against how for_each_child_of_node() should be
handled. If the child_node is indeed required afterwards (it is not
referenced anywhere after the loop), it should be acquired via
of_node_get() and stored somewhere to be able to put it later.
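To illustrate the reference handling described above, here is a minimal userspace sketch. It is not the driver code: struct mock_node, mock_get(), mock_put(), mock_next_child() and leaked_refs() are hypothetical stand-ins, and the only kernel behaviour assumed is that each iteration of for_each_child_of_node() takes a reference on the current child and drops the one on the previous child, so a break leaves one reference held unless it is put explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_node: a refcount and a sibling link. */
struct mock_node {
	int refcount;
	struct mock_node *sibling;
};

static struct mock_node *mock_get(struct mock_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void mock_put(struct mock_node *n)
{
	if (n) {
		/* Dropping a reference that was never taken would be an underflow. */
		assert(n->refcount > 0);
		n->refcount--;
	}
}

/* Mock iterator step: take a reference on the next child, drop the one on prev. */
static struct mock_node *mock_next_child(struct mock_node *first,
					 struct mock_node *prev)
{
	struct mock_node *next = prev ? prev->sibling : first;

	mock_get(next);
	mock_put(prev);
	return next;
}

/*
 * Walk three children and break once `limit` children have been seen.
 * Returns the number of references still held after the loop.
 */
int leaked_refs(int limit, int put_on_break)
{
	struct mock_node c = { 0, NULL };
	struct mock_node b = { 0, &c };
	struct mock_node a = { 0, &b };
	struct mock_node *child;
	int cnt = 0;

	for (child = mock_next_child(&a, NULL); child;
	     child = mock_next_child(&a, child)) {
		if (++cnt == limit) {
			if (put_on_break)
				mock_put(child); /* the put an early exit needs */
			break;
		}
	}

	return a.refcount + b.refcount + c.refcount;
}
```

In this mock, leaked_refs(2, 0) leaves one reference held, while leaked_refs(2, 1) and a walk that never breaks leave none.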
Then another issue would lie underneath as the reference to the
child_node is not stored in any way. But if this patch fixes the issue,
then I suppose it should be applied immediately, and the rest should be
discussed later on.
By the way, are there any logs with debug/error messages to analyze
further if the issue is something different?
Thanks and best regards,
Javier Carrasco