On 29/04/2022 16:51, Marek Szyprowski wrote: > On 29.04.2022 16:16, Krzysztof Kozlowski wrote: >> On 29/04/2022 14:29, Marek Szyprowski wrote: >>> On 19.04.2022 13:34, Krzysztof Kozlowski wrote: >>>> The driver_override field from platform driver should not be initialized >>>> from static memory (string literal) because the core later kfree() it, >>>> for example when driver_override is set via sysfs. >>>> >>>> Use dedicated helper to set driver_override properly. >>>> >>>> Fixes: 950a7388f02b ("rpmsg: Turn name service into a stand alone driver") >>>> Fixes: c0cdc19f84a4 ("rpmsg: Driver for user space endpoint interface") >>>> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@xxxxxxxxxx> >>>> Reviewed-by: Bjorn Andersson <bjorn.andersson@xxxxxxxxxx> >>> This patch landed recently in linux-next as commit 42cd402b8fd4 ("rpmsg: >>> Fix kfree() of static memory on setting driver_override"). In my tests I >>> found that it triggers the following issue during boot of the >>> DragonBoard410c SBC (arch/arm64/boot/dts/qcom/apq8016-sbc.dtb): >>> >>> ------------[ cut here ]------------ >>> DEBUG_LOCKS_WARN_ON(lock->magic != lock) >>> WARNING: CPU: 1 PID: 8 at kernel/locking/mutex.c:582 >>> __mutex_lock+0x1ec/0x430 >>> Modules linked in: >>> CPU: 1 PID: 8 Comm: kworker/u8:0 Not tainted 5.18.0-rc4-next-20220429 #11815 >>> Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) >>> Workqueue: events_unbound deferred_probe_work_func >>> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) >>> pc : __mutex_lock+0x1ec/0x430 >>> lr : __mutex_lock+0x1ec/0x430 >>> .. >>> Call trace: >>> __mutex_lock+0x1ec/0x430 >>> mutex_lock_nested+0x38/0x64 >>> driver_set_override+0x124/0x150 >>> qcom_smd_register_edge+0x2a8/0x4ec >>> qcom_smd_probe+0x54/0x80 >>> platform_probe+0x68/0xe0 >>> really_probe.part.0+0x9c/0x29c >>> __driver_probe_device+0x98/0x144 >>> driver_probe_device+0xac/0x14c >>> __device_attach_driver+0xb8/0x120 >>> bus_for_each_drv+0x78/0xd0 >>> __device_attach+0xd8/0x180 >>> device_initial_probe+0x14/0x20 >>> bus_probe_device+0x9c/0xa4 >>> deferred_probe_work_func+0x88/0xc4 >>> process_one_work+0x288/0x6bc >>> worker_thread+0x248/0x450 >>> kthread+0x118/0x11c >>> ret_from_fork+0x10/0x20 >>> irq event stamp: 3599 >>> hardirqs last enabled at (3599): [<ffff80000919053c>] >>> _raw_spin_unlock_irqrestore+0x98/0x9c >>> hardirqs last disabled at (3598): [<ffff800009190ba4>] >>> _raw_spin_lock_irqsave+0xc0/0xcc >>> softirqs last enabled at (3554): [<ffff800008010470>] _stext+0x470/0x5e8 >>> softirqs last disabled at (3549): [<ffff8000080a4514>] >>> __irq_exit_rcu+0x180/0x1ac >>> ---[ end trace 0000000000000000 ]--- >>> >>> I don't see any direct relation between the $subject and the above log, >>> but reverting the $subject on top of linux next-20220429 hides/fixes it. >>> Maybe there is a kind of memory trashing somewhere there and your change >>> only revealed it? >> Thanks for the report. I think the error path of my patch is wrong - I >> should not kfree(rpdev->driver_override) from the rpmsg code. That's the >> only thing I see now... >> >> Could you test following patch and tell if it helps? >> https://pastebin.ubuntu.com/p/rp3q9Z5fXj/ > > This doesn't help, the issue is still reported. I think I screwed this part of code. The new helper uses device_lock() (the mutexes you see in backtrace) but in rpmsg it is called before device_register() which initializes the device. I don't have a device using qcom-smd rpmsg, so it's a bit tricky to reproduce. Best regards, Krzysztof