Am 28.03.24 um 23:43 schrieb Phil Elwell:
Hi Umang,
On Thu, 28 Mar 2024 at 16:54, Umang Jain <umang.jain@xxxxxxxxxxxxxxxx> wrote:
On 28/03/24 10:19 pm, Stefan Wahren wrote:
Hi Umang,
Am 28.03.24 um 06:16 schrieb Umang Jain:
It turns out that stopping kthreads on vchiq_shutdown_internal()
corrupts the vchiq firmware during probe.
[ 11.005324] bcm2835_vchiq 3f00b840.mailbox: sync: error: 0: sf
unexpected msgid 0@b18db30a,0
Since the kthreads are created during vchiq_probe(), symmetrically
they should be stopped on vchiq_remove().
Also, vchiq_shutdown_internal() is called by vchiq_shutdown() which
is a exported function. Hence, modules can take indirectly shut the
vchiq interface by stopping the kthreads.
Fix it by stopping the kthreads in .remove instead.
Fixes: d9c60badccc1 ("staging: vc04_services: vchiq_core: Stop
kthreads on shutdown")
Reported-by: Stefan Wahren <wahrenst@xxxxxxx>
Signed-off-by: Umang Jain <umang.jain@xxxxxxxxxxxxxxxx>
---
Reproduced on rpi-3-b 32-bit kernel with multi_v7_defconfig and all
vchiq configs options. Patch seems to fix the error mentioned in the
commit message.
i tested the patch on my Raspberry Pi 3B+ with multi_v7_defconfig and
CONFIG_BCM_VIDEOCORE=y
CONFIG_BCM2835_VCHIQ=m
CONFIG_VCHIQ_CDEV=y
Now X came up, but if i run
modprobe -r vchiq
but i'm still getting
[ 146.730014] bcm2835_vchiq 3f00b840.mailbox: sync: error: 0: sf
unexpected msgid 0@10b01b8d,0
so it seems to me stopping those kthreads isn't that trivial (which i
never expected). Maybe we need to care about the order or an even more
complex synchronization mechanism between VideoCore firmware and the
kthreads.
Dave, Phil - would you happen to have any opinions on this behavior ?
I won't have the time for further investigations during the eastern
holidays. Maybe we should revert d9c60badccc1 and take the necessary
time for proper testing.
It's hard to be sure, but I would guess that stopping the thread is
causing the remote_event_wait in sync_func to terminate early. The
following code assumes that the message is valid, when it fact it will
have a message type of VCHIQ_MSG_PADDING.
If that's the case, you can either find some way to spot that the wait
has actually failed, or handle VCHIQ_MSG_PADDING explicitly in the
switch.
Thanks. It seems the return value of remote_event_wait() is ignored here
and should be used.
Phil