Re: [PATCH] staging: vc04_services: Stop kthreads on .remove

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 28.03.24 um 23:43 schrieb Phil Elwell:
Hi Umang,

On Thu, 28 Mar 2024 at 16:54, Umang Jain <umang.jain@xxxxxxxxxxxxxxxx> wrote:


On 28/03/24 10:19 pm, Stefan Wahren wrote:
Hi Umang,

Am 28.03.24 um 06:16 schrieb Umang Jain:
It turns out that stopping kthreads on vchiq_shutdown_internal()
corrupts the vchiq firmware during probe.

[   11.005324] bcm2835_vchiq 3f00b840.mailbox: sync: error: 0: sf
unexpected msgid 0@b18db30a,0

Since the kthreads are created during vchiq_probe(), symmetrically
they should be stopped on vchiq_remove().

Also, vchiq_shutdown_internal() is called by vchiq_shutdown() which
is a exported function. Hence, modules can take indirectly shut the
vchiq interface by stopping the kthreads.

Fix it by stopping the kthreads in .remove instead.

Fixes: d9c60badccc1 ("staging: vc04_services: vchiq_core: Stop
kthreads on shutdown")
Reported-by: Stefan Wahren <wahrenst@xxxxxxx>
Signed-off-by: Umang Jain <umang.jain@xxxxxxxxxxxxxxxx>
---

Reproduced on rpi-3-b 32-bit kernel with multi_v7_defconfig and all
vchiq configs options. Patch seems to fix the error mentioned in the
commit message.
i tested the patch on my Raspberry Pi 3B+ with multi_v7_defconfig and

CONFIG_BCM_VIDEOCORE=y
CONFIG_BCM2835_VCHIQ=m
CONFIG_VCHIQ_CDEV=y

Now X came up, but if i run

modprobe -r vchiq

but i'm still getting

[  146.730014] bcm2835_vchiq 3f00b840.mailbox: sync: error: 0: sf
unexpected msgid 0@10b01b8d,0

so it seems to me stopping those kthreads isn't that trivial (which i
never expected). Maybe we need to care about the order or an even more
complex synchronization mechanism between VideoCore firmware and the
kthreads.
Dave, Phil - would you happen to have any opinions on this behavior ?
I won't have the time for further investigations during the eastern
holidays. Maybe we should revert d9c60badccc1 and take the necessary
time for proper testing.

It's hard to be sure, but I would guess that stopping the thread is
causing the remote_event_wait in sync_func to terminate early. The
following code assumes that the message is valid, when it fact it will
have a message type of VCHIQ_MSG_PADDING.

If that's the case, you can either find some way to spot that the wait
has actually failed, or handle VCHIQ_MSG_PADDING explicitly in the
switch.
Thanks. It seems the return value of remote_event_wait() is ignored here
and should be used.

Phil






[Index of Archives]     [Linux Driver Development]     [Linux Driver Backports]     [DMA Engine]     [Linux GPIO]     [Linux SPI]     [Video for Linux]     [Linux USB Devel]     [Linux Coverity]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Yosemite Backpacking]
  Powered by Linux