Re: [PATCH] staging: vc04_services: Stop kthreads on .remove

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Umang,

On Thu, 28 Mar 2024 at 16:54, Umang Jain <umang.jain@xxxxxxxxxxxxxxxx> wrote:
>
>
>
> On 28/03/24 10:19 pm, Stefan Wahren wrote:
> > Hi Umang,
> >
> > Am 28.03.24 um 06:16 schrieb Umang Jain:
> >> It turns out that stopping kthreads on vchiq_shutdown_internal()
> >> corrupts the vchiq firmware during probe.
> >>
> >> [   11.005324] bcm2835_vchiq 3f00b840.mailbox: sync: error: 0: sf
> >> unexpected msgid 0@b18db30a,0
> >>
> >> Since the kthreads are created during vchiq_probe(), symmetrically
> >> they should be stopped on vchiq_remove().
> >>
> >> Also, vchiq_shutdown_internal() is called by vchiq_shutdown() which
> >> is a exported function. Hence, modules can take indirectly shut the
> >> vchiq interface by stopping the kthreads.
> >>
> >> Fix it by stopping the kthreads in .remove instead.
> >>
> >> Fixes: d9c60badccc1 ("staging: vc04_services: vchiq_core: Stop
> >> kthreads on shutdown")
> >> Reported-by: Stefan Wahren <wahrenst@xxxxxxx>
> >> Signed-off-by: Umang Jain <umang.jain@xxxxxxxxxxxxxxxx>
> >> ---
> >>
> >> Reproduced on rpi-3-b 32-bit kernel with multi_v7_defconfig and all
> >> vchiq configs options. Patch seems to fix the error mentioned in the
> >> commit message.
> > i tested the patch on my Raspberry Pi 3B+ with multi_v7_defconfig and
> >
> > CONFIG_BCM_VIDEOCORE=y
> > CONFIG_BCM2835_VCHIQ=m
> > CONFIG_VCHIQ_CDEV=y
> >
> > Now X came up, but if i run
> >
> > modprobe -r vchiq
> >
> > but i'm still getting
> >
> > [  146.730014] bcm2835_vchiq 3f00b840.mailbox: sync: error: 0: sf
> > unexpected msgid 0@10b01b8d,0
> >
> > so it seems to me stopping those kthreads isn't that trivial (which i
> > never expected). Maybe we need to care about the order or an even more
> > complex synchronization mechanism between VideoCore firmware and the
> > kthreads.
>
> Dave, Phil - would you happen to have any opinions on this behavior ?
> >
> > I won't have the time for further investigations during the eastern
> > holidays. Maybe we should revert d9c60badccc1 and take the necessary
> > time for proper testing.
> >

It's hard to be sure, but I would guess that stopping the thread is
causing the remote_event_wait in sync_func to terminate early. The
following code assumes that the message is valid, when it fact it will
have a message type of VCHIQ_MSG_PADDING.

If that's the case, you can either find some way to spot that the wait
has actually failed, or handle VCHIQ_MSG_PADDING explicitly in the
switch.

Phil




[Index of Archives]     [Linux Driver Development]     [Linux Driver Backports]     [DMA Engine]     [Linux GPIO]     [Linux SPI]     [Video for Linux]     [Linux USB Devel]     [Linux Coverity]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Yosemite Backpacking]
  Powered by Linux