Re: am335x: 5.18.x: system stalling

On Tue, Jun 7, 2022 at 10:55 AM Yegor Yefremov
<yegorslists@xxxxxxxxxxxxxx> wrote:
> On Sun, Jun 5, 2022 at 4:59 PM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > On Fri, 3 Jun 2022 at 22:47, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > > On Fri, Jun 3, 2022 at 9:11 PM Yegor Yefremov <yegorslists@xxxxxxxxxxxxxx> wrote:
> > > >
> > > > With compiled-in drivers the system doesn't stall. All other tests and
> > > > related outputs will come next week.
> > >
> > > Ah, nice!
> > >
> > > It's probably a reasonable assumption that the smp-patched get_current()
> > > is (at least sometimes) broken in modules but working in the kernel itself.
> > > I suppose that means in the worst case we can hot-fix the issue by
> > > having an 'extern' version of get_current() for the case of
> > > armv6+smp+module ;-)
> > >
> >
> > I've coded something up along those lines, and pushed it to my
> > am335x-stall-test branch.
> >
> > > Maybe start with the ".long 0xe7f001f2" hack I suggested in my last
> > > mail. If that gives you an oops for the module case, then we know
> > > that the patching doesn't work at all and you don't have to try anything
> > > else, otherwise it's more likely that an incorrect instruction sequence
> > > is patched in.
> > >
> >
> > Yeah, I'd be really surprised if the patching misses some occurrences,
> > so I have no clue what is going on here.
> >
> > Yegor, can you please try my branch with the original config (i.e.,
> > slcan and ftdio as modules)
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=am335x-stall-test
>
> @Arnd: I have applied your patch with this change:
>
> asm("0: .long 0xe7f001f2                        \n\t" // BUG() trap
>
> But it revealed nothing new:
>
> [   50.754130] rcu: INFO: rcu_sched self-detected stall on CPU
>
> @Ard: I have tried your branch
> (21b6671c82d4df52ea0c7837705331acb375c5c8). The system still stalls.
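
For reference, the word used in the ".long 0xe7f001f2" test above can be
checked against the ARM (A32) permanently-undefined instruction encoding
(UDF: cond=1110, bits 27..20 = 0111 1111, imm12, bits 7..4 = 1111, imm4).
The following is only a small user-space sketch, not kernel code; the helper
names is_arm_udf(), arm_udf_imm16() and check_trap_word() are made up for
illustration, and the mask/match constants are my reading of that encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed A32 UDF layout (encoding A1):
 *   31..28 cond = 1110 | 27..20 = 0111 1111 | 19..8 imm12 | 7..4 = 1111 | 3..0 imm4
 */
#define ARM_UDF_MASK  0xfff000f0u
#define ARM_UDF_MATCH 0xe7f000f0u

/* Hypothetical helper: true if the word has the fixed bits of an A32 UDF. */
bool is_arm_udf(uint32_t insn)
{
	return (insn & ARM_UDF_MASK) == ARM_UDF_MATCH;
}

/* Hypothetical helper: reassemble the 16-bit immediate as imm12:imm4. */
uint16_t arm_udf_imm16(uint32_t insn)
{
	uint32_t imm12 = (insn >> 8) & 0xfffu;
	uint32_t imm4 = insn & 0xfu;

	return (uint16_t)((imm12 << 4) | imm4);
}

/* Sanity check on the word used in the test patch quoted above. */
void check_trap_word(void)
{
	const uint32_t trap = 0xe7f001f2u;

	assert(is_arm_udf(trap));            /* undefined-instruction trap */
	assert(arm_udf_imm16(trap) == 0x12); /* i.e. "udf #18" */
}
```

If that reading is right, any execution of the word in ARM state should take
an undefined-instruction exception, so the absence of an oops suggests the
placeholder never gets executed as-is on the stalling path.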

Getting back to this old thread, as we never found out what is
actually going on.

It seems we are still stuck trying to figure out why a kernel with ARMv6
support and SMP patching is broken, or whether the same bug might also
affect other configurations without ARMv6 support. This is of course very
unfortunate, but unless someone has an idea for how to debug the problem
further, I suppose we should at least prevent that broken configuration
and disallow enabling CONFIG_SMP in combination with ARMv6 (pre-ARMv6K)
CPUs, to keep others from running into the same problem.

Any other suggestions?

        Arnd


