Re: x15: Unable to handle kernel NULL pointer dereference at virtual address 00000004 when read : pci_generic_config_read

On Wed, 26 Jul 2023 at 16:04, Arnd Bergmann <arnd@xxxxxxxx> wrote:
>
> On Wed, Jul 26, 2023, at 11:59, Naresh Kamboju wrote:
> > On Tue, 20 Jun 2023 at 14:10, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> >>
> >> On Tue, Jun 20, 2023, at 10:00, Naresh Kamboju wrote:
> >> > We have been noticing the following kernel crash on x15 device while running
> >> > LTP fs proc01 testing with Linux stable rc 6.x kernels.
> >>
> >> Do you know if this is a regression with this kernel version compared
> >> to older kernels running the same tests, or an added testcase in LTP
> >> that exercises a code path that may have been broken for longer?
> ...
> >>
> >> I have not disassembled the vmlinux file, but I can see that the
> >> offset into the NULL pointer is '4', which does not match the
> >> structure offsets for bus->ops or ops->map_bus.
> >>
> >> I also see that if map_bus returns NULL, we treat that as
> >> an error, but if it returns '4', that is taken as a pointer,
> >> which is my best guess at what is happening here.
> >>
> >> map_bus() seems to be either dw_pcie_other_conf_map_bus() or
> >> dw_pcie_own_conf_map_bus(), since the dra7 does not have its
> >> own variant but inherits these from the dwc pci driver.
> >>
> >> I think this is caused by the combination of two bugs:
> >>
> >> - something prevents the dra7-pcie driver from probing the
> >>   device correctly, ultimately failing with the "failed to
> >>   request irq" message.
> >>
> >> - The error handling in dra7xx_pcie_probe() fails to clean
> >>   up after the first problem, leaving the PCIe host
> >>   in a broken state instead of removing it entirely.
> >
> > The reported kernel crash is continuously happening on the
> > BeagleBoard x15 device while running LTP fs tests on stable rc 6.4.7-rc1.
>
> Ok, so you think there is an additional regression between
> 6.4.6 and 6.4.7-rc1, on top of the two that you have not bisected?

Sure.
We need to find out more details, such as when this started,
and check for this crash on the mainline kernel and the
stable-rc branches.


>
> I don't see any changes in drivers/pci/ after 6.4.5, so I'm
> even more confused now.
>
> > soundcore display_connector
> > [ 1195.601104] CPU: 0 PID: 4876 Comm: proc01 Not tainted 6.4.7-rc1 #1
> > [ 1195.607330] Hardware name: Generic DRA74X (Flattened Device Tree)
> > [ 1195.613464] PC is at pci_generic_config_read+0x34/0x8c
> > [ 1195.618621] LR is at pci_generic_config_read+0x1c/0x8c
>
> This looks identical to the first bugs that you reported, so I'd
> suggest you keep trying to narrow down when that one started rather
> than looking at the latest stable-rc.

Thanks for the suggestions.

>
>      Arnd

-Naresh
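
For reference, Arnd's guess quoted above (map_bus() returning '4', which
is then taken as a pointer) can be modelled in user space. This is only a
sketch under stated assumptions: it assumes the map_bus() callback computes
"base + register offset" and that the base was left NULL by the failed
probe. The names model_host, model_map_bus and model_addr_accepted are
hypothetical and are not kernel APIs; addresses are kept as integers so
that nothing is dereferenced.

```c
#include <stdint.h>

/* Hypothetical stand-in for a host bridge whose config space was never mapped. */
struct model_host {
	uintptr_t cfg_base;	/* 0 models a NULL base left behind by a failed probe */
};

/* Assumption: the map_bus() callback returns "base + register offset". */
static uintptr_t model_map_bus(const struct model_host *host, int where)
{
	return host->cfg_base + (uintptr_t)where;
}

/* Models the caller's check: only an exact NULL is treated as an error. */
static int model_addr_accepted(uintptr_t addr)
{
	return addr != 0;
}
```

With cfg_base set to 0 and where set to 4, model_map_bus() yields 4, which
model_addr_accepted() lets through. A read through such an address would
fault at virtual address 00000004, which would be consistent with the
subject line, though this has not been confirmed against the actual
vmlinux.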


