Re: renesas_sdhi problems in 5.10-stable was Re: [PATCH 5.10 000/226] 5.10.198-rc1 review

On 10/25/23 10:05, Geert Uytterhoeven wrote:
On Wed, Oct 25, 2023 at 2:35 PM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
On Wed, Oct 25, 2023 at 12:53 PM Geert Uytterhoeven
<geert@xxxxxxxxxxxxxx> wrote:
On Wed, Oct 25, 2023 at 12:47 PM Geert Uytterhoeven
<geert@xxxxxxxxxxxxxx> wrote:
On Tue, Oct 24, 2023 at 9:22 PM Pavel Machek <pavel@xxxxxxx> wrote:
But we still have failures on Renesas with 5.10.199-rc2:

https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/pipelines/1047368849

And they still happen during MMC init:

[    2.638013] renesas_sdhi_internal_dmac ee100000.mmc: Got CD GPIO
[    2.638846] INFO: trying to register non-static key.
[    2.644192] ledtrig-cpu: registered to indicate activity on CPUs
[    2.649066] The code is fine but needs lockdep annotation, or maybe
[    2.649069] you didn't initialize this object before use?
[    2.649071] turning off the locking correctness validator.
[    2.649080] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.199-rc2-arm64-renesas-ge31b6513c43d #1
[    2.649082] Hardware name: HopeRun HiHope RZ/G2M with sub board (DT)
[    2.649086] Call trace:
[    2.655106] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
[    2.661354]  dump_backtrace+0x0/0x194
[    2.661361]  show_stack+0x14/0x20
[    2.667430] usbcore: registered new interface driver usbhid
[    2.672230]  dump_stack+0xe8/0x130
[    2.672238]  register_lock_class+0x480/0x514
[    2.672244]  __lock_acquire+0x74/0x20ec
[    2.681113] usbhid: USB HID core driver
[    2.687450]  lock_acquire+0x218/0x350
[    2.687456]  _raw_spin_lock+0x58/0x80
[    2.687464]  tmio_mmc_irq+0x410/0x9ac
[    2.688556] renesas_sdhi_internal_dmac ee160000.mmc: mmc0 base at 0x00000000ee160000, max clock rate 200 MHz
[    2.744936]  __handle_irq_event_percpu+0xbc/0x340
[    2.749635]  handle_irq_event+0x60/0x100
[    2.753553]  handle_fasteoi_irq+0xa0/0x1ec
[    2.757644]  __handle_domain_irq+0x7c/0xdc
[    2.761736]  efi_header_end+0x4c/0xd0
[    2.765393]  el1_irq+0xcc/0x180
[    2.768530]  arch_cpu_idle+0x14/0x2c
[    2.772100]  default_idle_call+0x58/0xe4
[    2.776019]  do_idle+0x244/0x2c0
[    2.779242]  cpu_startup_entry+0x20/0x6c
[    2.783160]  rest_init+0x164/0x28c
[    2.786561]  arch_call_rest_init+0xc/0x14
[    2.790565]  start_kernel+0x4c4/0x4f8
[    2.794233] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000014
[    2.803011] Mem abort info:

from https://lava.ciplatform.org/scheduler/job/1025535
from
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/jobs/5360973735 .

Is there something else missing?

I don't have a HopeRun HiHope RZ/G2M, but both v5.10.198 and v5.10.199
seem to work fine on Salvator-XS with R-Car H3 ES2.0 and Salvator-X
with R-Car M3-W ES1.0, using a config based on latest renesas_defconfig.

Sorry, I looked at the wrong log on R-Car M3-W.
I do see the issue with v5.10.198, but not with v5.10.199.

It seems to be an intermittent issue. Investigating...

After spending too much time on bisecting, the bad guy turns out to
be commit 6d3745bbc3341d3b ("mmc: renesas_sdhi: register irqs before
registering controller") in v5.10.198.

Adding debug information shows the lock is mmc_host.lock.

It is definitely initialized:

     renesas_sdhi_probe()
     {
         ...
         tmio_mmc_host_alloc()
             mmc_alloc_host
                 spin_lock_init(&host->lock);
         ...
         devm_request_irq()
         -> tmio_mmc_irq
             tmio_mmc_cmd_irq()
                 spin_lock(&host->lock);
         ...
     }

That leaves us with a missing lockdep annotation?


Is it possible that the lock initialization is overwritten?
I seem to recall a recent case where this happened.

Also, there is
	spin_lock_init(&_host->lock);
in tmio_mmc_host_probe(), and tmio_mmc_host_probe() is called after
devm_request_irq().
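
To illustrate why that ordering matters, here is a minimal userspace
sketch. It is not the driver code: model_lock, model_host and the two
probe_*() helpers are hypothetical names, and the "interrupt" is just a
direct function call. It only models the assumption that a handler can
run as soon as it is registered, before a later lock initialization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a spinlock: only tracks whether init ran. */
struct model_lock {
    bool initialized;
};

/* Hypothetical stand-in for the host structure that owns the lock. */
struct model_host {
    struct model_lock lock;
    int uninitialized_uses; /* times the handler saw an uninitialized lock */
};

static void model_lock_init(struct model_lock *lock)
{
    assert(lock != NULL);
    lock->initialized = true;
}

/* Models an interrupt handler that takes the host lock. */
static void model_irq_handler(struct model_host *host)
{
    if (!host->lock.initialized)
        host->uninitialized_uses++;
}

/* Ordering described above: the handler can fire before the lock is set up. */
static int probe_irq_before_lock_init(void)
{
    struct model_host host = { .lock = { .initialized = false } };

    model_irq_handler(&host);    /* interrupt arrives right after registration */
    model_lock_init(&host.lock); /* initialization happens too late */
    return host.uninitialized_uses;
}

/* Alternative ordering: the lock is initialized before any handler can run. */
static int probe_lock_init_before_irq(void)
{
    struct model_host host = { .lock = { .initialized = false } };

    model_lock_init(&host.lock);
    model_irq_handler(&host);
    return host.uninitialized_uses;
}
```

In this model, probe_irq_before_lock_init() returns 1 (the handler saw
an uninitialized lock once) and probe_lock_init_before_irq() returns 0.
Whether the real driver hits exactly this window is what remains to be
confirmed.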

Also, how would lockdep annotation help with "Unable to handle
kernel NULL pointer dereference at virtual address 0000000000000014"
in the log above?

Guenter



