Re: [PROBLEM] spi driver internal error during boot on sparx5

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Casper,

On Fri, 26 Aug 2022 11:41:43 +0200
Casper Andersson <casper.casan@xxxxxxxxx> wrote:

> Hi,
> 
> I'm having some issues on my SparX5 switch (PCB135) from Microchip.
> Since this patch series by David Jander the spi driver errors during boot.
> https://lore.kernel.org/all/20220621061234.3626638-1-david@xxxxxxxxxxx/
> 
> ae7d2346dc89 ("spi: Don't use the message queue if possible in spi_sync")
> On this commit it starts failing to mount the partitions during boot.
> This causes the output marked ERROR 1 below.
> 
> 69fa95905d40 ("spi: Ensure the io_mutex is held until spi_finalize_current_message()")
> On this commit it no longer boots properly. I am able to enter login
> info, but then I'm unable to do anything. Though, when running latest
> versions of e.g. net and net-next trees it boots sometimes. I have
> observed some different errors which seems to occur seemingly at random,
> show in ERROR 2-4 below. ERROR 2 and 3 seems to be the most common ones.
> 
> ERROR 1:
> [    1.333629] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> [    1.333636] Modules linked in:
> [    1.333644] CPU: 0 PID: 292 Comm: spi0 Not tainted 5.19.0-rc1 #18
> [    1.333653] Hardware name: microchip,sparx5 (DT)
> [    1.333657] pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    1.333665] pc : ktime_get_real_ts64+0x4c/0x110
> [    1.333681] lr : spi_transfer_one_message+0x13c/0x690
> [    1.333693] sp : ffff80000aadbce0
> [    1.333696] x29: ffff80000aadbce0 x28: 0000000000000000 x27: 00007bff79822ce0
> [    1.333709] x26: ffff000004d1f400 x25: ffff000004d1f000 x24: ffff800009d03798
> [    1.333720] x23: ffff8000093f3008 x22: 0000006809d03830 x21: 00000000000001c2
> [    1.333732] x20: ffff800009a20e80 x19: 0000000000000000 x18: ffffffffffffffff
> [    1.333743] x17: 0000000000000002 x16: 000001d25a50f10c x15: 00074f2fcdde90a0
> [    1.333754] x14: ffff800008dfb8a0 x13: 000000000000ac08 x12: 00000000f5257d14
> [    1.333766] x11: 00000000000002f7 x10: 0000000000000a00 x9 : ffff8000088690bc
> [    1.333777] x8 : ffff000005242760 x7 : 0000000000000004 x6 : fffffbffeffcbd28
> [    1.333788] x5 : 0000000000000000 x4 : 0000000009a51c00 x3 : 0000000000000000
> [    1.333798] x2 : 0000000009a51c00 x1 : 0000000000000000 x0 : 0000000000000001
> [    1.333810] Call trace:
> [    1.333812]  ktime_get_real_ts64+0x4c/0x110
> [    1.333821]  spi_transfer_one_message+0x13c/0x690
> [    1.333831]  __spi_pump_transfer_message+0x174/0x550
> [    1.333841]  __spi_pump_messages+0xb8/0x330
> [    1.333850]  spi_pump_messages+0x24/0x30
> [    1.333859]  kthread_worker_fn+0xb8/0x290
> [    1.333870]  kthread+0x118/0x120
> [    1.333879]  ret_from_fork+0x10/0x20
> [    1.333892] Code: 120002b3 370004d5 d50339bf f9403e80 (f90002c0)
> [    1.333898] ---[ end trace 0000000000000000 ]---
> 
> Error 2:
> [    5.527818] Internal error: Oops: 8600000f [#1] PREEMPT SMP
> [    5.534020] Modules linked in:
> [    5.536959] CPU: 0 PID: 292 Comm: spi0 Not tainted 6.0.0-rc1 #7
> [    5.542627] Hardware name: microchip,sparx5 (DT)
> [    5.547043] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    5.553704] pc : 0xffff0000042062d8
> [    5.557046] lr : spi_finalize_current_message+0x1ac/0x2e0
> [    5.562216] sp : ffff80000aa2bc50
> [    5.565384] x29: ffff80000aa2bc50 x28: 0000000000000000 x27: 00007bff79858be0
> [    5.572217] x26: ffff000007258c20 x25: ffff000007258800 x24: ffff80000cb8b0e0
> [    5.579049] x23: ffff80000cb8b1b0 x22: ffff80000cb8b158 x21: ffff000004d1c800
> [    5.585881] x20: ffff80000cb8b1b0 x19: ffff80000cb8b1b0 x18: 0000000000000000
> [    5.592713] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> [    5.599544] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> [    5.606375] x11: 00000000000001f5 x10: 0000000000000a10 x9 : ffff80000808e890
> [    5.613206] x8 : ffff000004966170 x7 : 0000000000000004 x6 : 00000001497a5800
> [    5.620037] x5 : 00000000410fd030 x4 : 0000000000c0000e x3 : ffff800076773000
> [    5.626869] x2 : 0000000000000000 x1 : ffff0000042062d8 x0 : ffff0000037addb8
> [    5.633701] Call trace:
> [    5.636040]  0xffff0000042062d8
> [    5.639046]  spi_mux_complete_cb+0x48/0x60
> [    5.642969]  spi_finalize_current_message+0x1ac/0x2e0
> [    5.647803]  spi_transfer_one_message+0x298/0x680
> [    5.652304]  __spi_pump_transfer_message+0x188/0x5a0
> [    5.657055]  __spi_pump_messages+0xdc/0x330
> [    5.661058]  spi_pump_messages+0x24/0x30
> [    5.664812]  kthread_worker_fn+0xb8/0x290
> [    5.668653]  kthread+0x118/0x120
> [    5.671742]  ret_from_fork+0x10/0x20
> [    5.675170] Code: 00000000 00000000 08f4b570 ffff8000 (00000000)
> [    5.680999] ---[ end trace 0000000000000000 ]---
> [    5.678207] note: spi0[291] exited with preempt_count 1
> 
> ERROR 3:
> [    4.443467] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000108
> [    4.443479] Mem abort info:
> [    4.443481]   ESR = 0x0000000086000004
> [    4.443485]   EC = 0x21: IABT (current EL), IL = 32 bits
> [    4.443490]   SET = 0, FnV = 0
> [    4.443494]   EA = 0, S1PTW = 0
> [    4.443497]   FSC = 0x04: level 0 translation fault
> [    4.443502] user pgtable: 4k pages, 48-bit VAs, pgdp=00000007070f2000
> [    4.443508] [0000000000000108] pgd=0000000000000000, p4d=0000000000000000
> [    4.443520] Internal error: Oops: 86000004 [#1] PREEMPT SMP
> [    4.443527] Modules linked in:
> [    4.443534] CPU: 0 PID: 292 Comm: spi0 Not tainted 5.19.0-rc1 #25
> [    4.443542] Hardware name: microchip,sparx5 (DT)
> [    4.443546] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    4.443555] pc : 0x108
> [    4.443564] lr : spi_finalize_current_message+0x1ac/0x2e0
> [    4.443578] sp : ffff80000ab03c50
> [    4.443581] x29: ffff80000ab03c50 x28: 0000000000000000 x27: 00007bff79822ce0
> [    4.443594] x26: ffff0000049c3420 x25: ffff0000049c3000 x24: ffff80000b7733b0
> [    4.443607] x23: ffff80000b773480 x22: ffff80000b773428 x21: ffff0000036ac800
> [    4.443619] x20: ffff80000b773480 x19: ffff80000b773480 x18: 0000000040bd0097
> [    4.443632] x17: 0000000000000004 x16: 0000000000000121 x15: 007c2a66ba4f78c6
> [    4.443644] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
> [    4.443655] x11: 000000000000026b x10: 0000000000000a00 x9 : ffff80000808c94c
> [    4.443667] x8 : ffff00000735c460 x7 : 0000000000000001 x6 : 0000000108d9d000
> [    4.443679] x5 : 00000000410fd030 x4 : 0000000000c0000e x3 : ffff8000767a9000
> [    4.443691] x2 : 0000000000000000 x1 : 0000000000000108 x0 : ffff80000b773750
> [    4.443703] Call trace:
> [    4.443706]  0x108
> [    4.443712]  spi_mux_complete_cb+0x48/0x60
> [    4.443720]  spi_finalize_current_message+0x1ac/0x2e0
> [    4.443730]  spi_transfer_one_message+0x2b0/0x690
> [    4.443739]  __spi_pump_transfer_message+0x188/0x5a0
> [    4.443749]  __spi_pump_messages+0xdc/0x330
> [    4.443759]  spi_pump_messages+0x24/0x30
> [    4.443768]  kthread_worker_fn+0xb8/0x290
> [    4.443779]  kthread+0x118/0x120
> [    4.443788]  ret_from_fork+0x10/0x20
> [    4.443802] Code: bad PC value
> [    4.443807] ---[ end trace 0000000000000000 ]---
> 
> Error 4:
> [    4.012013] Unable to handle kernel execute from non-executable memory at virtual address ffff80000b79b498
> [    4.012027] Mem abort info:
> [    4.012029]   ESR = 0x000000008600000f
> [    4.012033]   EC = 0x21: IABT (current EL), IL = 32 bits
> [    4.012038]   SET = 0, FnV = 0
> [    4.012042]   EA = 0, S1PTW = 0
> [    4.012045]   FSC = 0x0f: level 3 permission fault
> [    4.012049] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000007012ec000
> [    4.012055] [ffff80000b79b498] pgd=100000077ffff003, p4d=100000077ffff003, pud=100000077fffe003, pmd=10000007036c7003, pte=0068000704f29703
> [    4.012077] Internal error: Oops: 8600000f [#1] PREEMPT SMP
> [    4.012084] Modules linked in:
> [    4.012091] CPU: 0 PID: 292 Comm: spi0 Not tainted 5.19.0-rc1 #25
> [    4.012099] Hardware name: microchip,sparx5 (DT)
> [    4.012103] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    4.012112] pc : 0xffff80000b79b498
> [    4.012121] lr : 0xffff80000b79b498
> [    4.012127] sp : ffff80000ab5bc90
> [    4.012130] x29: ffff0000042ea800 x28: 0000000000000000 x27: 00007bff79822ce0
> [    4.012143] x26: ffff0000042eac20 x25: ffff0000042ea800 x24: ffff80000b79b420
> [    4.012155] x23: ffff80000b79b4f0 x22: ffff80000b79b498 x21: ffff000004c35800
> [    4.012168] x20: ffff80000b79b4f0 x19: ffff80000b79b4f0 x18: 0000000000000000
> [    4.012180] x17: 0000000000000001 x16: 0000000000000001 x15: 0117d93fc9cfcb72
> [    4.012192] x14: 0114def7e22c5168 x13: ffff800008dfb8a0 x12: 00000000fa83b2da
> [    4.012204] x11: 0000000000000077 x10: 000000000000008a x9 : ffff8000080b2724
> [    4.012216] x8 : 0000000000000000 x7 : 0000000000000001 x6 : 0000000000000096
> [    4.012227] x5 : 0000000000000000 x4 : ffff00007fbacd80 x3 : ffff000004dec880
> [    4.012239] x2 : 0000000000000000 x1 : ffff000004dec880 x0 : 0000000000000000
> [    4.012251] Call trace:
> [    4.012254]  0xffff80000b79b498
> [    4.012265] Code: 04dfc880 ffff0000 0b79b4b0 ffff8000 (08dd5984)
> [    4.012271] ---[ end trace 0000000000000000 ]---

Thanks for reporting.
Looking at Errors 2 and 3, I suspect there might be a race in the SPI mux
driver. After a quick inspection, I see this:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/spi/spi-mux.c?h=v6.0-rc3#n123

spi_mux_transfer_one_message() returns before the message is transferred (in
spi_async()), which is not expected. AFAIK, an ctlr->transfer_one_message()
implementation should not return until the transfer is completed.

To check if that is causing the problem, could you try the following change:

--- a/drivers/spi/spi-mux.c
+++ b/drivers/spi/spi-mux.c
@@ -120,7 +120,7 @@ static int spi_mux_transfer_one_message(struct spi_controller *ctlr,
        m->spi = priv->spi;
 
        /* do the transfer */
-       return spi_async(priv->spi, m);
+       return spi_sync(priv->spi, m);
 }
 
 static int spi_mux_probe(struct spi_device *spi)

Not sure if this is a correct fix, but I'd like to know if your situation
changes this way, if you could try it.
I don't have access to any hardware with a mux unfortunately, so I can't test
it myself.

Best regards,

-- 
David Jander
Protonic Holland.



[Index of Archives]     [Linux Kernel]     [Linux ARM (vger)]     [Linux ARM MSM]     [Linux Omap]     [Linux Arm]     [Linux Tegra]     [Fedora ARM]     [Linux for Samsung SOC]     [eCos]     [Linux Fastboot]     [Gcc Help]     [Git]     [DCCP]     [IETF Announce]     [Security]     [Linux MIPS]     [Yosemite Campsites]

  Powered by Linux