On 03/11/2023 11:44 AM, Jonas Gorski wrote:
> On Sat, 11 Mar 2023 at 18:32, Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
> > On 3/10/2023 4:13 AM, Álvaro Fernández Rojas wrote:
> > > arch_sync_dma_for_cpu_all() causes kernel panics on BCM6358 with EHCI/OHCI:
> > > [ 3.881739] usb 1-1: new high-speed USB device number 2 using ehci-platform
> > > [ 3.895011] Reserved instruction in kernel code[#1]:
> > > [ 3.900113] CPU: 0 PID: 1 Comm: init Not tainted 5.10.16 #0
> > > [ 3.905829] $ 0   : 00000000 10008700 00000000 77d94060
> > > [ 3.911238] $ 4   : 7fd1f088 00000000 81431cac 81431ca0
> > > [ 3.916641] $ 8   : 00000000 ffffefff 8075cd34 00000000
> > > [ 3.922043] $12   : 806f8d40 f3e812b7 00000000 000d9aaa
> > > [ 3.927446] $16   : 7fd1f068 7fd1f080 7ff559b8 81428470
> > > [ 3.932848] $20   : 00000000 00000000 55590000 77d70000
> > > [ 3.938251] $24   : 00000018 00000010
> > > [ 3.943655] $28   : 81430000 81431e60 81431f28 800157fc
> > > [ 3.949058] Hi    : 00000000
> > > [ 3.952013] Lo    : 00000000
> > > [ 3.955019] epc   : 80015808 setup_sigcontext+0x54/0x24c
> > > [ 3.960464] ra    : 800157fc setup_sigcontext+0x48/0x24c
> > > [ 3.965913] Status: 10008703 KERNEL EXL IE
> > > [ 3.970216] Cause : 00800028 (ExcCode 0a)
> > > [ 3.974340] PrId  : 0002a010 (Broadcom BMIPS4350)
> > > [ 3.979170] Modules linked in: ohci_platform ohci_hcd fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
> > > [ 3.992907] Process init (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=77e22ec8)
> > > [ 4.000776] Stack : 81431ef4 7fd1f080 81431f28 81428470 7fd1f068 81431edc 7ff559b8 81428470
> > > [ 4.009467]         81431f28 7fd1f080 55590000 77d70000 77d5498c 80015c70 806f0000 8063ae74
> > > [ 4.018149]         08100002 81431f28 0000000a 08100002 81431f28 0000000a 77d6b418 00000003
> > > [ 4.026831]         ffffffff 80016414 80080734 81431ecc 81431ecc 00000001 00000000 04000000
> > > [ 4.035512]         77d54874 00000000 00000000 00000000 00000000 00000012 00000002 00000000
> > > [ 4.044196]         ...
> > > [ 4.046706] Call Trace:
> > > [ 4.049238] [<80015808>] setup_sigcontext+0x54/0x24c
> > > [ 4.054356] [<80015c70>] setup_frame+0xdc/0x124
> > > [ 4.059015] [<80016414>] do_notify_resume+0x1dc/0x288
> > > [ 4.064207] [<80011b50>] work_notifysig+0x10/0x18
> > > [ 4.069036]
> > > [ 4.070538] Code: 8fc300b4 00001025 26240008 <ac820000> ac830004 3c048063 0c0228aa 24846a00 26240010
> > > [ 4.080686]
> > > [ 4.082517] ---[ end trace 22a8edb41f5f983b ]---
> > > [ 4.087374] Kernel panic - not syncing: Fatal exception
> > > [ 4.092753] Rebooting in 1 seconds..
> >
> > Did you pinpoint which specific instruction within
> > arch_sync_dma_for_cpu_all() is causing the reserved instruction
> > exception?
>
> It's setup_sigcontext(), not arch_sync_dma_for_cpu_all() that's
> causing the exception ;-)
>
> Hand decoding the Code gives me
>
>     lw    $v1, 0xb4($fp)
>     or    $v0, 0, 0
>     addiu $a0, $s1, 8
>     sw    $v0, 0($a0)    <- the code in brackets, so I guess EPC?
>     sw    $v1, 4($a0)
>
> which I assume is this part:
>
>     err |= __put_user(regs->cp0_epc, &sc->sc_pc);
>
> (0xb4 is the offset of cp0_epc, 0x8 the offset of sc_pc)
>
> One thing I see is that we do the RAC flush for BMIPS3300, 4350 and
> 4380, but only initialize it for 3300 [1], and leave it in whatever
> state the bootloader left it for the other ones.
>
> Maybe it has some invalid config in (that particular?) 6358 that
> triggers issues later on after a flush? E.g. the flush puts it in an
> error state, and the next time something triggers a prefetch (write?)
> (by trying to access userspace) it generates an error exception.
>
> Just spitballing though.
>
> [1] https://elixir.bootlin.com/linux/latest/source/arch/mips/kernel/smp-bmips.c#L587
>
> Jonas

It depends on the bootloader, but most likely the bootloader does not use the RAC at all. So I agree that the RAC may not be properly initialized when the flush function is called, in which case the flush could push stale data out, corrupt memory, and cause problems later in userspace.
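
For anyone who wants to re-check the hand decoding above, here is a minimal Python sketch that splits the words from the "Code:" line into MIPS32 fields and also extracts ExcCode from the Cause value in the oops. The names decode(), exc_code() and REGS are made up for this illustration and come from neither the thread nor the kernel, and the decoder only knows the handful of opcodes that appear in this particular dump.

```python
# Conventional MIPS o32 register names, indexed by register number.
REGS = ["zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
        "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
        "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
        "t8", "t9", "k0", "k1", "gp", "sp", "fp", "ra"]


def decode(word, pc=0):
    """Decode one 32-bit MIPS32 word (only the opcodes seen in this oops)."""
    op = word >> 26
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    funct = word & 0x3F
    imm = word & 0xFFFF
    simm = imm - 0x10000 if imm & 0x8000 else imm  # sign-extended immediate

    if op == 0x00 and funct == 0x25:
        return f"or ${REGS[rd]}, ${REGS[rs]}, ${REGS[rt]}"
    if op == 0x03:
        # jal: 26-bit word index combined with the top 4 bits of pc + 4
        target = ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)
        return f"jal {target:#010x}"
    if op == 0x09:
        return f"addiu ${REGS[rt]}, ${REGS[rs]}, {simm}"
    if op == 0x0F:
        return f"lui ${REGS[rt]}, {imm:#x}"
    if op == 0x23:
        return f"lw ${REGS[rt]}, {simm:#x}(${REGS[rs]})"
    if op == 0x2B:
        return f"sw ${REGS[rt]}, {simm:#x}(${REGS[rs]})"
    return f".word {word:#010x}"  # anything this sketch does not handle


def exc_code(cause):
    """ExcCode is bits 6..2 of the CP0 Cause register."""
    return (cause >> 2) & 0x1F


# Values copied from the oops: the word in <brackets> sits at EPC.
CODE = [0x8FC300B4, 0x00001025, 0x26240008, 0xAC820000, 0xAC830004,
        0x3C048063, 0x0C0228AA, 0x24846A00, 0x26240010]
FAULT_INDEX = 3
EPC = 0x80015808

if __name__ == "__main__":
    for i, word in enumerate(CODE):
        addr = EPC + 4 * (i - FAULT_INDEX)
        marker = "<-- epc" if i == FAULT_INDEX else ""
        print(f"{addr:08x}: {word:08x}  {decode(word, addr)} {marker}")
    print(f"ExcCode = {exc_code(0x00800028):#04x}")
```

In the first word the rt field is 3, so the load targets $v1, which is the register that the second sw then stores at offset 4. The Cause value 00800028 yields ExcCode 0x0a, matching the "Reserved instruction" message in the log.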