On 10/12/2018 14:34, Robin Murphy wrote: > On 10/12/2018 12:37, Marc Gonzalez wrote: > >> When the kernel fails to init the UFSHC, it calls ufshcd_print_host_regs() >> to help with debugging, which calls the dbg_register_dump hook. >> >> ufs_qcom_dump_dbg_regs makes the kernel panic: >> >> [ 3.715634] UFS_DBG_RD_REG_TXUC 000000a0: 00000000 00000000 00000000 00000000 >> [ 3.722750] UFS_DBG_RD_REG_TXUC 000000b0: 00000001 00000000 00000000 00000004 >> [ 3.729943] Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP >> [ 3.737000] Modules linked in: >> [ 3.744371] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G S 4.20.0-rc4 #16 >> [ 3.747413] Hardware name: Qualcomm Technologies, Inc. MSM8998 v1 MTP (DT) >> [ 3.755295] pstate: 00000005 (nzcv daif -PAN -UAO) >> [ 3.761978] pc : __memcpy_fromio+0x68/0x80 >> [ 3.766718] lr : ufshcd_dump_regs+0x50/0xb0 >> [ 3.770767] sp : ffff00000807ba00 >> [ 3.774830] x29: ffff00000807ba00 x28: 00000000fffffffb >> [ 3.778344] x27: ffff0000089db068 x26: ffff8000f6e58000 >> [ 3.783728] x25: 000000000000000e x24: 0000000000000800 >> [ 3.789023] x23: ffff8000f6e587c8 x22: 0000000000000800 >> [ 3.794319] x21: ffff000008908368 x20: ffff8000f6e1ab80 >> [ 3.799615] x19: 000000000000006c x18: ffffffffffffffff >> [ 3.804910] x17: 0000000000000000 x16: 0000000000000000 >> [ 3.810206] x15: ffff000009199648 x14: ffff000089244187 >> [ 3.815502] x13: ffff000009244195 x12: ffff0000091ab000 >> [ 3.820797] x11: 0000000005f5e0ff x10: ffff0000091998a0 >> [ 3.826093] x9 : 0000000000000000 x8 : ffff8000f6e1ac00 >> [ 3.831389] x7 : 0000000000000000 x6 : 0000000000000068 >> [ 3.836676] x5 : ffff8000f6e1abe8 x4 : 0000000000000000 >> [ 3.841971] x3 : ffff00000928c868 x2 : ffff8000f6e1abec >> [ 3.847267] x1 : ffff00000928c868 x0 : ffff8000f6e1abe8 >> [ 3.852567] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____)) >> [ 3.857900] Call trace: >> [ 3.864473] __memcpy_fromio+0x68/0x80 >> [ 3.866683] ufs_qcom_dump_dbg_regs+0x1c0/0x370 >> [ 3.870522] ufshcd_print_host_regs+0x168/0x190 >> [ 3.874946] ufshcd_init+0xd4c/0xde0 >> [ 3.879459] ufshcd_pltfrm_init+0x3c8/0x550 >> [ 3.883264] ufs_qcom_probe+0x24/0x60 >> [ 3.887188] platform_drv_probe+0x50/0xa0 >> [ 3.890993] really_probe+0x1f0/0x2a0 >> [ 3.894983] driver_probe_device+0x58/0x100 >> [ 3.898628] __driver_attach+0xd4/0xe0 >> [ 3.902617] bus_for_each_dev+0x74/0xd0 >> [ 3.906436] driver_attach+0x20/0x30 >> [ 3.910169] bus_add_driver+0x1ac/0x220 >> [ 3.913992] driver_register+0x60/0x110 >> [ 3.917540] __platform_driver_register+0x40/0x50 >> [ 3.921413] ufs_qcom_pltform_init+0x18/0x20 >> [ 3.926248] do_one_initcall+0x5c/0x180 >> [ 3.930593] kernel_init_freeable+0x198/0x244 >> [ 3.934156] kernel_init+0x10/0x110 >> [ 3.938629] ret_from_fork+0x10/0x20 >> [ 3.941940] Code: f2400842 54000100 8b020002 d503201f (39400023) >> [ 3.945875] ---[ end trace 2d10f654364744f5 ]--- >> [ 3.951841] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b >> [ 3.956528] SMP: stopping secondary CPUs >> [ 5.005502] SMP: failed to stop secondary CPUs 2,7 >> [ 5.005648] Kernel Offset: disabled >> [ 5.009292] CPU features: 0x2,21802008 >> [ 5.012676] Memory Limit: none >> [ 5.016485] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- >> >> >> The problem appears to be on this line: >> >> reg = ufs_qcom_get_debug_reg_offset(host, UFS_DBG_RD_REG_RXUC); >> /* reg = 0x800 */ >> print_fn(hba, reg, 27, "UFS_DBG_RD_REG_RXUC ", priv); >> >> I'm not sure what's going on, because the driver is supposed to map 0x2500 bytes. >> (reg = <0x1da4000 0x2500>;) > > It is mapped - you're not getting an MMU fault but an external abort, > which means the access has been translated and gone out, but the > peripheral (or possibly the interconnect in between) didn't like it and > sent some kind of decode error back. Given that you're faulting in > memcpy_from_io() that's not too surprising - using that on actual device > registers (rather than stuff like ioremap()ed SRAM) is generally a bad > idea, since you have no control over things like access size and > ordering that a typical device is sensitive to. Thanks Robin, your insight is always so very helpful. I do see that __memcpy_fromio reads in chunks of 8-bytes, using __raw_readq. The weird thing (to me) is that the kernel does not panic when I call the debug function a bit later (after sleeping for a second). Feels like there is a race somewhere, and the device is not happy if it accessed "too soon". The kernel still hangs though, which might be worse than a clear-cut panic... >> Commenting out the last 4 dumps of ufs_qcom_print_hw_debug_reg_all() makes >> the panic disappear, but the kernel just hangs after printing UFS_DBG_RD_REG_TXUC > > I'd recommend fixing the register dump code to use specific read*() > accesses of the appropriate sizes for each register. I don't have documentation for that memory region, but based on the source code, I would assume 32-bit registers. I'll try cooking up a small patch. Regards.