Re: ufs_qcom_dump_dbg_regs makes the kernel panic

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/12/2018 14:34, Robin Murphy wrote:

> On 10/12/2018 12:37, Marc Gonzalez wrote:
>
>> When the kernel fails to init the UFSHC, it calls ufshcd_print_host_regs()
>> to help with debugging, which calls the dbg_register_dump hook.
>>
>> ufs_qcom_dump_dbg_regs makes the kernel panic:
>>
>> [    3.715634] UFS_DBG_RD_REG_TXUC 000000a0: 00000000 00000000 00000000 00000000
>> [    3.722750] UFS_DBG_RD_REG_TXUC 000000b0: 00000001 00000000 00000000 00000004
>> [    3.729943] Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP
>> [    3.737000] Modules linked in:
>> [    3.744371] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G S                4.20.0-rc4 #16
>> [    3.747413] Hardware name: Qualcomm Technologies, Inc. MSM8998 v1 MTP (DT)
>> [    3.755295] pstate: 00000005 (nzcv daif -PAN -UAO)
>> [    3.761978] pc : __memcpy_fromio+0x68/0x80
>> [    3.766718] lr : ufshcd_dump_regs+0x50/0xb0
>> [    3.770767] sp : ffff00000807ba00
>> [    3.774830] x29: ffff00000807ba00 x28: 00000000fffffffb
>> [    3.778344] x27: ffff0000089db068 x26: ffff8000f6e58000
>> [    3.783728] x25: 000000000000000e x24: 0000000000000800
>> [    3.789023] x23: ffff8000f6e587c8 x22: 0000000000000800
>> [    3.794319] x21: ffff000008908368 x20: ffff8000f6e1ab80
>> [    3.799615] x19: 000000000000006c x18: ffffffffffffffff
>> [    3.804910] x17: 0000000000000000 x16: 0000000000000000
>> [    3.810206] x15: ffff000009199648 x14: ffff000089244187
>> [    3.815502] x13: ffff000009244195 x12: ffff0000091ab000
>> [    3.820797] x11: 0000000005f5e0ff x10: ffff0000091998a0
>> [    3.826093] x9 : 0000000000000000 x8 : ffff8000f6e1ac00
>> [    3.831389] x7 : 0000000000000000 x6 : 0000000000000068
>> [    3.836676] x5 : ffff8000f6e1abe8 x4 : 0000000000000000
>> [    3.841971] x3 : ffff00000928c868 x2 : ffff8000f6e1abec
>> [    3.847267] x1 : ffff00000928c868 x0 : ffff8000f6e1abe8
>> [    3.852567] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
>> [    3.857900] Call trace:
>> [    3.864473]  __memcpy_fromio+0x68/0x80
>> [    3.866683]  ufs_qcom_dump_dbg_regs+0x1c0/0x370
>> [    3.870522]  ufshcd_print_host_regs+0x168/0x190
>> [    3.874946]  ufshcd_init+0xd4c/0xde0
>> [    3.879459]  ufshcd_pltfrm_init+0x3c8/0x550
>> [    3.883264]  ufs_qcom_probe+0x24/0x60
>> [    3.887188]  platform_drv_probe+0x50/0xa0
>> [    3.890993]  really_probe+0x1f0/0x2a0
>> [    3.894983]  driver_probe_device+0x58/0x100
>> [    3.898628]  __driver_attach+0xd4/0xe0
>> [    3.902617]  bus_for_each_dev+0x74/0xd0
>> [    3.906436]  driver_attach+0x20/0x30
>> [    3.910169]  bus_add_driver+0x1ac/0x220
>> [    3.913992]  driver_register+0x60/0x110
>> [    3.917540]  __platform_driver_register+0x40/0x50
>> [    3.921413]  ufs_qcom_pltform_init+0x18/0x20
>> [    3.926248]  do_one_initcall+0x5c/0x180
>> [    3.930593]  kernel_init_freeable+0x198/0x244
>> [    3.934156]  kernel_init+0x10/0x110
>> [    3.938629]  ret_from_fork+0x10/0x20
>> [    3.941940] Code: f2400842 54000100 8b020002 d503201f (39400023)
>> [    3.945875] ---[ end trace 2d10f654364744f5 ]---
>> [    3.951841] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>> [    3.956528] SMP: stopping secondary CPUs
>> [    5.005502] SMP: failed to stop secondary CPUs 2,7
>> [    5.005648] Kernel Offset: disabled
>> [    5.009292] CPU features: 0x2,21802008
>> [    5.012676] Memory Limit: none
>> [    5.016485] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
>>
>>
>> The problem appears to be on this line:
>>
>> 	reg = ufs_qcom_get_debug_reg_offset(host, UFS_DBG_RD_REG_RXUC);
>> 	/* reg = 0x800 */
>> 	print_fn(hba, reg, 27, "UFS_DBG_RD_REG_RXUC ", priv);
>>
>> I'm not sure what's going on, because the driver is supposed to map 0x2500 bytes.
>> (reg = <0x1da4000 0x2500>;)
> 
> It is mapped - you're not getting an MMU fault but an external abort, 
> which means the access has been translated and gone out, but the 
> peripheral (or possibly the interconnect in between) didn't like it and 
> sent some kind of decode error back. Given that you're faulting in 
> memcpy_from_io() that's not too surprising - using that on actual device 
> registers (rather than stuff like ioremap()ed SRAM) is generally a bad 
> idea, since you have no control over things like access size and 
> ordering that a typical device is sensitive to.

Thanks Robin, your insight is always so very helpful. I do see that __memcpy_fromio
reads in chunks of 8-bytes, using __raw_readq.

The weird thing (to me) is that the kernel does not panic when I call
the debug function a bit later (after sleeping for a second). Feels
like there is a race somewhere, and the device is not happy if it
accessed "too soon". The kernel still hangs though, which might be
worse than a clear-cut panic...

>> Commenting out the last 4 dumps of ufs_qcom_print_hw_debug_reg_all() makes
>> the panic disappear, but the kernel just hangs after printing UFS_DBG_RD_REG_TXUC
> 
> I'd recommend fixing the register dump code to use specific read*() 
> accesses of the appropriate sizes for each register.

I don't have documentation for that memory region, but based on the source code,
I would assume 32-bit registers. I'll try cooking up a small patch.

Regards.



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [Linux for Sparc]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux