Re: [regression] mhi: mhi_pm_st_worker blocked for more than 61 seconds.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2021-03-09 08:44 AM, Kalle Valo wrote:
Kalle Valo <kvalo@xxxxxxxxxxxxxx> writes:

Manivannan Sadhasivam <manivannan.sadhasivam@xxxxxxxxxx> writes:

Hi Kalle,

On Thu, Mar 04, 2021 at 04:59:33PM +0200, Kalle Valo wrote:
Hi MHI folks,

I upgraded my QCA6390 x86 test box to v5.12-rc1 and started seeing
kernel crashes when testing ath11k. I don't recall seeing this on v5.11 so it looks like a new problem, but I cannot be 100% sure. Netconsole output is below. I have most of the kernel debug functionality enabled
(KASAN etc).

I can fairly easy reproduce this by looping insmod and rmmod of mhi,
wireless and ath11k modules. It does not happen every time, but I would
say I can reproduce the problem within 10 test loops or so.

Any ideas what could cause this? I have not bisected this due to lack of
time, but I can test patches etc.


Not sure if this is related, Loic sent a patch which fixes an issue with
"mhi_pm_state_worker":

https://patchwork.kernel.org/project/linux-arm-msm/patch/1614161930-8513-1-git-send-email-loic.poulain@xxxxxxxxxx/

Can you please test see if it fixes your issue also?

Thanks for the link, but unfortunately not :( I was able to reproduce
the issue just after 3 insmod/rmmod loops.

I investigated this a bit more, I was actually able to reproduce this in v5.11 as well. So this is not a new regression. The reason why I started seeing this until now is that I enable more debug options in the kernel,
the diff below. Without those changes I don't see the problem.

I also found a workround, if I add sleep(1) after insmod ath11k_pci in
my test script I see 200 loops without crashes. But when I removed the
sleep the test script crashed only after 19 loops. So there definitely
is a race condition somewhere, just don't know where. I don't have time
to investigate this more, so I'll just use the workaround for the time
being.

--- ../configs/nuc-debug-5.11	2021-02-21 08:55:53.836061988 +0200
+++ .config	2021-03-09 16:22:53.598684524 +0200
@@ -12,6 +12,7 @@
 CONFIG_CC_CAN_LINK_STATIC=y
 CONFIG_CC_HAS_ASM_GOTO=y
 CONFIG_CC_HAS_ASM_INLINE=y
+CONFIG_CONSTRUCTORS=y
 CONFIG_IRQ_WORK=y
 CONFIG_BUILDTIME_TABLE_SORT=y
 CONFIG_THREAD_INFO_IN_TASK=y
@@ -280,6 +281,7 @@
 CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
 CONFIG_ZONE_DMA32=y
 CONFIG_AUDIT_ARCH=y
+CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
 CONFIG_HAVE_INTEL_TXT=y
 CONFIG_X86_64_SMP=y
 CONFIG_ARCH_SUPPORTS_UPROBES=y
@@ -748,8 +750,7 @@
 # CONFIG_MODULE_SIG is not set
 # CONFIG_MODULE_COMPRESS is not set
 # CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
-# CONFIG_UNUSED_SYMBOLS is not set
-# CONFIG_TRIM_UNUSED_KSYMS is not set
+CONFIG_UNUSED_SYMBOLS=y
 CONFIG_MODULES_TREE_LOOKUP=y
 CONFIG_BLOCK=y
 CONFIG_BLK_SCSI_REQUEST=y
@@ -1164,7 +1165,6 @@
 # CONFIG_NET_NCSI is not set
 CONFIG_RPS=y
 CONFIG_RFS_ACCEL=y
-CONFIG_SOCK_RX_QUEUE_MAPPING=y
 CONFIG_XPS=y
 # CONFIG_CGROUP_NET_PRIO is not set
 # CONFIG_CGROUP_NET_CLASSID is not set
@@ -1685,7 +1685,6 @@
 #
 # Distributed Switch Architecture drivers
 #
-# CONFIG_NET_DSA_MV88E6XXX_PTP is not set
 # end of Distributed Switch Architecture drivers

 CONFIG_ETHERNET=y
@@ -1700,6 +1699,7 @@
 # CONFIG_NET_VENDOR_AQUANTIA is not set
 # CONFIG_NET_VENDOR_ARC is not set
 # CONFIG_NET_VENDOR_ATHEROS is not set
+# CONFIG_NET_VENDOR_AURORA is not set
 # CONFIG_NET_VENDOR_BROADCOM is not set
 CONFIG_NET_VENDOR_BROCADE=y
 # CONFIG_BNA is not set
@@ -1914,7 +1914,6 @@
 # CONFIG_MT7615E is not set
 # CONFIG_MT7663U is not set
 # CONFIG_MT7915E is not set
-# CONFIG_MT7921E is not set
 # CONFIG_WLAN_VENDOR_MICROCHIP is not set
 CONFIG_WLAN_VENDOR_RALINK=y
 # CONFIG_RT2X00 is not set
@@ -4500,7 +4499,7 @@
 CONFIG_DEBUG_INFO_COMPRESSED=y
 # CONFIG_DEBUG_INFO_SPLIT is not set
 # CONFIG_DEBUG_INFO_DWARF4 is not set
-# CONFIG_GDB_SCRIPTS is not set
+CONFIG_GDB_SCRIPTS=y
 CONFIG_FRAME_WARN=2048
 # CONFIG_STRIP_ASM_SYMS is not set
 # CONFIG_READABLE_ASM is not set
@@ -4540,13 +4539,13 @@
 CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
 CONFIG_PAGE_OWNER=y
 CONFIG_PAGE_POISONING=y
-# CONFIG_DEBUG_PAGE_REF is not set
+CONFIG_DEBUG_PAGE_REF=y
 # CONFIG_DEBUG_RODATA_TEST is not set
 CONFIG_ARCH_HAS_DEBUG_WX=y
 CONFIG_DEBUG_WX=y
 CONFIG_GENERIC_PTDUMP=y
 CONFIG_PTDUMP_CORE=y
-# CONFIG_PTDUMP_DEBUGFS is not set
+CONFIG_PTDUMP_DEBUGFS=y
 CONFIG_DEBUG_OBJECTS=y
 # CONFIG_DEBUG_OBJECTS_SELFTEST is not set
 CONFIG_DEBUG_OBJECTS_FREE=y
@@ -4568,8 +4567,8 @@
 CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
 CONFIG_DEBUG_VM=y
 CONFIG_DEBUG_VM_VMACACHE=y
-# CONFIG_DEBUG_VM_RB is not set
-# CONFIG_DEBUG_VM_PGFLAGS is not set
+CONFIG_DEBUG_VM_RB=y
+CONFIG_DEBUG_VM_PGFLAGS=y
 CONFIG_DEBUG_VM_PGTABLE=y
 CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
 CONFIG_DEBUG_VIRTUAL=y
@@ -4581,7 +4580,13 @@
 CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
 CONFIG_CC_HAS_KASAN_GENERIC=y
 CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
-# CONFIG_KASAN is not set
+CONFIG_KASAN=y
+CONFIG_KASAN_GENERIC=y
+# CONFIG_KASAN_OUTLINE is not set
+CONFIG_KASAN_INLINE=y
+CONFIG_KASAN_STACK=1
+CONFIG_KASAN_VMALLOC=y
+# CONFIG_TEST_KASAN_MODULE is not set
 # end of Memory Debugging

 CONFIG_DEBUG_SHIRQ=y

Thanks Kalle. We will have to try to reproduce and investigate this on a
different controller as well to dive in and fix any race if seen.

Thanks,
Bhaumik
---
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [Linux for Sparc]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux