Re: Kernel Oops in blk_mq_hctx_notify_online() using Radxa CM5

Hi Rick,

On Fri, Oct 11, 2024 at 03:17:43PM -0400, Rick Koch wrote:
> Hello linux-block,
> 
> I have been working with a fellow Ham Radio operator, Martin CT1IQI, on an
> upgrade to an open source SDR radio. The upgrade will replace a Pi CM4 with
> a Radxa CM5.
> https://apache-labs.com/al-products/1061/ANAN-G2-Ultra-HF--6M-100W-Ultra-High-Performance-SDR.html
> 
> We are progressing very well with that project but have come across an
> intermittent issue that we are hoping you may provide some clues on how to
> fix.
> 
> We are using kernel version 6.11.1 under an Armbian OS. This issue doesn't
> happen on the Armbian 6.1.75 branch but will happen without any of our
> changes to 6.11.1. I have also tested with 6.11.3 and found the same problem.
> 
> This issue is a kernel Oops that happens randomly early in boot. Probably 1
> out of 10 boots. It will hang if the issue happens.
> 
> I wonder if you may have any ideas about it? I have attached a dmesg but it
> is the dmesg from after a successful boot as I don't know how to get the
> dmesg when the Oops happens as the board is locked up. If there are other
> methods to get more info to you please let me know.
> 
> Misc info:
> root@saturn-radxa-cm5-8inch:~# lspci
> 0004:40:00.0 PCI bridge: Rockchip Electronics Co., Ltd RK3588 (rev 01)
> 
> Radxa CM5 Compute Module attached to a piCM4-IO board
> 
> Samsung KLMCG2UCTB 16GB onboard eMMC
> 
> Kernel version 6.11.1
> 
> This is the Oops:
> 
> 
> [    1.515476] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> [    1.516043] Modules linked in:
> [    1.516326] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.11.1-edge-rockchip-rk3588 #1
> [    1.517063] Hardware name: Radxa CM5 Saturn SDR (DT)
> [    1.517506] pstate: a0400009 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    1.518128] pc : blk_mq_hctx_notify_online+0x34/0xb0

Can you test the following patch first?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4b2c8e940f59..2ea6edff56d4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4310,6 +4310,8 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	/* mark the queue as mq asap */
 	q->mq_ops = set->ops;
 
+	q->tag_set = set;
+
 	if (blk_mq_alloc_ctxs(q))
 		goto err_exit;
 
@@ -4328,8 +4330,6 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	INIT_WORK(&q->timeout_work, blk_mq_timeout_work);
 	blk_queue_rq_timeout(q, set->timeout ? set->timeout : 30 * HZ);
 
-	q->tag_set = set;
-
 	q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
 
 	INIT_DELAYED_WORK(&q->requeue_work, blk_mq_requeue_work);


If the above patch doesn't work, please figure out which line of source code
the above pc points to:

$ gdb vmlinux
(gdb) l *(blk_mq_hctx_notify_online+0x34)
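
If gdb is not handy, the same lookup can probably be done non-interactively
with the in-tree scripts/faddr2line helper. This is only a sketch: it assumes
you run it from the top of the kernel build tree for the crashing kernel, and
that vmlinux was built with debug info (CONFIG_DEBUG_INFO). The VMLINUX and SYM
variable names are just for illustration. When resolving on a non-arm64 host,
you will likely need CROSS_COMPILE set as for the build.

```shell
# Path to the unstripped kernel image; override via the environment if needed.
VMLINUX=${VMLINUX:-vmlinux}
# Symbol+offset/size copied from the "pc :" line of the Oops.
SYM='blk_mq_hctx_notify_online+0x34/0xb0'

if [ -x scripts/faddr2line ] && [ -f "$VMLINUX" ]; then
    # Prints the source file and line for the faulting instruction.
    ./scripts/faddr2line "$VMLINUX" "$SYM"
else
    echo "need scripts/faddr2line and $VMLINUX; run from the kernel build tree" >&2
fi
```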


Thanks, 
Ming




