Hi Rick,

On Fri, Oct 11, 2024 at 03:17:43PM -0400, Rick Koch wrote:
> Hello linux-block,
>
> I have been working with a fellow Ham Radio operator, Martin CT1IQI, on an
> upgrade to an open source SDR radio. The upgrade will replace a pi CM4 with
> a Raxda CM5.
> https://apache-labs.com/al-products/1061/ANAN-G2-Ultra-HF--6M-100W-Ultra-High-Performance-SDR.html
>
> We are progressing very well with that project but have come across an
> intermittent issue that we are hoping you may provide some clues on how to
> fix.
>
> We are using kernel version 6.11.1 under an Armbian OS. This issue doesn't
> happen on the Armbian 6.1.75 branch but will happen without any of our
> changes to 6.11.1. I have also tested with 6.11.3 and found the same
> problem.
>
> This issue is a kernel Oops that happens randomly early in boot. Probably 1
> out of 10 boots. It will hang if the issue happens.
>
> I wonder if you may have any ideas about it? I have attached a dmesg but it
> is the dmesg from after a successful boot as I don't know how to get the
> dmesg when the Oops happens as the board is locked up. If there are other
> methods to get more info to you please let me know.
>
> Misc info:
> root@saturn-radxa-cm5-8inch:~# lspci
> 0004:40:00.0 PCI bridge: Rockchip Electronics Co., Ltd RK3588 (rev 01)
>
> Radxa CM5 Compute Module attached to a piCM4-IO board
>
> Samsung KLMCG2UCTB 16GB onboard eMMC
>
> Kernel version 6.11.1
>
> This is the Oops:
>
> [ 1.515476] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> [ 1.516043] Modules linked in:
> [ 1.516326] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.11.1-edge-rockchip-rk3588 #1
> [ 1.517063] Hardware name: Radxa CM5 Saturn SDR (DT)
> [ 1.517506] pstate: a0400009 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 1.518128] pc : blk_mq_hctx_notify_online+0x34/0xb0

Can you test the following patch first?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4b2c8e940f59..2ea6edff56d4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4310,6 +4310,8 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	/* mark the queue as mq asap */
 	q->mq_ops = set->ops;
 
+	q->tag_set = set;
+
 	if (blk_mq_alloc_ctxs(q))
 		goto err_exit;
 
@@ -4328,8 +4330,6 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	INIT_WORK(&q->timeout_work, blk_mq_timeout_work);
 	blk_queue_rq_timeout(q, set->timeout ? set->timeout : 30 * HZ);
 
-	q->tag_set = set;
-
 	q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
 
 	INIT_DELAYED_WORK(&q->requeue_work, blk_mq_requeue_work);

If the above patch doesn't work, please figure out which line of source
code the above pc points to:

$ gdb vmlinux
gdb> l *(blk_mq_hctx_notify_online+0x34)

Thanks,
Ming