On Wed, 2019-03-27 at 08:56 -0400, Laurence Oberman wrote:
> Truncating email content, starting bisect again as suggested.
> Email was getting too long with repetition.
>
> Crux of the issue repeated here so easy to understand topic
>
> We got to dispatch passing rq_list and the list is corrupted/freed so
> we panic. Clearly a race and is in v5.x+ kernels.
> This new bisect will find it.
>
> crash> bt
> PID: 9191   TASK: ffff9dea0a8395c0   CPU: 1   COMMAND: "kworker/1:1H"
>  #0 [ffffa9fe0759fab0] machine_kexec at ffffffff938606cf
>  #1 [ffffa9fe0759fb08] __crash_kexec at ffffffff9393a48d
>  #2 [ffffa9fe0759fbd0] crash_kexec at ffffffff9393b659
>  #3 [ffffa9fe0759fbe8] oops_end at ffffffff93831c41
>  #4 [ffffa9fe0759fc08] no_context at ffffffff9386ecb9
>  #5 [ffffa9fe0759fcb0] do_page_fault at ffffffff93870012
>  #6 [ffffa9fe0759fce0] page_fault at ffffffff942010ee
>     [exception RIP: blk_mq_dispatch_rq_list+114]
>     RIP: ffffffff93b9f202  RSP: ffffa9fe0759fd90  RFLAGS: 00010246
>     RAX: ffff9de9c4d3bbc8  RBX: ffff9de9c4d3bbc8  RCX: 0000000000000004
>     RDX: 0000000000000000  RSI: ffffa9fe0759fe20  RDI: ffff9dea0dad87f0
>     RBP: 0000000000000000   R8: 0000000000000000   R9: 8080808080808080
>     R10: ffff9dea33827660  R11: ffffee9d9e097a00  R12: ffffa9fe0759fe20
>     R13: ffff9de9c4d3bb80  R14: 0000000000000000  R15: ffff9dea0dad87f0
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #7 [ffffa9fe0759fe18] blk_mq_sched_dispatch_requests at ffffffff93ba455c
>  #8 [ffffa9fe0759fe60] __blk_mq_run_hw_queue at ffffffff93b9e3cf
>  #9 [ffffa9fe0759fe78] process_one_work at ffffffff938b0c21
> #10 [ffffa9fe0759feb8] worker_thread at ffffffff938b18d9
> #11 [ffffa9fe0759ff10] kthread at ffffffff938b6ee8
> #12 [ffffa9fe0759ff50] ret_from_fork at ffffffff94200215
>

Hello Jens, Jianchao

Finally made it to this one.
I will see if I can revert and test

7f556a44e61d0b62d78db9a2662a5f0daef010f2 is the first bad commit
commit 7f556a44e61d0b62d78db9a2662a5f0daef010f2
Author: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
Date:   Fri Dec 14 09:28:18 2018 +0800

    blk-mq: refactor the code of issue request directly

    Merge blk_mq_try_issue_directly and __blk_mq_try_issue_directly
    into one interface to unify the interfaces to issue requests
    directly. The merged interface takes over the requests totally,
    it could insert, end or do nothing based on the return value of
    .queue_rq and 'bypass' parameter. Then caller needn't any other
    handling any more and then code could be cleaned up.

    And also the commit c616cbee ( blk-mq: punt failed direct issue
    to dispatch list ) always inserts requests to hctx dispatch list
    whenever get a BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE, this is
    overkill and will harm the merging. We just need to do that for
    the requests that has been through .queue_rq. This patch also
    could fix this.

    Signed-off-by: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
    Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
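
For readers following the bisect result, here is a rough, self-contained model of the "insert, end or do nothing" decision that the commit message describes. This is not the kernel code. The enum names, the decide() helper and its parameters are hypothetical, and the branches for a request that never reached .queue_rq and for a hard error are an assumed reading of the message rather than something it states outright.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the status values .queue_rq may return. */
enum status { STS_OK, STS_RESOURCE, STS_DEV_RESOURCE, STS_IOERR };

/* What the merged helper would do with the request afterwards. */
enum action {
	ACT_NOTHING,         /* leave the request alone */
	ACT_INSERT_DISPATCH, /* put it on the hctx dispatch list */
	ACT_INSERT_SCHED,    /* hand it back to the normal insert path */
	ACT_END,             /* complete the request with an error */
};

/*
 * ret:    status of the direct-issue attempt
 * issued: true if the request actually went through .queue_rq
 * bypass: true if the caller wants to deal with the outcome itself
 */
static enum action decide(enum status ret, bool issued, bool bypass)
{
	switch (ret) {
	case STS_OK:
		return ACT_NOTHING;
	case STS_RESOURCE:
	case STS_DEV_RESOURCE:
		/* Per the commit message: only requests that have been
		 * through .queue_rq go to the hctx dispatch list. */
		if (issued)
			return ACT_INSERT_DISPATCH;
		/* Assumption: otherwise insert normally unless bypassing. */
		return bypass ? ACT_NOTHING : ACT_INSERT_SCHED;
	default:
		/* Assumption: hard errors end the request unless bypassing. */
		return bypass ? ACT_NOTHING : ACT_END;
	}
}

/* A few sanity checks on the model above. */
static void self_check(void)
{
	assert(decide(STS_OK, true, false) == ACT_NOTHING);
	assert(decide(STS_RESOURCE, true, false) == ACT_INSERT_DISPATCH);
	assert(decide(STS_DEV_RESOURCE, false, false) == ACT_INSERT_SCHED);
	assert(decide(STS_RESOURCE, false, true) == ACT_NOTHING);
	assert(decide(STS_IOERR, true, false) == ACT_END);
}
```

The model only shows how many distinct outcomes the single merged interface has to cover. It says nothing about where the rq_list corruption in the backtrace comes from.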