On Wed, 2019-03-27 at 18:00 -0400, Laurence Oberman wrote:
> On Wed, 2019-03-27 at 08:56 -0400, Laurence Oberman wrote:
> > Truncating email content, starting bisect again as suggested.
> > Email was getting too long with repetition.
> >
> > Crux of the issue repeated here so easy to understand topic
> >
> > We got to dispatch passing rq_list and the list is corrupted/freed so
> > we panic. Clearly a race and is in v5.x+ kernels.
> > This new bisect will find it.
> >
> > crash> bt
> > PID: 9191  TASK: ffff9dea0a8395c0  CPU: 1  COMMAND: "kworker/1:1H"
> >  #0 [ffffa9fe0759fab0] machine_kexec at ffffffff938606cf
> >  #1 [ffffa9fe0759fb08] __crash_kexec at ffffffff9393a48d
> >  #2 [ffffa9fe0759fbd0] crash_kexec at ffffffff9393b659
> >  #3 [ffffa9fe0759fbe8] oops_end at ffffffff93831c41
> >  #4 [ffffa9fe0759fc08] no_context at ffffffff9386ecb9
> >  #5 [ffffa9fe0759fcb0] do_page_fault at ffffffff93870012
> >  #6 [ffffa9fe0759fce0] page_fault at ffffffff942010ee
> >     [exception RIP: blk_mq_dispatch_rq_list+114]
> >     RIP: ffffffff93b9f202  RSP: ffffa9fe0759fd90  RFLAGS: 00010246
> >     RAX: ffff9de9c4d3bbc8  RBX: ffff9de9c4d3bbc8  RCX: 0000000000000004
> >     RDX: 0000000000000000  RSI: ffffa9fe0759fe20  RDI: ffff9dea0dad87f0
> >     RBP: 0000000000000000   R8: 0000000000000000   R9: 8080808080808080
> >     R10: ffff9dea33827660  R11: ffffee9d9e097a00  R12: ffffa9fe0759fe20
> >     R13: ffff9de9c4d3bb80  R14: 0000000000000000  R15: ffff9dea0dad87f0
> >     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >  #7 [ffffa9fe0759fe18] blk_mq_sched_dispatch_requests at ffffffff93ba455c
> >  #8 [ffffa9fe0759fe60] __blk_mq_run_hw_queue at ffffffff93b9e3cf
> >  #9 [ffffa9fe0759fe78] process_one_work at ffffffff938b0c21
> > #10 [ffffa9fe0759feb8] worker_thread at ffffffff938b18d9
> > #11 [ffffa9fe0759ff10] kthread at ffffffff938b6ee8
> > #12 [ffffa9fe0759ff50] ret_from_fork at ffffffff94200215
> >
>
> Hello Jens, Jianchao
> Finally made it to this one.
> I will see if I can revert and test
>
> 7f556a44e61d0b62d78db9a2662a5f0daef010f2 is the first bad commit
> commit 7f556a44e61d0b62d78db9a2662a5f0daef010f2
> Author: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
> Date:   Fri Dec 14 09:28:18 2018 +0800
>
>     blk-mq: refactor the code of issue request directly
>
>     Merge blk_mq_try_issue_directly and __blk_mq_try_issue_directly
>     into one interface to unify the interfaces to issue requests
>     directly. The merged interface takes over the requests totally,
>     it could insert, end or do nothing based on the return value of
>     .queue_rq and 'bypass' parameter. Then caller needn't any other
>     handling any more and then code could be cleaned up.
>
>     And also the commit c616cbee ( blk-mq: punt failed direct issue
>     to dispatch list ) always inserts requests to hctx dispatch list
>     whenever get a BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE, this is
>     overkill and will harm the merging. We just need to do that for
>     the requests that has been through .queue_rq. This patch also
>     could fix this.
>
>     Signed-off-by: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
>     Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>
>

The commit does not revert cleanly:

[loberman@ibclient linux]$ git revert 7f556a44e61d0b62d78db9a2662a5f0daef010f2
error: could not revert 7f556a4... blk-mq: refactor the code of issue request directly
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'

Revert "blk-mq: refactor the code of issue request directly"

This reverts commit 7f556a44e61d0b62d78db9a2662a5f0daef010f2.

Conflicts:
        block/blk-mq.c

It is not clear what in this commit is breaking things and causing the
race.
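
For anyone trying the same revert, here is a rough sketch of how a
conflicting revert like this can be inspected. It is a toy reproduction
in a throwaway repository, not the kernel tree: the file name blk-mq.c
and its v1/v2/v3 contents are placeholders, and the assumption is that
the conflict comes from a later commit touching the same lines as the
commit being reverted. Only stock git commands are used.

```shell
#!/bin/sh
# Toy reproduction: a revert conflicts because a later commit changed
# the same lines. File name and contents are placeholders only.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

echo "v1" > blk-mq.c
git add blk-mq.c
git commit -q -m "base"

echo "v2" > blk-mq.c
git commit -q -am "refactor (stands in for the first bad commit)"
bad=$(git rev-parse HEAD)

echo "v3" > blk-mq.c
git commit -q -am "later change on top of the refactor"

# The revert stops with a conflict; carry on so it can be inspected.
git revert --no-edit "$bad" >/dev/null 2>&1 || true

# List the unmerged paths left behind by the failed revert.
conflicted=$(git diff --name-only --diff-filter=U)
echo "conflicted: $conflicted"

# List later commits that touched the same file. These are candidates
# to revert first, or to account for when resolving by hand.
later=$(git log --format=%s "$bad..HEAD" -- blk-mq.c)
echo "later commits: $later"

# Back out of the revert. After a manual fix one would instead run
# 'git add <paths>' followed by 'git revert --continue'.
git revert --abort
state=$(git status --porcelain)
```

In the real tree the equivalent query would be a git log over
7f556a44e61d..HEAD restricted to block/blk-mq.c, which shows what was
stacked on top of the refactor and therefore what the revert collides
with.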