Re: Panic when rebooting target server testing srp on 5.0.0-rc2

On Wed, 2019-03-27 at 08:56 -0400, Laurence Oberman wrote:
> Truncating email content, starting bisect again as suggested.
> Email was getting too long with repetition.
> 
> The crux of the issue is repeated here so the topic is easy to follow.
> 
> We reach dispatch with rq_list, but the list is corrupted or already
> freed, so we panic. This is clearly a race, and it is present in v5.x+
> kernels. This new bisect should find it.
> 
> crash> bt
> PID: 9191   TASK: ffff9dea0a8395c0  CPU: 1   COMMAND: "kworker/1:1H"
>  #0 [ffffa9fe0759fab0] machine_kexec at ffffffff938606cf
>  #1 [ffffa9fe0759fb08] __crash_kexec at ffffffff9393a48d
>  #2 [ffffa9fe0759fbd0] crash_kexec at ffffffff9393b659
>  #3 [ffffa9fe0759fbe8] oops_end at ffffffff93831c41
>  #4 [ffffa9fe0759fc08] no_context at ffffffff9386ecb9
>  #5 [ffffa9fe0759fcb0] do_page_fault at ffffffff93870012
>  #6 [ffffa9fe0759fce0] page_fault at ffffffff942010ee
>     [exception RIP: blk_mq_dispatch_rq_list+114]
>     RIP: ffffffff93b9f202  RSP: ffffa9fe0759fd90  RFLAGS: 00010246
>     RAX: ffff9de9c4d3bbc8  RBX: ffff9de9c4d3bbc8  RCX: 0000000000000004
>     RDX: 0000000000000000  RSI: ffffa9fe0759fe20  RDI: ffff9dea0dad87f0
>     RBP: 0000000000000000   R8: 0000000000000000   R9: 8080808080808080
>     R10: ffff9dea33827660  R11: ffffee9d9e097a00  R12: ffffa9fe0759fe20
>     R13: ffff9de9c4d3bb80  R14: 0000000000000000  R15: ffff9dea0dad87f0
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #7 [ffffa9fe0759fe18] blk_mq_sched_dispatch_requests at ffffffff93ba455c
>  #8 [ffffa9fe0759fe60] __blk_mq_run_hw_queue at ffffffff93b9e3cf
>  #9 [ffffa9fe0759fe78] process_one_work at ffffffff938b0c21
> #10 [ffffa9fe0759feb8] worker_thread at ffffffff938b18d9
> #11 [ffffa9fe0759ff10] kthread at ffffffff938b6ee8
> #12 [ffffa9fe0759ff50] ret_from_fork at ffffffff94200215
> 
Hello Jens, Jianchao,

The bisect finally converged on the commit below.
I will see if I can revert it and test.

7f556a44e61d0b62d78db9a2662a5f0daef010f2 is the first bad commit
commit 7f556a44e61d0b62d78db9a2662a5f0daef010f2
Author: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
Date:   Fri Dec 14 09:28:18 2018 +0800

    blk-mq: refactor the code of issue request directly
    
    Merge blk_mq_try_issue_directly and __blk_mq_try_issue_directly
    into one interface to unify the interfaces to issue requests
    directly. The merged interface takes over the requests totally,
    it could insert, end or do nothing based on the return value of
    .queue_rq and 'bypass' parameter. Then caller needn't any other
    handling any more and then code could be cleaned up.
    
    And also the commit c616cbee ( blk-mq: punt failed direct issue
    to dispatch list ) always inserts requests to hctx dispatch list
    whenever get a BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE, this is
    overkill and will harm the merging. We just need to do that for
    the requests that has been through .queue_rq. This patch also
    could fix this.
    
    Signed-off-by: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
    Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
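
For readers following the thread, the decision table that the commit
message describes ("insert, end or do nothing based on the return value
of .queue_rq and 'bypass' parameter") can be sketched as a small
standalone C model. This is only one possible reading of the commit
message, not the kernel code: the enum names, the function
direct_issue_outcome() and the reached_queue_rq flag are all
hypothetical, and the real blk-mq logic has more inputs than this.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the status a .queue_rq call might return. */
enum sts { STS_OK, STS_RESOURCE, STS_DEV_RESOURCE, STS_IOERR };

/* Hypothetical names for what the merged interface does with the request. */
enum action {
	ACT_NONE,          /* leave the request alone (issued, or caller handles it) */
	ACT_DISPATCH_LIST, /* put it on the hctx dispatch list */
	ACT_SCHED_INSERT,  /* insert it normally so it can still be merged */
	ACT_END            /* complete the request with the error */
};

/*
 * Toy model of the outcome described in the commit message.
 * reached_queue_rq is an assumption of this sketch: it records whether
 * the request actually went through .queue_rq before the busy status
 * was returned.
 */
static enum action direct_issue_outcome(enum sts ret, bool bypass,
					bool reached_queue_rq)
{
	switch (ret) {
	case STS_OK:
		return ACT_NONE;
	case STS_RESOURCE:
	case STS_DEV_RESOURCE:
		/* Only requests that went through .queue_rq need the dispatch list. */
		if (reached_queue_rq)
			return ACT_DISPATCH_LIST;
		/* With bypass set, the caller sees the status and decides. */
		return bypass ? ACT_NONE : ACT_SCHED_INSERT;
	default:
		/* Any other error: end the request unless the caller asked to bypass. */
		return bypass ? ACT_NONE : ACT_END;
	}
}
```

In this model, direct_issue_outcome(STS_RESOURCE, false, false) yields
ACT_SCHED_INSERT, which corresponds to the "overkill" case the commit
message wants to avoid sending to the dispatch list. The crash above is
in blk_mq_dispatch_rq_list, so the paths that decide who owns a request
after a busy return are the ones worth reviewing when testing the revert.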





