2017-11-17 18:01 GMT+01:00 Bart Van Assche <Bart.VanAssche@xxxxxxx>: > On Fri, 2017-11-17 at 16:14 +0100, Jack Wang wrote: >> I suspect could be missing run queue or lost IO, IMHO it's unlikely >> below disk probing fix the bug. > > If the system is still in this state or if you can reproduce this issue, > please collect and analyze the information under /sys/kernel/debug/block. > That's the only way I know of to verify whether or not a lockup has been > caused by a missing queue run. If the following command resolves the lockup > then the root cause is definitely a missing queue run: It's production server, I was too late to gather more information. kernel is 4.4.36/4.4.50 Request mode for both multipath and scsi, no multiqueue involvement. I found thread back to 2012, you also report this problem in 3.2.. https://lkml.org/lkml/2012/1/3/163 I might be a very old bug. I will try harder to reproduce. Thanks, Jack > > for f in /sys/kernel/debug/block/*; do echo kick >$f/state; done > > When analyzing queue lockups it's important to also have information about > requests that have been queued but that have not yet been started. I'm using > the following patch locally (will split this patch and submit it properly when > I have the time): > > diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c > index 29e8451931ff..3c9d64793865 100644 > --- a/block/blk-mq-debugfs.c > +++ b/block/blk-mq-debugfs.c > @@ -408,8 +408,7 @@ static void hctx_show_busy_rq(struct request *rq, void *data, bool reserved) > { > const struct show_busy_params *params = data; > > - if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx && > - test_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) > + if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx) > __blk_mq_debugfs_rq_show(params->m, > list_entry_rq(&rq->queuelist)); > } > diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c > index 01f08c03f2c1..41d1e3a01786 100644 > --- a/drivers/scsi/scsi_debugfs.c > +++ b/drivers/scsi/scsi_debugfs.c > @@ -7,10 +7,14 @@ > void scsi_show_rq(struct seq_file *m, struct request *rq) > { > struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req); > - int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc); > - char buf[80]; > + int alloc_ms = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc); > + int timeout_ms = jiffies_to_msecs(rq->timeout); > + const u8 *const cdb = READ_ONCE(cmd->cmnd); > + char buf[80] = "(?)"; > > - __scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len); > - seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf, > - cmd->retries, msecs / 1000, msecs % 1000); > + if ((cmd->flags & SCMD_INITIALIZED) && cdb) > + __scsi_format_command(buf, sizeof(buf), cdb, cmd->cmd_len); > + seq_printf(m, ", .cmd=%s, .retries=%d, .timeout=%d.%03d, allocated %d.%03d s ago", > + buf, cmd->retries, timeout_ms / 1000, timeout_ms % 1000, > + alloc_ms / 1000, alloc_ms % 1000); > }