Re: fsync hangs after scsi rejected a request

"jianchao.wang" <jianchao.w.wang@xxxxxxxxxx> · Mon, 28 Jan 2019 10:42:23 +0800

On 1/26/19 11:37 PM, Florian Stecker wrote:
> 
> 
> On 1/26/19 2:25 AM, jianchao.wang wrote:
...
>>>>
>>>> dmesg: https://urldefense.proofpoint.com/v2/url?u=https-3A__florianstecker.de_dmesg-2D2019-2D01-2D25.txt&d=DwIDaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=t_udGGsH8ivtjXHin4oQ4LxELuPoXbl_ARP7oSxiQ6k&s=DC2fhRw2g4haGhhHxOYyX4zD6FWQQ5YoERdfptVRHM4&e=
>>>> Note that no I/O scheduler is in use here, I deactivated them in a udev rule because I still want to be able to use my laptop. When I test this stuff I just reactivate mq-deadline manually.
>>>>
>>>> Config is the default in Arch Linux: https://urldefense.proofpoint.com/v2/url?u=https-3A__git.archlinux.org_svntogit_packages.git_tree_trunk_config-3Fh-3Dpackages_linux&d=DwIDaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=t_udGGsH8ivtjXHin4oQ4LxELuPoXbl_ARP7oSxiQ6k&s=GjV_fyXaKZ8za8zdB_gqmKW7yBlbla2OHRaAT0iGZkc&e=
>>>>
>>>> The problem also appears with BFQ.
>>>>
>>>> And rq->end_io is set to mq_flush_data_end_io when this happens. The only point I see where this function could invoke blk_mq_end_request is via blk_flush_complete_seq, but it does so only if seq == REQ_FSEQ_DONE. However, seq is REQ_FSEQ_POSTFLUSH for me (i.e. it just transitioned from REQ_FSEQ_DATA to REQ_FSEQ_POSTFLUSH).
>>>
>>>
>>> Could it be this scenario ?
>>>
>>> data + post flush
>>>
>>> blk_flush_complete_seq          a flush
>>>    case REQ_FSEQ_DATA
>>>    blk_flush_queue_rq
>>>       finally issued to driver      blk_mq_dispatch_rq_list
>>>                                  try to issue a flush req
>>>                                  failed due to NON-NCQ command
>>>                                  due to RESTART is set, do nothing
>>>    req complete
>>>    req->end_io（） // doesn't check RESTART
>>>    mq_flush_data_end_io
>>> 　  blk_flush_complete_seq
>>>       case REQ_FSEQ_POSTFLUSH
>>>       blk_kick_flush
>>>         do nothing because previous flush
>>>         has not been completed, so
>>>         flush_pending_idx != flush_running_idx
>>>

This should be corrected as following,
 data + post flush
 blk_flush_complete_seq          a flush
   case REQ_FSEQ_DATA
   blk_flush_queue_rq
      finally issued to driver   blk_mq_dispatch_rq_list
                                 try to issue a flush req
                                 failed due to NON-NCQ command
                                 return BLK_STS_DEV_RESOURCE

 req complete
 req->end_io（） // doesn't check RESTART
 mq_flush_data_end_io
 case REQ_FSEQ_POSTFLUSH
    blk_kick_flush
    do nothing because previous flush
    has not been completed, so
    flush_pending_idx != flush_running_idx
 blk_mq_mq_run_hw_queue
                                 insert rq to hctx->dispatch_list
                                 due to RESTART is set, do nothing

...
> I'm not sure if I understand every detail, but your scenario sounds roughly the same as what I imagine. The patch solves the problem for me (except for the fact that hctx is not defined in that function).
> 
> Actually, it doesn't seem to me as if any of the other users of blk_mq_free_request require or even expect that it calls blk_mq_sched_restart (I could be wrong though). If that is true, how about we just call it from __blk_mq_end_request, like this (also tested successfully on my laptop)?
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 6a7566244de3..9f3d3456c4af 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -481,7 +481,6 @@ static void __blk_mq_free_request(struct request *rq)
>                 blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
>         if (sched_tag != -1)
>                 blk_mq_put_tag(hctx, hctx->sched_tags, ctx, sched_tag);
> -       blk_mq_sched_restart(hctx);
>         blk_queue_exit(q);
>  }
> 
> @@ -522,6 +521,7 @@ EXPORT_SYMBOL_GPL(blk_mq_free_request);
>  inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
>  {
>         u64 now = ktime_get_ns();
> +       struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
> 
>         if (rq->rq_flags & RQF_STATS) {
>                 blk_mq_poll_stats_start(rq->q);
> @@ -541,6 +541,8 @@ inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
>                         blk_mq_free_request(rq->next_rq);
>                 blk_mq_free_request(rq);
>         }
> +
> +       blk_mq_sched_restart(hctx);
>  }
>  EXPORT_SYMBOL(__blk_mq_end_request);

We should invoke blk_mq_sched_restart before blk_queue_exit is invoked.

How about this one ?

diff --git a/block/blk-flush.c b/block/blk-flush.c
index a3fc7191..6e0f2d9 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -335,7 +335,7 @@ static void mq_flush_data_end_io(struct request *rq, blk_status_t error)
        blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error);
        spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
 
-       blk_mq_run_hw_queue(hctx, true);
+       blk_mq_sched_restart(hctx);
 }

Thanks
Jianchao