On 27/11/17 13:23, Ulf Hansson wrote: > On 27 November 2017 at 11:20, Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote: >> On 24/11/17 12:12, Ulf Hansson wrote: >>> [...] >>> >>>> +/* Single sector read during recovery */ >>>> +static void mmc_blk_ss_read(struct mmc_queue *mq, struct request *req) >>> >>> Nitpick: I think mmc_blk_read_single() would be better as it is a more >>> clear name. Would you mind changing it? >>> >>>> +{ >>>> + struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req); >>>> + blk_status_t status; >>>> + >>>> + while (1) { >>>> + mmc_blk_rw_rq_prep(mqrq, mq->card, 1, mq); >>>> + >>>> + mmc_wait_for_req(mq->card->host, &mqrq->brq.mrq); >>>> + >>>> + /* >>>> + * Not expecting command errors, so just give up in that case. >>>> + * If there are retries remaining, the request will get >>>> + * requeued. >>>> + */ >>>> + if (mqrq->brq.cmd.error) >>>> + return; >>> >>> What happens here if the reason to the error is because the card was removed? >> >> Assuming the rescan is waiting for the host claim, the next read / write >> request will end up calling mmc_detect_card_removed() in the recovery. >> After that all following requests will error immediately because >> mmc_mq_queue_rq() calls mmc_card_removed(). > > Yep, that seems reasonable. I have also tested this, so it seems to > work as expected and similar as before. > >> >>> >>> I guess next time __blk_err_check() is called from the >>> mmc_blk_mq_rw_recovery(), this will be detected and managed? >>> >>>> + >>>> + if (blk_rq_bytes(req) <= 512) >>> >>> Shouldn't you check "if (blk_rq_bytes(req) < 512)"? How would you >>> otherwise read the last 512 bytes block? >> >> At this point we have read the last sector but not updated the request, so >> the number of bytes left should be 512. The reason we don't update the >> request is so that the logic in mmc_blk_mq_complete_rq() will work. I will >> add a comment. > > Not sure I get that, but I assume the comment will help me understand. :-) > >> >>> >>>> + break; >>>> + >>>> + status = mqrq->brq.data.error ? BLK_STS_IOERR : BLK_STS_OK; >>>> + >>>> + blk_update_request(req, status, 512); >>> >>> Shouldn't we actually bail out, unless the error is a data ECC error? >>> On the other hand, I guess if it a more severe error, cmd.error will >>> anyway be set above!? >>> >>> One more question, if there is a data error, we may want to try to >>> recover by sending a stop command? How do we manage that? >> >> I was thinking a single-block read would not need a stop. I will think >> some more about error handling here. > > Great! > > Anyway, you may be right - and perhaps it may not be worth adding > error handling, especially if it complicates the code a lot. > > [...] > >>>> +static void mmc_blk_mq_acct_req_done(struct mmc_queue *mq, struct request *req) >>> >>> Nitpick: Can we please try to find a better name for this function. I >>> don't think "acct" is good abbreviation because, to me, it's not >>> self-explaining. >> >> What about mmc_blk_mq_decrement_in_flight() ? > > Looks good, or perhaps even: mmc_blk_mq_dec_in_flight(). > >> >>> >>>> +{ >>>> + struct request_queue *q = req->q; >>>> + unsigned long flags; >>>> + bool put_card; >>>> + >>>> + spin_lock_irqsave(q->queue_lock, flags); >>>> + >>>> + mq->in_flight[mmc_issue_type(mq, req)] -= 1; >>>> + >>>> + put_card = (mmc_tot_in_flight(mq) == 0); >>>> + >>>> + spin_unlock_irqrestore(q->queue_lock, flags); >>>> + >>>> + if (put_card) >>>> + mmc_put_card(mq->card, &mq->ctx); >>> >>> I have tried to convince myself that the protection of calling >>> mmc_get|put_card() is safe, but I am not sure. >>> >>> I am wondering whether there could be races for mmc_get|put_card(). >>> Please see some more related comments below. >> >> mmc_put_card() is safe and necessary if we have seen mmc_tot_in_flight(mq) >> == 0. When the next request arrives it will have to do a mmc_get_card() >> because it is changing the number of requests in flight from 0 to 1. It >> doesn't matter if that mmc_get_card() comes before or after or during this >> mmc_put_card(). >> >>> >>> [...] > > [...] > >>> >>> Anyway, then if using a queue_depth of 64, how will you make sure that >>> you not end up having > 1 requests being prepared at the same time >>> (not counting the one that may be in transfer)? >> >> We are currently single-threaded since every request goes through >> hctx->run_work when BLK_MQ_F_BLOCKING and nr_hw_queues == 1. It might be >> worth adding a mutex to ensure that never changes. >> >> This point also answers some of the questions below, since there can be no >> parallel dispatches. > > Yeah, it clearly does. Thanks! > >>>> + >>>> +enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req) >>>> +{ >>>> + struct mmc_blk_data *md = mq->blkdata; >>>> + struct mmc_card *card = md->queue.card; >>>> + struct mmc_host *host = card->host; >>>> + int ret; >>>> + >>>> + ret = mmc_blk_part_switch(card, md->part_type); >>> >>> What if there is an ongoing request, shouldn't you wait for that to >>> complete before switching partition? >> >> Two requests on the same queue cannot be on different partitions because we >> have a different queue (and block device) for each partition. > > That's not true for RPMB anymore I am afraid. > > RPMB shares the same queue as for the main eMMC partition, which is > because we strive towards fair I/O scheduling across the hole device. I hadn't thought of RPMB, but I think the logic is OK, which is good because it is the same as we presently have. Here the md->part_type will be the main area even for RPMB. So this switch won't do anything if we have a request in flight. Then inside __mmc_blk_ioctl_cmd() the switch to RPMB is done, and afterwards mmc_blk_issue_drv_op() switches it back again. > >> >>> >>>> + if (ret) >>>> + return MMC_REQ_FAILED_TO_START; >>>> + >>>> + switch (mmc_issue_type(mq, req)) { >>>> + case MMC_ISSUE_SYNC: >>>> + ret = mmc_blk_wait_for_idle(mq, host); >>>> + if (ret) >>>> + return MMC_REQ_BUSY; >>> >>> Wouldn't it be possible that yet a new SYNC request becomes queued in >>> parallel with this current one. Then, when reaching this point, how do >>> you make sure that new request waits for the current "SYNC" request? >> >> As mentioned above, there are no parallel dispatches. >> >>> >>> I mean is the above mmc_blk_wait_for_idle(), really sufficient to deal >>> with synchronization? >> >> So long as there are no parallel dispatches. >> >>> >>> I guess we could use mmc_claim_host(no-ctx) in some clever way to deal >>> with this, or perhaps there is a better option? >> >> We are relying on there being no parallel dispatches. That is the case now, >> but if it weren't we could use a mutex in mmc_mq_queue_rq(). >> > > Yeah, but then leave that until needed. > >>> >>> BTW, I guess the problem is also present if there is SYNC request >>> ongoing and then is a new ASYNC request coming in. Is the ASYNC >>> request really waiting for the SYNC request to finish? >> >> With no parallel dispatches, the SYNC request runs to completion before >> another request can be dispatched. > > Yes, I get it now. Thanks for clarifying this! > > [...] > >>>> +static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx, >>>> + const struct blk_mq_queue_data *bd) >>>> +{ >>>> + struct request *req = bd->rq; >>>> + struct request_queue *q = req->q; >>>> + struct mmc_queue *mq = q->queuedata; >>>> + struct mmc_card *card = mq->card; >>>> + enum mmc_issue_type issue_type; >>>> + enum mmc_issued issued; >>>> + bool get_card; >>>> + int ret; >>>> + >>>> + if (mmc_card_removed(mq->card)) { >>>> + req->rq_flags |= RQF_QUIET; >>>> + return BLK_STS_IOERR; >>>> + } >>>> + >>>> + issue_type = mmc_issue_type(mq, req); >>>> + >>>> + spin_lock_irq(q->queue_lock); >>>> + >>>> + switch (issue_type) { >>>> + case MMC_ISSUE_ASYNC: >>>> + break; >>>> + default: >>>> + /* >>>> + * Timeouts are handled by mmc core, and we don't have a host >>>> + * API to abort requests, so we can't handle the timeout anyway. >>>> + * However, when the timeout happens, blk_mq_complete_request() >>>> + * no longer works (to stop the request disappearing under us). >>>> + * To avoid racing with that, set a large timeout. >>>> + */ >>>> + req->timeout = 600 * HZ; >>>> + break; >>>> + } >>>> + >>>> + mq->in_flight[issue_type] += 1; >>>> + get_card = (mmc_tot_in_flight(mq) == 1); >>>> + >>>> + spin_unlock_irq(q->queue_lock); >>>> + >>>> + if (!(req->rq_flags & RQF_DONTPREP)) { >>>> + req_to_mmc_queue_req(req)->retries = 0; >>>> + req->rq_flags |= RQF_DONTPREP; >>>> + } >>>> + >>>> + if (get_card) >>> >>> Coming back to the get_card() thingy, which I wonder if it's fragile. >>> >>> A request that finds get_card == true here, doesn't necessarily have >>> to reach to this point first (the task may be preempted), in case >>> there is another request being queued in parallel (or that can't >>> happen?). >>> >>> That could then lead to that the following steps become executed for >>> the other requests, prior anybody calling mmc_get_card(). >> >> You are right, this logic does not support parallel dispatches. >> > > This do raises a question, don't you think it would be beneficial, > especially for CQE to allow parallel dispatches? > > I am not saying we should change this at this point, just that we may > consider changing this for future improvements. I think the benefit is limited because the time to dispatch a request is small compared with the time to complete a request. i.e. a number of requests can be queued before the first one has completed. But yes, it is something to keep in mind.