Hello Per, On 11/13/2012 11:10 PM, Per Forlin wrote: > On Tue, Oct 30, 2012 at 1:19 PM, Konstantin Dorfman > <kdorfman@xxxxxxxxxxxxxx> wrote: >> Hello, >> >> On 10/29/2012 11:40 PM, Per Forlin wrote: >>> Hi, >>> >>> I would like to move the focus of my concerns from root cause analysis >>> to the actual patch, >>> My first reflection is that the patch is relatively complex and some >>> of the code looks duplicated. >>> Would it be possible to simplify it and re-use the current execution flow. >>> >>> Is the following flow feasible? >>> >>> in mmc_start_req(): >>> -------------- >>> if (rqc == NULL && not_resend) >>> wait_for_both_mmc_and_arrival_of_new_req >>> else >>> wait_only_for_mmc >>> >>> if (arrival_of_new_req) { >>> set flag to indicate fetch-new_req >>> return NULL; >>> } >>> ----------------- >>> >>> in queue.c: >>> if (fetch-new_req) >>> don't overwrite previous req. >>> >>> BR >>> Per >> >> You are talking about using both waiting mechanisms, old (wait on >> completion) and new - wait_event_interruptible()? But how done() >> callback, called on host controller irq context, will differentiate >> which one is relevant for the request? >> >> I think it is better to use single wait point for mmc thread. > > I have sketch a patch to better explain my point. It's not tested it > barely compiles :) > The patch tries to introduce your feature and still keep the same code > path. And it exposes an API that could be used by SDIO as well. > The intention of my sketch patch is only to explain what I tried to > visualize in the pseudo code previously in this thread. > The out come of your final patch should be documented here I think: > Documentation/mmc/mmc-async-req.txt This document is ready, attaching it to this mail and will be included in next version of the patch (or RESEND). > > Here follows my patch sketch: > ........................................................ > diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c > index e360a97..08036a1 100644 > --- a/drivers/mmc/card/queue.c > +++ b/drivers/mmc/card/queue.c > @@ -66,6 +66,8 @@ static int mmc_queue_thread(void *d) > spin_unlock_irq(q->queue_lock); > > if (req || mq->mqrq_prev->req) { > + if (!req) > + mmc_prefetch_start(&mq->mqrq_prev->mmc_active, true); > set_current_state(TASK_RUNNING); > mq->issue_fn(mq, req); > } else { > @@ -79,11 +81,13 @@ static int mmc_queue_thread(void *d) > } > > /* Current request becomes previous request and vice versa. 
*/ > - mq->mqrq_prev->brq.mrq.data = NULL; > - mq->mqrq_prev->req = NULL; > - tmp = mq->mqrq_prev; > - mq->mqrq_prev = mq->mqrq_cur; > - mq->mqrq_cur = tmp; > + if (!mmc_prefetch_pending(&mq->mqrq_prev->mmc_active)) { > + mq->mqrq_prev->brq.mrq.data = NULL; > + mq->mqrq_prev->req = NULL; > + tmp = mq->mqrq_prev; > + mq->mqrq_prev = mq->mqrq_cur; > + mq->mqrq_cur = tmp; > + } > } while (1); > up(&mq->thread_sem); > > @@ -109,10 +113,44 @@ static void mmc_request_fn(struct request_queue *q) > return; > } > > + if (mq->prefetch_enable) { > + spin_lock(&mq->prefetch_lock); > + if (mq->prefetch_completion) > + complete(mq->prefetch_completion); > + mq->prefetch_pending = true; > + spin_unlock(&mq->prefetch_lock); > + } > + > if (!mq->mqrq_cur->req && !mq->mqrq_prev->req) > wake_up_process(mq->thread); > } > > +static void mmc_req_init(struct mmc_async_req *areq, struct completion *compl) > +{ > + struct mmc_queue *mq = > + container_of(areq->prefetch, struct mmc_queue, prefetch); > + > + spin_lock(&mq->prefetch_lock); > + mq->prefetch_completion = compl; > + if (mq->prefetch_pending) > + complete(mq->prefetch_completion); > + spin_unlock(&mq->prefetch_lock); > +} > + > +static void mmc_req_start(struct mmc_async_req *areq, bool enable) > +{ > + struct mmc_queue *mq = > + container_of(areq->prefetch, struct mmc_queue, prefetch); > + mq->prefetch_enable = enable; > +} > + > +static bool mmc_req_pending(struct mmc_async_req *areq) > +{ > + struct mmc_queue *mq = > + container_of(areq->prefetch, struct mmc_queue, prefetch); > + return mq->prefetch_pending; > +} > + > static struct scatterlist *mmc_alloc_sg(int sg_len, int *err) > { > struct scatterlist *sg; > @@ -166,6 +204,12 @@ int mmc_init_queue(struct mmc_queue *mq, struct > mmc_card *card, > int ret; > struct mmc_queue_req *mqrq_cur = &mq->mqrq[0]; > struct mmc_queue_req *mqrq_prev = &mq->mqrq[1]; > + spin_lock_init(&mq->prefetch_lock); > + mq->prefetch.wait_req_init = mmc_req_init; > + mq->prefetch.wait_req_start = mmc_req_start; > + mq->prefetch.wait_req_pending = mmc_req_pending; > + mqrq_cur->mmc_active.prefetch = &mq->prefetch; > + mqrq_prev->mmc_active.prefetch = &mq->prefetch; > > if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask) > limit = *mmc_dev(host)->dma_mask; > diff --git a/drivers/mmc/card/queue.h b/drivers/mmc/card/queue.h > index d2a1eb4..5afd467 100644 > --- a/drivers/mmc/card/queue.h > +++ b/drivers/mmc/card/queue.h > @@ -33,6 +33,12 @@ struct mmc_queue { > struct mmc_queue_req mqrq[2]; > struct mmc_queue_req *mqrq_cur; > struct mmc_queue_req *mqrq_prev; > + > + struct mmc_async_prefetch prefetch; > + spinlock_t prefetch_lock; > + struct completion *prefetch_completion; > + bool prefetch_enable; > + bool prefetch_pending; > }; > > extern int mmc_init_queue(struct mmc_queue *, struct mmc_card *, spinlock_t *, > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c > index 9503cab..06fc036 100644 > --- a/drivers/mmc/core/core.c > +++ b/drivers/mmc/core/core.c > @@ -352,7 +352,15 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host, > mmc_pre_req(host, areq->mrq, !host->areq); > > if (host->areq) { > + if (!areq) > + mmc_prefetch_init(host->areq, > + &host->areq->mrq->completion); > mmc_wait_for_req_done(host, host->areq->mrq); > + if (!areq) { > + mmc_prefetch_start(host->areq, false); > + if (mmc_prefetch_pending(host->areq)) > + return NULL; > + } > err = host->areq->err_check(host->card, host->areq); > } > > diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h > index 65c64ee..ce5d03f 
100644
> --- a/include/linux/mmc/host.h
> +++ b/include/linux/mmc/host.h
> @@ -15,6 +15,7 @@
>  #include <linux/sched.h>
>  #include <linux/device.h>
>  #include <linux/fault-inject.h>
> +#include <linux/completion.h>
>
>  #include <linux/mmc/core.h>
>  #include <linux/mmc/pm.h>
> @@ -140,6 +141,13 @@ struct mmc_host_ops {
>
>  struct mmc_card;
>  struct device;
> +struct mmc_async_req;
> +
> +struct mmc_async_prefetch {
> +        void (*wait_req_init)(struct mmc_async_req *, struct completion *);
> +        void (*wait_req_start)(struct mmc_async_req *, bool);
> +        bool (*wait_req_pending)(struct mmc_async_req *);
> +};
>
>  struct mmc_async_req {
>          /* active mmc request */
> @@ -149,8 +157,33 @@ struct mmc_async_req {
>           * Returns 0 if success otherwise non zero.
>           */
>          int (*err_check) (struct mmc_card *, struct mmc_async_req *);
> +        struct mmc_async_prefetch *prefetch;
>  };
>
> +/* set completion variable, complete == NULL to reset completion */
> +static inline void mmc_prefetch_init(struct mmc_async_req *areq,
> +                                     struct completion *complete)
> +{
> +        if (areq->prefetch && areq->prefetch->wait_req_init)
> +                areq->prefetch->wait_req_init(areq, complete);
> +}
> +
> +/* enable or disable prefetch feature */
> +static inline void mmc_prefetch_start(struct mmc_async_req *areq, bool enable)
> +{
> +        if (areq->prefetch && areq->prefetch->wait_req_start)
> +                areq->prefetch->wait_req_start(areq, enable);
> +}
> +
> +/* return true if new request is pending otherwise false */
> +static inline bool mmc_prefetch_pending(struct mmc_async_req *areq)
> +{
> +        if (areq->prefetch && areq->prefetch->wait_req_pending)
> +                return areq->prefetch->wait_req_pending(areq);
> +        else
> +                return false;
> +}
> +
>  /**
>   * struct mmc_slot - MMC slot functions
>   *
> ....................................................................
>
>
> BR
> Per

I understand your motivation and the idea behind restructuring the patch. It is still not clear to me how exactly the mmc thread will be woken up on a new request notification in your version, but I have another concern: we want to extend this event-based mechanism (unblocking the mmc thread from waiting for the running request) with another reason/event, the URGENT_REQUEST event. When the block layer has an urgent request to execute, it notifies the mmc layer (similar to the mmc_request() notification), and the mmc layer handles this notification by correctly stopping the running request, re-inserting all not-yet-performed parts back into the I/O scheduler, and then fetching and starting the urgent request. The reasoning and the general idea are documented together with the "new event" and will be submitted soon. The patch itself will be submitted in a week or so.

As for our discussion - to handle both events, the mmc layer must be able to be woken up not only when there is no mmc_prefetch_pending() (in your patch's terms), but also in the ideal async case: one request running on the bus and a second one already prepared and waiting for the first to finish.

I think that if an SDIO protocol driver needs functionality such as NEW_REQUEST, it will need to implement a notification to the mmc layer similar to the mmc_request() code in my patch.

Does this make sense?

--
Konstantin Dorfman,
QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc.
is a member of Code Aurora Forum, hosted by The Linux Foundation
Introduction
============
The file system adds block requests to the block device queue
asynchronously, without waiting for completion. The MMC framework takes
these block requests synchronously, one by one, and pushes them to the
host driver. When the host driver has completed a request, the framework
pulls the next block request out of the block request queue. For each
block request the MMC framework converts the generic request into an mmc
request. The host driver takes the mmc request and does the preparations
necessary to process it. For a non-DMA host driver the pre- and
post-request preparations are insignificant, but for DMA the cache
maintenance is significant. Adding non-blocking request handling to the
MMC framework (between the block layer and the host driver) makes it
possible to prepare the caches for one request while another request is
being processed by the MMC controller.
[Referenced from the Rationale section of
https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req]

Software description
====================
In the case of sequential read operations the async request mechanism is
not functional, due to the read-ahead mechanism: when "mmcqd" (the MMC
queue thread) reports completion of a request, a context switch is needed
to allow the next read-ahead BIOs to be inserted into the block layer.
Since mmcqd tries to fetch another request immediately after issuing the
previous one, it gets NULL and blocks waiting for the previous request to
complete. This wait gives the FS (file system) the opportunity to insert
the next request, but the MMC layer is already blocked on the previous
request's completion and is not aware of the new request waiting to be
fetched. As a result, the opportunity to start cache preparation for the
new request while the previous request is being serviced is missed. The
MMC layer also cannot check whether the new request is urgent, so the
previous request is not interrupted immediately.

A solution to the above is a mechanism that wakes up the MMC layer
immediately when a new block request becomes available for fetching.

Design
======
The main idea is to replace the blocking wait for a completion with an
event-driven mechanism, and to add a set of events that can unblock the
MMC layer context from waiting on host driver completion. The new events
are:

1. NEW_REQUEST:
   Triggered when the block layer notifies the MMC layer of a new request
   (via the mmc_request() callback) while the MMC layer is waiting for a
   previous request to complete and the current request is NULL. The
   NEW_REQUEST event wakes up the waiting thread; the MMC layer then
   fetches the new request and, after preparing it, goes back to waiting
   for the previous request to complete.

2. URGENT_REQUEST:
   In order to decrease the latency of prioritized requests (such as
   READs) we may want to stop the transmission of a "low priority"
   request (such as a WRITE) in order to handle the "high priority" one.
   The urgency of a request is decided by the block layer I/O scheduler.
   When the block layer notifies the MMC layer of an urgent request and
   the MMC layer is blocked, it is woken up. The decision whether to stop
   an ongoing transfer is taken according to several parameters, one of
   them being the number of bytes transferred through the host controller
   so far. The stopped request (and the next prepared request, if one
   exists) is re-inserted back into the I/O scheduler to be handled after
   the urgent request completes. When the decision is not to stop the
   ongoing transfer, the MMC context waits for it to complete normally;
   the already prepared next request (if it exists) is then re-inserted
   back into the block layer and the urgent request is fetched.
   URGENT_REQUEST handling is possible only when the host controller
   driver supports the stop_request() API; otherwise the
   mmc_urgent_request() notification callback is not registered with the
   block layer.
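
As an illustration, the event-driven wait point could look roughly like
the sketch below. This is only a sketch: the names (mmc_context_info,
MMC_CTX_NEW_REQ, mmc_ctx_wait, and so on) are examples and do not
necessarily match the actual implementation.

  #include <linux/wait.h>
  #include <linux/bitops.h>

  /* One wakeup word shared by mmcqd, the host irq path and the block layer. */
  struct mmc_context_info {
          unsigned long           events;
          wait_queue_head_t       wait;
  };

  #define MMC_CTX_DONE_REQ  0   /* host controller finished the request */
  #define MMC_CTX_NEW_REQ   1   /* block layer inserted a new request */

  static void mmc_ctx_init(struct mmc_context_info *ctx)
  {
          ctx->events = 0;
          init_waitqueue_head(&ctx->wait);
  }

  /* Block layer notification, called from the queue's request_fn. */
  static void mmc_ctx_new_request(struct mmc_context_info *ctx)
  {
          set_bit(MMC_CTX_NEW_REQ, &ctx->events);
          wake_up_interruptible(&ctx->wait);
  }

  /* Host controller completion path, replacing complete(). */
  static void mmc_ctx_request_done(struct mmc_context_info *ctx)
  {
          set_bit(MMC_CTX_DONE_REQ, &ctx->events);
          wake_up_interruptible(&ctx->wait);
  }

  /*
   * Single wait point for mmcqd: returns true when woken by a new
   * request rather than by completion of the running one.
   */
  static bool mmc_ctx_wait(struct mmc_context_info *ctx)
  {
          wait_event_interruptible(ctx->wait, ctx->events);
          if (test_and_clear_bit(MMC_CTX_NEW_REQ, &ctx->events))
                  return true;
          test_and_clear_bit(MMC_CTX_DONE_REQ, &ctx->events);
          return false;
  }

In such a scheme, the core would call a helper like mmc_ctx_wait()
instead of blocking on a per-request completion; a "true" return value
means the caller should return NULL so that the MMC queue thread can
fetch and prepare the newly arrived request.
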
Performance
===========
With the new event mechanism the READ throughput is improved by 16%.
With the URGENT_REQUEST event (and the ROW scheduler) the READ latency
is decreased by 41%.

Known issues
============
As a performance metric for the NEW_REQUEST and URGENT_REQUEST flows we
use the average and worst-case latency between:
1) the block layer dispatching the request, and
2) the MMC layer fetching the request.
This latency should be minimized to achieve the best overall performance.
At the moment the latency varies with bus speed, the number of CPU cores
and other aspects of the specific hardware and software configuration.

To do
=====
For the URGENT_REQUEST implementation in the MMC layer, the heuristic
that decides whether to stop an ongoing transaction could be made more
accurate by basing it not on the byte count already transferred by the
host controller to the card, but on the time the MMC card still needs to
complete the request.
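
To make the current byte-count based approach concrete, a decision
helper might look like the sketch below. The function name and the 75%
threshold are arbitrary examples, not the actual heuristic; a future
version could instead estimate the time the card needs to finish the
remaining bytes.

  /*
   * Purely illustrative: stop the ongoing "low priority" transfer only
   * if a meaningful part of it is still outstanding.
   */
  static bool mmc_should_stop_for_urgent(unsigned int bytes_xfered,
                                         unsigned int total_bytes)
  {
          /* Nothing to gain from aborting an almost finished transfer. */
          return bytes_xfered < (total_bytes / 4) * 3;
  }

Such a helper would be driven from the URGENT_REQUEST notification path,
before deciding whether to invoke the host driver's stop_request() hook.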