On 01/13/2017 05:41 PM, Omar Sandoval wrote:
> On Fri, Jan 13, 2017 at 12:15:17PM +0100, Hannes Reinecke wrote:
>> On 01/11/2017 10:40 PM, Jens Axboe wrote:
>>> This adds a set of hooks that intercepts the blk-mq path of
>>> allocating/inserting/issuing/completing requests, allowing
>>> us to develop a scheduler within that framework.
>>>
>>> We reuse the existing elevator scheduler API on the registration
>>> side, but augment that with the scheduler flagging support for
>>> the blk-mq interface, and with a separate set of ops hooks for MQ
>>> devices.
>>>
>>> We split driver and scheduler tags, so we can run the scheduling
>>> independent of device queue depth.
>>>
>>> Signed-off-by: Jens Axboe <axboe@xxxxxx>
>> [ .. ]
>>> @@ -823,6 +847,35 @@ static inline unsigned int queued_to_index(unsigned int queued)
>>>  	return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1);
>>>  }
>>>
>>> +static bool blk_mq_get_driver_tag(struct request *rq,
>>> +				  struct blk_mq_hw_ctx **hctx, bool wait)
>>> +{
>>> +	struct blk_mq_alloc_data data = {
>>> +		.q = rq->q,
>>> +		.ctx = rq->mq_ctx,
>>> +		.hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu),
>>> +		.flags = wait ? 0 : BLK_MQ_REQ_NOWAIT,
>>> +	};
>>> +
>>> +	if (blk_mq_hctx_stopped(data.hctx))
>>> +		return false;
>>> +
>>> +	if (rq->tag != -1) {
>>> +done:
>>> +		if (hctx)
>>> +			*hctx = data.hctx;
>>> +		return true;
>>> +	}
>>> +
>>> +	rq->tag = blk_mq_get_tag(&data);
>>> +	if (rq->tag >= 0) {
>>> +		data.hctx->tags->rqs[rq->tag] = rq;
>>> +		goto done;
>>> +	}
>>> +
>>> +	return false;
>>> +}
>>> +
>> What happens with the existing request at 'rqs[rq->tag]'?
>> Surely there is one already, right?
>> Things like '->init_request' assume a fully populated array, so moving
>> one entry to another location is ... interesting.
>>
>> I would have thought we need to do a request cloning here,
>> otherwise this would introduce a memory leak, right?
>> (Not to mention a potential double completion, as the request is now at
>> two positions in the array.)
>>
>> Cheers,
>>
>> Hannes
>
> The entries in tags->rqs aren't slab objects, they're pointers into
> pages allocated separately and tracked on tags->page_list. See
> blk_mq_alloc_rqs(). In blk_mq_free_rqs(), we free all of the pages on
> tags->page_list, so there shouldn't be a memory leak.
>
> As for hctx->tags->rqs, entries are only overwritten when a scheduler is
> enabled. In that case, the rqs array is storing pointers to requests
> actually from hctx->sched_tags, so overwriting/leaking isn't an issue.

Ah. Thanks. That explains it.

Cheers,

Hannes
--
Dr. Hannes Reinecke		   zSeries & Storage
hare@xxxxxxx			   +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
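
To make the lifetime rule Omar describes concrete, below is a minimal
user-space sketch of the pattern, not the actual blk-mq code. Requests
are carved out of one bulk allocation tracked on a page_list, while the
rqs[] array holds only borrowed pointers, so overwriting a slot (as
blk_mq_get_driver_tag() does when it stores a sched_tags request) cannot
leak memory. All identifiers here (tag_set, page_chunk, alloc_rqs,
free_rqs) are illustrative stand-ins, not the real kernel names.

#include <stdlib.h>

struct request {
	int tag;
	/* driver payload would follow here */
};

struct page_chunk {
	struct page_chunk *next;	/* chained like tags->page_list */
	char data[];			/* requests live in this block */
};

struct tag_set {
	struct request **rqs;		/* lookup table, like tags->rqs */
	unsigned int nr_tags;
	struct page_chunk *page_list;	/* owns the actual memory */
};

/* Bulk-allocate all requests in one chunk; rqs[] only borrows pointers. */
static int alloc_rqs(struct tag_set *tags, unsigned int nr_tags)
{
	struct page_chunk *chunk;
	unsigned int i;

	tags->rqs = calloc(nr_tags, sizeof(*tags->rqs));
	if (!tags->rqs)
		return -1;

	chunk = malloc(sizeof(*chunk) + nr_tags * sizeof(struct request));
	if (!chunk) {
		free(tags->rqs);
		return -1;
	}
	chunk->next = tags->page_list;
	tags->page_list = chunk;
	tags->nr_tags = nr_tags;

	for (i = 0; i < nr_tags; i++) {
		struct request *rq = (struct request *)chunk->data + i;

		rq->tag = i;
		tags->rqs[i] = rq;
	}
	return 0;
}

/* Freeing walks page_list and never looks at rqs[], so a scheduler
 * overwriting rqs[] entries cannot cause a leak. */
static void free_rqs(struct tag_set *tags)
{
	while (tags->page_list) {
		struct page_chunk *chunk = tags->page_list;

		tags->page_list = chunk->next;
		free(chunk);
	}
	free(tags->rqs);
	tags->rqs = NULL;
}

int main(void)
{
	struct tag_set tags = { 0 };

	if (alloc_rqs(&tags, 4))
		return 1;

	/* Overwrite a slot, as blk_mq_get_driver_tag() does when it
	 * stores a sched_tags request in hctx->tags->rqs[]: */
	tags.rqs[2] = tags.rqs[0];

	free_rqs(&tags);	/* releases every chunk; nothing leaks */
	return 0;
}

The point of the sketch is that ownership lives with page_list; rqs[] is
purely a tag-to-request lookup table, which is why the free path can
ignore whatever pointers a scheduler wrote into it.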