On Fri, Sep 08 2017 at 12:48pm -0400, Jens Axboe <axboe@xxxxxxxxx> wrote: > On 09/08/2017 10:41 AM, Mike Snitzer wrote: > > On Fri, Sep 08 2017 at 5:13P -0400, > > Paolo Valente <paolo.valente@xxxxxxxxxx> wrote: > > > >> > >>> Il giorno 07 set 2017, alle ore 17:52, Mike Snitzer <snitzer@xxxxxxxxxx> ha scritto: > >>> > >>> On Tue, Sep 05 2017 at 10:15am -0400, > >>> Bart Van Assche <Bart.VanAssche@xxxxxxx> wrote: > >>> > >>>> On Tue, 2017-09-05 at 09:56 +0200, Paolo Valente wrote: > >>>>> Ok, my suspects seem confirmed: the path dm_mq_queue_rq -> map_request > >>>>> -> setup_clone -> blk_rq_prep_clone creates a cloned request without > >>>>> invoking e->type->ops.mq.prepare_request for the target elevator e. > >>>>> The cloned request is therefore not initialized for the scheduler, but > >>>>> it is however inserted into the scheduler by > >>>>> blk_mq_sched_insert_request. This seems an error for any scheduler > >>>>> that needs to initialize fields in the incoming request, or in general > >>>>> to take some preliminary action. > >>>>> > >>>>> Am I missing something here? > >>>> > >>>> (+Mike Snitzer) > >>>> > >>>> Mike, do you perhaps have the time to look into this memory leak? > >>> > >>> It isn't a memory leak, it is missing initialization in the case of > >>> cloned requests (if I'm understanding Paolo correctly). > >>> > >> > >> Exactly! > >> > >>> But cloned requests shouldn't be going through the scheduler. Only the > >>> original requests should. > >>> > >>> Commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO > >>> schedulers") switched from blk_mq_insert_request() to > >>> blk_mq_sched_insert_request() and in doing so it opened dm-mpath up to > >>> this bug. > >>> > >>> Could be we need to take steps to ensure the block layer still > >>> supports bypassing the elevator by using direct insertion? > >>> > >>> Or blk_mq_sched_insert_request() needs updating to check if > >>> e->type->ops.mq.prepare_request were actually performed and to fallback > >>> to the !elevator case if not.. > >>> > >>> Not sure on the fix, but I can look closer if others (like Jens or > >>> Paolo) don't have quicker suggestions. > >>> > >> > >> No quick suggestion from me :( > >> > >> Thanks for analyzing this bug, > > > > Please see the following untested patch. All > > testing/review/comments/acks appreciated. > > > > I elected to use elevator_change() rather than fiddle with adding a new > > blk-mq elevator hook (e.g. ->request_prepared) to verify that each > > blk-mq elevator enabled request did in fact get prepared. > > > > Bart, please test this patch and reply with your review/feedback. > > > > Jens, if you're OK with this solution please reply with your Ack and > > I'll send it to Linus along with the rest of the handful of DM changes I > > have for 4.14. > > I am not - we used to have this elevator change functionality from > inside the kernel, and finally got rid of it when certain drivers killed > it. I don't want to be bringing it back. Fine. > Sounds like we have two issues here. One is that we run into issues with > stacking IO schedulers, and the other is that we'd rather not have > multiple schedulers in play for a stacked setup. > > Maybe it'd be cleaner to have the dm-mq side of things not insert > through the scheduler, but rather just FIFO on the target end? That was how blk_insert_cloned_request() was before. From the patch header (you may have missed it): "Commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO schedulers") switched blk_insert_cloned_request() from using blk_mq_insert_request() to blk_mq_sched_insert_request(). Which incorrectly added elevator machinery into a call chain that isn't supposed to have any." So shouldn't blk_insert_cloned_request() be made to _not_ use blk_mq_sched_insert_request()? We'd need a new block interface established, or equivalent open-coded in blk_insert_cloned_request(), to handle direct dispatch to an mq request_queue's queue(s). Mike