> On 18 Apr 2018, at 00:57, Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 4/17/18 3:48 PM, Jens Axboe wrote:
>> On 4/17/18 3:47 PM, Kees Cook wrote:
>>> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>> On 4/17/18 3:25 PM, Kees Cook wrote:
>>>>> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>>>>>> I see elv.priv[1] assignments made in a few places -- is it possible
>>>>>> there is some kind of uninitialized-but-not-NULL state that can leak
>>>>>> in there?
>>>>>
>>>>> Got it. This fixes it for me:
>>>>>
>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>> index 0dc9e341c2a7..859df3160303 100644
>>>>> --- a/block/blk-mq.c
>>>>> +++ b/block/blk-mq.c
>>>>> @@ -363,7 +363,7 @@ static struct request *blk_mq_get_request(struct
>>>>> request_queue *q,
>>>>>
>>>>>         rq = blk_mq_rq_ctx_init(data, tag, op);
>>>>>         if (!op_is_flush(op)) {
>>>>> -               rq->elv.icq = NULL;
>>>>> +               memset(&rq->elv, 0, sizeof(rq->elv));
>>>>>                 if (e && e->type->ops.mq.prepare_request) {
>>>>>                         if (e->type->icq_cache && rq_ioc(bio))
>>>>>                                 blk_mq_sched_assign_ioc(rq, bio);
>>>>> @@ -461,7 +461,7 @@ void blk_mq_free_request(struct request *rq)
>>>>>                         e->type->ops.mq.finish_request(rq);
>>>>>                 if (rq->elv.icq) {
>>>>>                         put_io_context(rq->elv.icq->ioc);
>>>>> -                       rq->elv.icq = NULL;
>>>>> +                       memset(&rq->elv, 0, sizeof(rq->elv));
>>>>>                 }
>>>>>         }
>>>>
>>>> This looks like a BFQ problem, this should not be necessary. Paolo,
>>>> you're calling your own prepare request handler from the insert
>>>> as well, and your prepare request does nothing if rq->elv.icq == NULL.
>>>
>>> I sent the patch anyway, since it's kind of a robustness improvement,
>>> I'd hope. If you fix BFQ also, please add:
>>
>> It's also a memset() in the hot path, would prefer to avoid that...
>> The issue here is really the convoluted bfq usage of insert/prepare,
>> I'm sure Paolo can take it from here.
>

Hi,
I'm very sorry for tuning in so late but, at the same time, very glad
to find the problem probably already solved ;)
(in this respect, I swear, my delay was not intentional)

> Does this fix it?
>
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index f0ecd98509d8..d883469a1582 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -4934,8 +4934,11 @@ static void bfq_prepare_request(struct request *rq, struct bio *bio)
>         bool new_queue = false;
>         bool bfqq_already_existing = false, split = false;
>
> -       if (!rq->elv.icq)
> +       if (!rq->elv.icq) {
> +               rq->elv.priv[0] = rq->elv.priv[1] = NULL;
>                 return;
> +       }
> +

This does solve the problem at hand. But it also raises a question,
related to a possible subtle bug.

For BFQ, !rq->elv.icq basically means "this request is not for me, as
I am an icq-based scheduler". But, if I understand the main points of
this thread correctly, this assumption is false. If it is actually
false, then I hope that all requests with !rq->elv.icq that are sent
to BFQ do satisfy the condition (at_head || blk_rq_is_passthrough(rq)).
In fact, requests that do not satisfy that condition are exactly those
that BFQ must put in a bfq_queue. So, even if this patch makes the
crash disappear, we drive BFQ completely crazy (and we may expect
other strange failures) if we send BFQ a request with
!(at_head || blk_rq_is_passthrough(rq)) and !rq->elv.icq. BFQ has to
put that rq into a bfq_queue, but simply cannot.

Jens, or anyone else, could you please shed some light on this, and
explain exactly how things work?

Thanks,
Paolo

>         bic = icq_to_bic(rq->elv.icq);
>
>         spin_lock_irq(&bfqd->lock);
>
> --
> Jens Axboe
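
To make the concern about requests without an icq concrete, here is a
minimal userspace sketch in C. It is not kernel code and does not
reproduce the real BFQ insert path: struct model_rq, model_prepare()
and model_classify() are hypothetical names invented for illustration,
and the three-way verdict is only an assumption about how the cases
discussed above could be told apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct request: only the fields discussed. */
struct model_rq {
	void *icq;        /* models rq->elv.icq */
	void *priv[2];    /* models rq->elv.priv[0] and rq->elv.priv[1] */
	bool passthrough; /* models blk_rq_is_passthrough(rq) */
};

enum model_verdict {
	MODEL_DISPATCH_DIRECTLY, /* at_head or passthrough: no bfq_queue needed */
	MODEL_QUEUE_IN_BFQQ,     /* has an icq: can be put in a bfq_queue */
	MODEL_CANNOT_QUEUE       /* needs a bfq_queue, but has no icq */
};

/*
 * Models the early return in the quoted patch: with no icq, clear the
 * private pointers so that no stale, non-NULL value survives.
 * Returns true if preparation could go on (an icq is present).
 */
bool model_prepare(struct model_rq *rq)
{
	if (!rq->icq) {
		rq->priv[0] = rq->priv[1] = NULL;
		return false;
	}
	return true;
}

/*
 * Models the question raised above: a request that is neither at_head
 * nor passthrough must go into a bfq_queue, which requires an icq.
 */
enum model_verdict model_classify(const struct model_rq *rq, bool at_head)
{
	if (at_head || rq->passthrough)
		return MODEL_DISPATCH_DIRECTLY;
	return rq->icq ? MODEL_QUEUE_IN_BFQQ : MODEL_CANNOT_QUEUE;
}
```

In this model, a request with no icq and stale priv pointers comes out
of model_prepare() with both pointers cleared, which is what makes the
crash go away. The same request, if neither at_head nor passthrough,
is still classified as MODEL_CANNOT_QUEUE, which is the case the
question above is about.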