On 03/02/2017 11:07 AM, Paolo Valente wrote: > >> Il giorno 02 mar 2017, alle ore 17:13, Jens Axboe <axboe@xxxxxx> ha scritto: >> >> On 03/02/2017 09:07 AM, Paolo Valente wrote: >>> >>>> Il giorno 02 mar 2017, alle ore 16:13, Jens Axboe <axboe@xxxxxx> ha scritto: >>>> >>>> On 03/02/2017 08:00 AM, Jens Axboe wrote: >>>>> On 03/02/2017 03:28 AM, Paolo Valente wrote: >>>>>> >>>>>>> Il giorno 15 feb 2017, alle ore 19:04, Jens Axboe <axboe@xxxxxx> ha scritto: >>>>>>> >>>>>>> On 02/15/2017 10:58 AM, Jens Axboe wrote: >>>>>>>> On 02/15/2017 10:24 AM, Paolo Valente wrote: >>>>>>>>> >>>>>>>>>> Il giorno 10 feb 2017, alle ore 19:32, Omar Sandoval <osandov@xxxxxxxxxxx> ha scritto: >>>>>>>>>> >>>>>>>>>> From: Omar Sandoval <osandov@xxxxxx> >>>>>>>>>> >>>>>>>>>> None of the other blk-mq elevator hooks are called with this lock held. >>>>>>>>>> Additionally, it can lead to circular locking dependencies between >>>>>>>>>> queue_lock and the private scheduler lock. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Omar, >>>>>>>>> I'm sorry but it seems that a new potential deadlock has showed up. >>>>>>>>> See lockdep splat below. >>>>>>>>> >>>>>>>>> I've tried to think about different solutions than turning back to >>>>>>>>> deferring the body of exit_icq, but at no avail. >>>>>>>> >>>>>>>> Looks like a interaction between bfqd->lock and q->queue_lock. Since the >>>>>>>> core has no notion of you bfqd->lock, the naturally dependency here >>>>>>>> would be to nest bfqd->lock inside q->queue_lock. Is that possible for >>>>>>>> you? >>>>>>>> >>>>>>>> Looking at the code a bit, maybe it'd just be simpler to get rid of >>>>>>>> holding the queue lock for that spot. For the mq scheduler, we really >>>>>>>> don't want places where we invoke with that held already. Does the below >>>>>>>> work for you? >>>>>>> >>>>>>> Would need to remove one more lockdep assert. And only test this for >>>>>>> the mq parts, we'd need to spread a bit of love on the classic >>>>>>> scheduling icq exit path for this to work on that side. >>>>>>> >>>>>> >>>>>> >>>>>> Jens, >>>>>> here is the reply I anticipated in my previous email: after rebasing >>>>>> against master, I'm getting again the deadlock that this patch of >>>>>> yours solved (together with some changes in bfq-mq). I thought you added a >>>>>> sort of equivalent commit (now) to the mainline branch. Am I wrong? >>>>> >>>>> The patch I posted was never pulled to completion, it wasn't clear >>>>> to me if it fixed your issue or not. Maybe I missed a reply on >>>>> that? >>>>> >>>>> Let me take another stab at it today, I'll send you a version to test >>>>> on top of my for-linus branch. >>>> >>>> I took at look at my last posting, and I think it was only missing a >>>> lock grab in cfq, since we don't hold that for icq exit anymore. Paolo, >>>> it'd be great if you could verify that this works. Not for bfq-mq (we >>>> already know it does), but for CFQ. >>> >>> No luck, sorry :( >>> >>> It looks like to have just not to take q->queue_lock in cfq_exit_icq. >> >> I was worried about that. How about the below? We need to grab the lock >> at some point for legacy scheduling, but the ordering should be correct. >> >> > > It seems to be: no more crashes or lockdep complains after several tests, boots and shutdowns :) Great! Can I add your Tested-by? -- Jens Axboe