Re: v4.16-rc1 + dm-mpath + BFQ

> On 9 Feb 2018, at 20:18, Jens Axboe <axboe@xxxxxxxxx> wrote:
> 
> On 2/9/18 12:14 PM, Bart Van Assche wrote:
>> On 02/09/18 10:58, Jens Axboe wrote:
>>> On 2/9/18 11:54 AM, Bart Van Assche wrote:
>>>> Hello Paolo,
>>>> 
>>>> If I enable the BFQ scheduler for a dm-mpath device then a kernel oops
>>>> appears (see also below). This happens systematically with Linus' tree from
>>>> this morning (commit 54ce685cae30) merged with Jens' for-linus branch (commit
>>>> a78773906147 ("block, bfq: add requeue-request hook")) and for-next branch
>>>> (commit 88455ad7f928). Is this a known issue?
>>> 
>>> Does it happen on Linus -git as well, or just with my for-linus merged in?
>>> What I'm getting at is if a78773906147 caused this or not.
>> 
>> Hello Jens,
>> 
>> Thanks for chiming in. After having reverted commit a78773906147, after 
>> having rebuilt the BFQ scheduler, after having rebooted and after having 
>> repeated the test I see the same kernel oops being reported. I think 
>> that means that this regression is not caused by commit a78773906147. In 
>> case it would be useful, here is how gdb translates the crash address:
>> 
>> $ gdb block/bfq*ko
>> (gdb) list *(bfq_remove_request+0x8d)
>> 0x280d is in bfq_remove_request (block/bfq-iosched.c:1760).
>> 1755                    list_del_init(&rq->queuelist);
>> 1756            bfqq->queued[sync]--;
>> 1757            bfqd->queued--;
>> 1758            elv_rb_del(&bfqq->sort_list, rq);
>> 1759
>> 1760            elv_rqhash_del(q, rq);
>> 1761            if (q->last_merge == rq)
>> 1762                    q->last_merge = NULL;
>> 1763
>> 1764            if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
> 
> Looks very odd. So clearly RQF_HASHED is set, but we're blowing up on
> the hash list pointers. I'll let Paolo take a look at this one. Thanks
> for testing without that commit, I want to push out my pending fixes
> today and this would have thrown a wrench in the works.
> 

This also smells a bit like a spurious elevator call. Unfortunately,
I have no clue about the cause. To make progress, I need at least to
reproduce it. Bart, could you please tell me how to set up the
offending configuration and how to trigger the failure? Ideally with
just one, or at most two, PCs; I don't have fancier hardware at the
moment.
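
To make the suspected failure mode concrete, here is a minimal
user-space sketch in C. It is a simplified model, not the kernel
source: request_model, RQF_HASHED_MODEL, rqhash_add and rqhash_del are
hypothetical stand-ins for the request, the RQF_HASHED flag and the
hash helpers. It assumes removal is guarded only by the flag, so a
request that has the flag set while its hash pointers are stale would
fault at the marked dereferences.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's hlist types (not the real headers). */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

#define RQF_HASHED_MODEL 0x1u   /* hypothetical flag value for this model */

struct request_model {
    unsigned int rq_flags;
    struct hlist_node hash;
};

/* Link rq at the head of the bucket and mark it as hashed. */
static void rqhash_add(struct hlist_head *h, struct request_model *rq)
{
    assert(!(rq->rq_flags & RQF_HASHED_MODEL));
    rq->hash.next = h->first;
    if (h->first)
        h->first->pprev = &rq->hash.next;
    h->first = &rq->hash;
    rq->hash.pprev = &h->first;
    rq->rq_flags |= RQF_HASHED_MODEL;
}

/* Unlink rq if it is marked as hashed; returns 1 if unlinked, 0 if no-op. */
static int rqhash_del(struct request_model *rq)
{
    if (!(rq->rq_flags & RQF_HASHED_MODEL))
        return 0;
    /* These dereferences are where stale hash pointers would fault. */
    *rq->hash.pprev = rq->hash.next;
    if (rq->hash.next)
        rq->hash.next->pprev = rq->hash.pprev;
    rq->hash.next = NULL;
    rq->hash.pprev = NULL;
    rq->rq_flags &= ~RQF_HASHED_MODEL;
    return 1;
}
```

In this model a second rqhash_del() on the same request is harmless,
because the first call clears the flag. The flag and the pointers can
only disagree if something else touches the request in between, which
is the kind of situation a spurious call could create.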

Thanks,
Paolo

> -- 
> Jens Axboe




