Re: General protection fault with use_blk_mq=1.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> Il giorno 29 mar 2018, alle ore 05:22, Jens Axboe <axboe@xxxxxxxxx> ha scritto:
> 
> On 3/28/18 9:13 PM, Zephaniah E. Loss-Cutler-Hull wrote:
>> On 03/28/2018 06:02 PM, Jens Axboe wrote:
>>> On 3/28/18 5:03 PM, Zephaniah E. Loss-Cutler-Hull wrote:
>>>> I am not subscribed to any of the lists on the To list here, please CC
>>>> me on any replies.
>>>> 
>>>> I am encountering a fairly consistent crash anywhere from 15 minutes to
>>>> 12 hours after boot with scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=1> 
>>>> The crash looks like:
>>>> 
>> 
>>>> 
>>>> Looking through the code, I'd guess that this is dying inside
>>>> blkg_rwstat_add, which calls percpu_counter_add_batch, which is what RIP
>>>> is pointing at.
>>> 
>>> Leaving the whole thing here for Paolo - it's crashing off insertion of
>>> a request coming out of SG_IO. Don't think we've seen this BFQ failure
>>> case before.
>>> 
>>> You can mitigate this by switching the scsi-mq devices to mq-deadline
>>> instead.
>>> 
>> 
>> I'm thinking that I should also be able to mitigate it by disabling
>> CONFIG_DEBUG_BLK_CGROUP.
>> 
>> That should remove that entire chunk of code.
>> 
>> Of course, that won't help if this is actually a symptom of a bigger
>> problem.
> 
> Yes, it's not a given that it will fully mask the issue at hand. But
> turning off BFQ has a much higher chance of working for you.
> 
> This time actually CC'ing Paolo.
> 

Hi Zephaniah,
if you are actually interested in the benefits of BFQ (low latency,
high responsiveness, fairness, ...) then it may be worth to try what
you yourself suggest: disabling CONFIG_DEBUG_BLK_CGROUP.  Also because
this option activates the heavy computation of debug cgroup statistics,
which probably you don't use.

In addition, the outcome of your attempt without
CONFIG_DEBUG_BLK_CGROUP would give us useful bisection information:
- if no failure occurs, then the issue is likely to be confined in
that debugging code (which, on the bright side, is likely to be of
occasional interest, for only a handful of developers)
- if the issue still shows up, then we may have new hints on this odd
failure

Finally, consider that this issue has been reported to disappear from
4.16 [1], and, as a plus, that the service quality of BFQ had a
further boost exactly from 4.16.

Looking forward to your feedback, in case you try BFQ without
CONFIG_DEBUG_BLK_CGROUP,
Paolo

[1] https://www.spinics.net/lists/linux-block/msg21422.html

> 
> -- 
> Jens Axboe





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux