Re: [PATCH] blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait

On 8/22/18 1:12 PM, Holger Hoffstätte wrote:
> On 08/22/18 19:28, Jens Axboe wrote:
>> On 8/22/18 8:27 AM, Jens Axboe wrote:
>>> On 8/22/18 6:54 AM, Holger Hoffstätte wrote:
>>>> On 08/22/18 06:10, Jens Axboe wrote:
>>>>> [...]
>>>>> If you have time, please look at the 3 patches I posted earlier today.
>>>>> Those are for mainline, so should be OK :-)
>>>>
>>>> I'm just playing along at home but with those 3 I get repeatable
>>>> hangs & writeback not starting at all, but curiously *only* on my btrfs
>>>> device; for inexplicable reasons some other devices with ext4/xfs flush
>>>> properly. Yes, that surprised me too, but it's repeatable.
>>>> Now this may or may not have something to do with some of my in-testing
>>>> patches for btrfs itself, but if I remove those 3 wbt fixes, everything
>>>> is golden again. Not eager to repeat since it hangs sync & requires a
>>>> hard reboot... :(
>>>> Just thought you'd like to know.
>>>
>>> Thanks, that's very useful info! I'll see if I can reproduce that.
>>
>> Any chance you can apply them one at a time and see which patch is causing the issue?
>> I can't reproduce it here, seems solid.
>>
>> Either that, or a reproducer would be great...
> 
> It's a hacked up custom tree but the following things have emerged so far:
> 
> - it's not btrfs.
> 
> - it also happens with ext4.
> 
> - I first suspected bfq on a nonrotational device disabling WBT by default,
> but using deadline didn't help either. Can't even mkfs.ext4.
> 
> - I suspect - but do not know - that using xfs everywhere else is the
> reason I got lucky, because xfs. :D
> 
> - it immediately happens with only the first patch
> ("move disable check into get_limit()")
> 
> So the obvious suspect is the new return of UINT_MAX from get_limit() to
> __wbt_wait(). I first suspected that I had mispatched something, but it all
> matches mainline and your tree. Even the recently moved-around atomic loop
> inside rq_wait_inc_below() is 1:1 the same and looks like it should.
> Now building mainline to see where that leads me.

I wonder if it's a signedness thing? Can you try and see if using INT_MAX
instead changes anything?

-- 
Jens Axboe



