On 8/22/18 1:12 PM, Holger Hoffstätte wrote:
> On 08/22/18 19:28, Jens Axboe wrote:
>> On 8/22/18 8:27 AM, Jens Axboe wrote:
>>> On 8/22/18 6:54 AM, Holger Hoffstätte wrote:
>>>> On 08/22/18 06:10, Jens Axboe wrote:
>>>>> [...]
>>>>> If you have time, please look at the 3 patches I posted earlier today.
>>>>> Those are for mainline, so should be OK :-)
>>>>
>>>> I'm just playing along at home but with those 3 I get repeatable
>>>> hangs & writeback not starting at all, but curiously *only* on my btrfs
>>>> device; for inexplicable reasons some other devices with ext4/xfs flush
>>>> properly. Yes, that surprised me too, but it's repeatable.
>>>> Now this may or may not have something to do with some of my in-testing
>>>> patches for btrfs itself, but if I remove those 3 wbt fixes, everything
>>>> is golden again. Not eager to repeat since it hangs sync & requires a
>>>> hard reboot.. :(
>>>> Just thought you'd like to know.
>>>
>>> Thanks, that's very useful info! I'll see if I can reproduce that.
>>
>> Any chance you can try with and see which patch is causing the issue?
>> I can't reproduce it here, seems solid.
>>
>> Either that, or a reproducer would be great...
>
> It's a hacked up custom tree but the following things have emerged so far:
>
> - it's not btrfs.
>
> - it also happens with ext4.
>
> - I first suspected bfq on a nonrotational device disabling WBT by default,
> but using deadline didn't help either. Can't even mkfs.ext4.
>
> - I suspect - but do not know - that using xfs everywhere else is the
> reason I got lucky, because xfs. :D
>
> - it immediately happens with only the first patch
> ("move disable check into get_limit()")
>
> So the obvious suspect is the new return of UINT_MAX from get_limit() to
> __wbt_wait(). I first suspected that I mispatched something, but it's all
> like in mainline or your tree. Even the recently moved-around atomic loop
> inside rq_wait_inc_below() is 1:1 the same and looks like it should.
>
> Now building mainline and see where that leads me.

I wonder if it's a signedness thing? Can you try and see if using
INT_MAX instead changes anything?

-- 
Jens Axboe
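The signedness hunch can be illustrated outside the kernel. Below is a minimal user-space sketch, assuming (this is not confirmed in the thread) that somewhere between get_limit() and the "below the limit" comparison the UINT_MAX sentinel ends up in a signed int. All names here (get_limit_disabled(), inc_below_int(), inc_below_uint(), demo_signedness()) are hypothetical and are not kernel APIs; the real helper uses atomics and a wait queue, which are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for a get_limit()-style helper that returns
 * UINT_MAX to mean "throttling disabled", as described in the thread. */
unsigned int get_limit_disabled(void)
{
	return UINT_MAX;
}

/* Hypothetical, simplified "increment only if below limit" check with a
 * signed limit. Not the kernel's rq_wait_inc_below(): no atomics here. */
bool inc_below_int(int *inflight, int limit)
{
	if (*inflight >= limit)
		return false;	/* a real caller would sleep and retry */
	(*inflight)++;
	return true;
}

/* The same check, but with an unsigned counter and limit. */
bool inc_below_uint(unsigned int *inflight, unsigned int limit)
{
	if (*inflight >= limit)
		return false;
	(*inflight)++;
	return true;
}

/* Runs the three cases discussed in the thread as assertions. */
void demo_signedness(void)
{
	int s = 0;
	unsigned int u = 0;

	/* UINT_MAX implicitly converted to int is implementation-defined;
	 * GCC and Clang yield -1, so "0 >= -1" holds and the increment is
	 * refused on every attempt: the caller would wait forever. */
	assert(!inc_below_int(&s, get_limit_disabled()));
	assert(s == 0);

	/* Kept unsigned, UINT_MAX really does mean "no limit". */
	assert(inc_below_uint(&u, get_limit_disabled()));
	assert(u == 1);

	/* INT_MAX is positive in both types, so the signed path works. */
	assert(inc_below_int(&s, INT_MAX));
	assert(s == 1);
}
```

Calling demo_signedness() from your own main() runs the three checks; none should fire when built with GCC or Clang. Whether the real wbt path narrows the limit this way is exactly what swapping in INT_MAX would reveal.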