On 08/22/18 19:28, Jens Axboe wrote:
> On 8/22/18 8:27 AM, Jens Axboe wrote:
>> On 8/22/18 6:54 AM, Holger Hoffstätte wrote:
>>> On 08/22/18 06:10, Jens Axboe wrote:
>>>> [...]
>>>> If you have time, please look at the 3 patches I posted earlier today.
>>>> Those are for mainline, so should be OK :-)
>>>
>>> I'm just playing along at home, but with those 3 I get repeatable
>>> hangs & writeback not starting at all, curiously *only* on my btrfs
>>> device; for inexplicable reasons some other devices with ext4/xfs flush
>>> properly. Yes, that surprised me too, but it's repeatable.
>>> Now this may or may not have something to do with some of my in-testing
>>> patches for btrfs itself, but if I remove those 3 wbt fixes, everything
>>> is golden again. Not eager to repeat since it hangs sync & requires a
>>> hard reboot. :(
>>> Just thought you'd like to know.
>>
>> Thanks, that's very useful info! I'll see if I can reproduce that.
>> Any chance you can try them one at a time and see which patch is causing the issue?
>
> I can't reproduce it here, seems solid.
> Either that, or a reproducer would be great...
It's a hacked-up custom tree, but the following things have emerged so far:
- it's not btrfs.
- it also happens with ext4.
- I first suspected bfq on a nonrotational device disabling WBT by default,
but using deadline didn't help either. Can't even mkfs.ext4.
- I suspect - but do not know - that using xfs everywhere else is the
reason I got lucky, because xfs. :D
- it immediately happens with only the first patch
("move disable check into get_limit()")
So the obvious suspect is the UINT_MAX value that get_limit() now returns
to __wbt_wait(). I first suspected that I had mispatched something, but
everything matches mainline and your tree. Even the recently moved-around
atomic loop inside rq_wait_inc_below() is identical and looks as it should.
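The following is a minimal hypothetical sketch, not the kernel code. It uses C11 atomics, and inc_below() and inc_below_signed() are made-up names. It shows why the type of the limit matters when get_limit() hands back UINT_MAX to an increment-while-below loop of the kind rq_wait_inc_below() performs. With an unsigned limit, UINT_MAX effectively means "no limit". If the value ever passes through a signed int, it becomes -1 on common two's-complement targets, and every increment is refused, so a waiter would sleep forever. This illustrates one possible failure mode only and is not a diagnosis of this hang.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical "increment only while below a limit" loop (illustration,
 * not rq_wait_inc_below() itself). Returns true if *v was incremented,
 * false if the caller would have to wait.
 */
static bool inc_below(atomic_uint *v, unsigned int below)
{
	unsigned int cur = atomic_load(v);

	for (;;) {
		if (cur >= below)
			return false;	/* at or over the limit */
		/* On failure, cur is updated to the value currently in *v. */
		if (atomic_compare_exchange_weak(v, &cur, cur + 1))
			return true;
	}
}

/*
 * Same loop with a signed limit. Converting UINT_MAX to int is
 * implementation-defined; on common targets it yields -1, so
 * "cur >= below" holds for any non-negative cur and nothing is ever
 * allowed through.
 */
static bool inc_below_signed(atomic_int *v, int below)
{
	int cur = atomic_load(v);

	for (;;) {
		if (cur >= below)
			return false;
		if (atomic_compare_exchange_weak(v, &cur, cur + 1))
			return true;
	}
}
```

With the unsigned variant, a limit of UINT_MAX lets every caller through. With the signed variant, the same "unlimited" value blocks everyone, which would look much like writeback never starting.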
Now building mainline to see where that leads me.
cheers,
Holger