Hi Jens,

The bug still reproduces with this change. How confident are we that
kernel objects are properly reference counted while they are throttled?

Dave

> On Jan 23, 2018, at 10:34, Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 1/23/18 6:48 AM, David Zarzycki wrote:
>>
>>
>>> On Jan 22, 2018, at 20:20, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>
>>> All of these are off the blk-wbt completion path. I suggested earlier to
>>> try and disable CONFIG_BLK_WBT to see if it goes away, or at least to
>>> see if the pattern changes.
>>
>> Hi Jens,
>>
>> Bingo! Disabling CONFIG_BLK_WBT makes the problem go away.
>
> Interesting. The only thing I can think of is
> block/blk-wbt.c:get_rq_wait() returning a bogus pointer, but your
> compiler would need to be broken for that. And I think your lockdep
> would have exploded if that was the case. See below for a quick'n dirty
> you can try and run to disprove that theory.
>
>>>> I’m open to trying anything at this point. Thanks for helping,
>>>
>>> I'd try other types of stress testing. Has the machine otherwise been
>>> stable, or is it a new box?
>>
>> It is a new box. Other than the CONFIG_BLK_WBT problem, it handles
>> stress just fine. If you want to debug this further, I’m willing to
>> run instrumented code.
>
> The below is a long shot, but I'll try and think about it some more. I
> haven't had any reports like this, ever, so it's very puzzling.
>
>
> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> index ae8de9780085..5a45e9245d89 100644
> --- a/block/blk-wbt.c
> +++ b/block/blk-wbt.c
> @@ -103,7 +103,7 @@ static bool wb_recent_wait(struct rq_wb *rwb)
>
>  static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb, bool is_kswapd)
>  {
> -	return &rwb->rq_wait[is_kswapd];
> +	return &rwb->rq_wait[!!is_kswapd];
>  }
>
>  static void rwb_wake_all(struct rq_wb *rwb)
>
> --
> Jens Axboe
>