On 1/23/18 6:48 AM, David Zarzycki wrote:
>
>
>> On Jan 22, 2018, at 20:20, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> All of these are off the blk-wbt completion path. I suggested earlier to
>> try and disable CONFIG_BLK_WBT to see if it goes away, or at least to
>> see if the pattern changes.
>
> Hi Jens,
>
> Bingo! Disabling CONFIG_BLK_WBT makes the problem go away.

Interesting. The only thing I can think of is
block/blk-wbt.c:get_rq_wait() returning a bogus pointer, but your
compiler would need to be broken for that. And I think your lockdep
would have exploded if that was the case. See below for a quick'n dirty
you can try and run to disprove that theory.

>>> I’m open to trying anything at this point. Thanks for helping,
>>
>> I'd try other types of stress testing. Has the machine otherwise been
>> stable, or is it a new box?
>
> It is a new box. Other than the CONFIG_BLK_WBT problem, it handles
> stress just fine. If you want to debug this further, I’m willing to
> run instrumented code.

The below is a long shot, but I'll try and think about it some more. I
haven't had any reports like this, ever, so it's very puzzling.


diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index ae8de9780085..5a45e9245d89 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -103,7 +103,7 @@ static bool wb_recent_wait(struct rq_wb *rwb)
 
 static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb, bool is_kswapd)
 {
-	return &rwb->rq_wait[is_kswapd];
+	return &rwb->rq_wait[!!is_kswapd];
 }
 
 static void rwb_wake_all(struct rq_wb *rwb)

-- 
Jens Axboe
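
For reference, here is a minimal user-space sketch of what the `!!` in the patch guards against. It is not kernel code: `struct demo_wait`, `struct demo_wb`, `demo_get_wait()` and `demo_slot()` are hypothetical stand-ins, and the flag is deliberately an `int` so that a value other than 0 or 1 can be passed in. With a genuine C `bool` (`_Bool`) parameter the conversion already yields 0 or 1, which is presumably why the message says the compiler would need to be broken for the original indexing to go wrong.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct rq_wait and struct rq_wb. */
struct demo_wait {
	int inflight;
};

struct demo_wb {
	struct demo_wait rq_wait[2];	/* slot 0: normal, slot 1: kswapd */
};

/*
 * The flag is an int here (an assumption for the demo, not the kernel
 * signature) so it can carry values other than 0 and 1. The double
 * negation collapses any nonzero value to 1, keeping the index in range.
 */
static inline struct demo_wait *demo_get_wait(struct demo_wb *wb, int is_kswapd)
{
	return &wb->rq_wait[!!is_kswapd];
}

/* Returns which slot (0 or 1) demo_get_wait() selected. */
static inline ptrdiff_t demo_slot(struct demo_wb *wb, int is_kswapd)
{
	return demo_get_wait(wb, is_kswapd) - wb->rq_wait;
}
```

In this sketch, a flag value such as 4 still selects slot 1. Without the `!!`, the same value would index past the end of the two-element array and hand back a bogus pointer, which is the theory the patch is meant to rule out.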