On 2018-04-26 18:10, Tetsuo Handa wrote:
> Anyway, since Fabiano has a reproducer, I appreciate trying a patch at
> https://groups.google.com/forum/#!msg/syzkaller-bugs/qeR_1VM0UvU/1zcLE0Y4BAAJ .

Regarding the "INFO: task hung in wb_shutdown (2)" report mentioned above,
I can't find a relation to the issue I'm seeing. I observe the livelock in
'balance_dirty_pages', in a different task and after 'wb_shutdown' has
already finished.

> Also, Fabiano's attempt might be somewhat related to a NULL pointer race condition
> at https://groups.google.com/forum/#!msg/syzkaller-bugs/bVx4WExoTDw/G68X4kPcAQAJ .

About "general protection fault in wb_workfn", I see that 'bdi_unregister'
both clears the 'WB_registered' bit (via wb_shutdown) and sets
bdi->dev = NULL:

void bdi_unregister(struct backing_dev_info *bdi)
{
        /* make sure nobody finds us on the bdi_list anymore */
        bdi_remove_from_list(bdi);
->      wb_shutdown(&bdi->wb);
        cgwb_bdi_unregister(bdi);

        if (bdi->dev) {
                bdi_debug_unregister(bdi);
                device_unregister(bdi->dev);
->              bdi->dev = NULL;
        }
        (...)
}

so I believe Jan's patch makes use of that correlation to avoid executing
'wb_workfn' when bdi->dev == NULL, thus avoiding the NULL access:

(from "5acda9d bdi: avoid oops on device removal")

static void wb_wakeup(struct bdi_writeback *wb)
{
        spin_lock_bh(&wb->work_lock);
        if (test_bit(WB_registered, &wb->state))
                mod_delayed_work(bdi_wq, &wb->dwork, 0);    <-- wb_workfn
        spin_unlock_bh(&wb->work_lock);
}

Therefore, looking at my issue and the syzkaller report, I'm thinking that
maybe we shouldn't avoid running 'wb_workfn' after a device removal, but
instead fix the bdi->dev == NULL case somewhere else (this is what my code
snippet does, as a proof of concept).
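
To make the two options concrete, here is a minimal user-space sketch of the
trade-off. It is not kernel code and it is not the proof-of-concept snippet
referred to above: every 'model_*' name is hypothetical, the structs only
mimic the fields discussed in this mail, and locking is left out. It assumes
only what is stated above, namely that unregistering clears the registered
flag and sets dev to NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct backing_dev_info. */
struct model_bdi {
    const char *dev;        /* models bdi->dev; NULL after unregister */
};

/* Hypothetical stand-in for struct bdi_writeback. */
struct model_wb {
    struct model_bdi *bdi;
    bool registered;        /* models the WB_registered bit */
    int scheduled;          /* how often the work item was queued */
    int runs;               /* how often the work function completed */
};

/* Models the two effects of bdi_unregister() pointed out in the mail. */
void model_unregister(struct model_wb *wb)
{
    wb->registered = false; /* done via wb_shutdown() in the kernel */
    wb->bdi->dev = NULL;
}

/*
 * Option A, in the spirit of the wb_wakeup() guard: never queue the
 * work once the registered flag is gone. Returns true if work was queued.
 */
bool model_wakeup_guarded(struct model_wb *wb)
{
    if (!wb->registered)
        return false;
    wb->scheduled++;
    return true;
}

/*
 * Option B (assumption, for illustration only): let the work function
 * run after removal, but never dereference a NULL dev. Returns true if
 * a device name was available.
 */
bool model_workfn_null_safe(struct model_wb *wb)
{
    bool have_dev = wb->bdi->dev != NULL;

    wb->runs++;             /* writeback-side work still makes progress */
    return have_dev;
}
```

With option A, nothing runs after model_unregister(), which is safe but means
no further work is done on behalf of waiters; with option B the work still
runs and only the dev access is skipped.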