On Mon, Jun 18, 2018 at 03:46:58PM +0200, Jan Kara wrote:
> syzbot is reporting NULL pointer dereference at wb_workfn() [1] due to
> wb->bdi->dev being NULL. And Dmitry confirmed that wb->state was
> WB_shutting_down after wb->bdi->dev became NULL. This indicates that
> unregister_bdi() failed to call wb_shutdown() on one of wb objects.
>
> The problem is in cgwb_bdi_unregister() which does cgwb_kill() and thus
> drops bdi's reference to wb structures before going through the list of
> wbs again and calling wb_shutdown() on each of them. This way the loop
> iterating through all wbs can easily miss a wb if that wb has already
> passed through cgwb_remove_from_bdi_list() called from wb_shutdown()
> from cgwb_release_workfn() and as a result fully shutdown bdi although
> wb_workfn() for this wb structure is still running. In fact there are
> also other ways cgwb_bdi_unregister() can race with
> cgwb_release_workfn() leading e.g. to use-after-free issues:
>
> CPU1                            CPU2
>                                 cgwb_bdi_unregister()
>                                   cgwb_kill(*slot);
>
> cgwb_release()
>   queue_work(cgwb_release_wq, &wb->release_work);
> cgwb_release_workfn()
>                                   wb = list_first_entry(&bdi->wb_list, ...)
>                                   spin_unlock_irq(&cgwb_lock);
>   wb_shutdown(wb);
>   ...
>   kfree_rcu(wb, rcu);
>                                   wb_shutdown(wb); -> oops use-after-free
>
> We solve these issues by synchronizing writeback structure shutdown from
> cgwb_bdi_unregister() with cgwb_release_workfn() using a new mutex. That
> way we also no longer need synchronization using WB_shutting_down as the
> mutex provides it for CONFIG_CGROUP_WRITEBACK case and without
> CONFIG_CGROUP_WRITEBACK wb_shutdown() can be called only once from
> bdi_unregister().
>
> Reported-by: syzbot <syzbot+4a7438e774b21ddd8eca@xxxxxxxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Jan Kara <jack@xxxxxxx>

Acked-by: Tejun Heo <tj@xxxxxxxxxx>

Thanks.

--
tejun
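
For readers following the quoted description, the idea of serializing both shutdown paths with one mutex can be modelled in userspace C with pthreads. This is only a sketch under stated assumptions, not the kernel patch: `struct wb_model`, `struct bdi_model`, `release_mutex`, `list_lock`, the `registered` flag, and the three `*_model()` functions are all hypothetical names invented for illustration, and `list_lock` merely plays the role that `cgwb_lock` plays in the quoted trace.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structures. */
struct wb_model {
	struct wb_model *next;	/* link on the owning bdi's list */
	bool registered;	/* cleared once shutdown has run */
};

struct bdi_model {
	pthread_mutex_t release_mutex;	/* assumed name: serializes both shutdown paths */
	pthread_mutex_t list_lock;	/* protects wb_list (role of cgwb_lock) */
	struct wb_model *wb_list;
	int nr_shutdown;		/* counts effective shutdowns, for checking */
};

/* Caller must hold bdi->release_mutex. A second call on the same wb is a no-op. */
void wb_shutdown_model(struct bdi_model *bdi, struct wb_model *wb)
{
	struct wb_model **pp;

	if (!wb->registered)
		return;
	wb->registered = false;

	/* Unlink wb so the unregister loop can no longer pick it up. */
	pthread_mutex_lock(&bdi->list_lock);
	for (pp = &bdi->wb_list; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == wb) {
			*pp = wb->next;
			wb->next = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&bdi->list_lock);
	bdi->nr_shutdown++;
}

/* Models the release work item: shut down one wb under the mutex. */
void release_path_model(struct bdi_model *bdi, struct wb_model *wb)
{
	pthread_mutex_lock(&bdi->release_mutex);
	wb_shutdown_model(bdi, wb);
	pthread_mutex_unlock(&bdi->release_mutex);
	/* Real code would free wb only after this point. */
}

/* Models the unregister loop: drain the list while holding the same mutex. */
void unregister_path_model(struct bdi_model *bdi)
{
	pthread_mutex_lock(&bdi->release_mutex);
	pthread_mutex_lock(&bdi->list_lock);
	while (bdi->wb_list != NULL) {
		struct wb_model *wb = bdi->wb_list;

		/* Drop the list lock around shutdown, as in the quoted trace. */
		pthread_mutex_unlock(&bdi->list_lock);
		wb_shutdown_model(bdi, wb);
		pthread_mutex_lock(&bdi->list_lock);
	}
	pthread_mutex_unlock(&bdi->list_lock);
	assert(bdi->wb_list == NULL);
	pthread_mutex_unlock(&bdi->release_mutex);
}
```

Because the release path cannot finish (and so cannot free the object) while the unregister loop holds the mutex, the loop in this model never touches a wb that the other path is about to free, which is the window shown in the CPU1/CPU2 trace.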