On Wed, Jun 01, 2022 at 05:40:51PM -0700, Tadeusz Struk <tadeusz.struk@xxxxxxxxxx> wrote:
> css_killed_ref_fn() will be called regardless of the value of refcnt (via percpu_ref_kill_and_confirm())
> and it will only enqueue the css_killed_work_fn() to be called later.
> Then css_put()->css_release() will be called before the css_killed_work_fn() will even
> get a chance to run, and it will also *only* enqueue css_release_work_fn() to be called later.
> The problem happens on the second enqueue. So there need to be something in place that
> will make sure that css_killed_work_fn() is done before css_release() can enqueue
> the second job.

IIUC, here you describe the same scenario I broke down at [1].

> Does it sound right?

I added a parameter A there (that is, the sum of the base and percpu
references before kill_css()). I thought it fails because A == 1 (i.e.
killing the base reference); however, that seems an unlikely situation,
because the cgroup code uses a "fuse" reference to pin the css for
offline_css(). So the remaining option (at least I find it more likely
now) is that A == 0 (A < 0 would trigger the warning in
percpu_ref_switch_to_atomic_rcu()), aka a ref imbalance. I hope we can get
to the bottom of this with detailed enough tracing of gets/puts.

Splitting the work struct would run contrary to the existing approach with
the "fuse" reference.

(BTW you also wrote

On Wed, Jun 01, 2022 at 05:00:44PM -0700, Tadeusz Struk <tadeusz.struk@xxxxxxxxxx> wrote:
> The fact the css_release() is called (via cgroup_kn_unlock()) just after
> kill_css() causes the css->destroy_work to be enqueued twice on the same WQ
> (cgroup_destroy_wq), just with different function. This results in the
> BUG: corrupted list in insert_work issue.

Where do you see the critical css_release() being called from
cgroup_kn_unlock()? I have always observed css_release() being called via
percpu_ref_call_confirm_rcu() (in the original and subsequent syzbot
logs.))

Thanks,
Michal

[1] https://lore.kernel.org/r/Yo7KfEOz92kS2z5Y@blackbook/