On Wed, May 25, 2022 at 05:15:17PM +0200, Michal Koutný <mkoutny@xxxxxxxx> wrote:
> // ref=1: only base reference
> kill_css()
>   css_get() // fuse, ref+=1 == 2
>   percpu_ref_kill_and_confirm
>     // ref -= 1 == 1: kill base references
>     [via rcu]
>     css_killed_ref_fn == refcnt.confirm_switch
>       queue_work(css->destroy_work)        (1)
>       [via css->destroy_work]
>       css_killed_work_fn == wq.func
>         offline_css() // needs fuse
>         css_put // ref -= 1 == 0: de-fuse, was last
>           ...
>           percpu_ref_put_many
>             css_release
>               queue_work(css->destroy_work)  (2)
>               [via css->destroy_work]
>               css_release_work_fn == wq.func

Apologies, this is wrong explanation. (I thought this explains why
Tadeusz's patch with double get/put didn't fix it (i.e. any number
wouldn't help with the explanation above).)

But the above is not correct. I've looked at the stack trace [1] and the
offending percpu_ref_put_many is called from an RCU callback
percpu_ref_switch_to_atomic_rcu(), so I can't actually see why it drops
to zero there...

Regards,
Michal
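
For anyone following the arithmetic in the quoted trace, here is a toy
userspace C model of just the counting. It is only a sketch: toy_ref,
toy_get, toy_put and replay_quoted_trace are made-up names, not kernel
APIs, and nothing here models percpu_ref, RCU or workqueues. It replays
the get/kill/put sequence as quoted, which the message above retracts as
the explanation of the actual bug.

```c
#include <assert.h>

/* Toy stand-in for the css reference count; NOT the kernel's percpu_ref. */
struct toy_ref {
	long count;	/* current number of references */
	int released;	/* how many times the release hook ran */
};

/* Take one reference. */
static void toy_get(struct toy_ref *r)
{
	r->count++;
}

/* Drop one reference; run the release hook when the count reaches zero. */
static void toy_put(struct toy_ref *r)
{
	assert(r->count > 0);	/* a put without a held reference is a bug */
	if (--r->count == 0)
		r->released++;	/* stands in for css_release queueing work (2) */
}

/* Replay the sequence from the quoted trace and return the final count. */
static long replay_quoted_trace(struct toy_ref *r)
{
	r->count = 1;		/* ref=1: only base reference */
	r->released = 0;

	toy_get(r);		/* kill_css(): css_get(), the "fuse", ref == 2 */
	toy_put(r);		/* kill drops the base reference, ref == 1 */
	toy_put(r);		/* css_killed_work_fn: css_put(), ref == 0 */

	return r->count;
}
```

Calling replay_quoted_trace() on a struct toy_ref leaves count at 0 with
the release hook having run exactly once, i.e. in this model the release
can only happen after the final css_put, as the quoted sequence assumed.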