On 9/3/20 10:24 PM, Ming Lei wrote:
> On Thu, Sep 03, 2020 at 09:37:40PM -0600, Jens Axboe wrote:
>> On 9/3/20 9:22 PM, Ming Lei wrote:
>>> It is one MD's bug, and percpu_ref_exit() may be called on one ref not
>>> initialized via percpu_ref_init(), and the following patch can fix the
>>> issue:
>>
>> I really (REALLY) think this should be handled by percpu_ref_exit(), if
>
> OK, we can do that by return immediately from percpu_ref_exit() if
> percpu_count_ptr(ref) is 0 just like before.

Yep that's going to be a must, also see recent syzbot report that's the
same issue, just the core block parts instead.

>> it worked before. Otherwise you're just setting yourself up for a world
>> of pain with other users, and we'll be fixing this fallout for a while.
>> I don't want to carry that. So let's just make it do the right thing,
>> needing to do this:
>>
>>> +	if (mddev->writes_pending.percpu_count_ptr)
>>> +		percpu_ref_exit(&mddev->writes_pending);
>>
>> is really nasty.
>
> Yeah, it is as mddev_init_writes_pending():
>
>         if (mddev->writes_pending.percpu_count_ptr)
>                 return 0;
>         if (percpu_ref_init(&mddev->writes_pending, no_op,
>                             PERCPU_REF_ALLOW_REINIT, GFP_KERNEL) < 0)
>                 return -ENOMEM;

Indeed, that's another eye sore... No users should need to know about
these internals. Maybe add a percpu_ref_inited() or something to test
for it, at least that'd allow us to clean up these bad use cases after
the fact.

-- 
Jens Axboe
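
For illustration only, here is a minimal userspace sketch of the two ideas
in this thread: an exit routine that returns immediately when the ref was
never initialized, and a small "inited" test so callers do not have to
look at percpu_count_ptr themselves. It is not kernel code and not the
actual patch. struct mock_ref and the mock_ref_*() names are hypothetical
stand-ins, and the single heap allocation only assumes the role of the
real per-CPU counters.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct percpu_ref. */
struct mock_ref {
	uintptr_t percpu_count_ptr;	/* 0: never initialized, or already exited */
};

/*
 * Hypothetical helper in the spirit of the percpu_ref_inited() suggested
 * above: lets callers test for initialization without touching internals.
 */
static bool mock_ref_inited(const struct mock_ref *ref)
{
	return ref->percpu_count_ptr != 0;
}

/* Allocate the (mock) counter storage; returns 0 on success, -1 on failure. */
static int mock_ref_init(struct mock_ref *ref)
{
	void *counters = calloc(1, sizeof(unsigned long));

	if (!counters)
		return -1;
	ref->percpu_count_ptr = (uintptr_t)counters;
	return 0;
}

/*
 * Exit is a no-op on a ref that was never initialized, so callers need
 * no guard of their own. It also clears the pointer, which makes a
 * second exit harmless in this mock.
 */
static void mock_ref_exit(struct mock_ref *ref)
{
	if (!mock_ref_inited(ref))
		return;
	free((void *)ref->percpu_count_ptr);
	ref->percpu_count_ptr = 0;
}
```

With this shape, a caller can run mock_ref_exit() unconditionally on a
zeroed struct, and an init path that must not initialize twice can test
mock_ref_inited() instead of reading the pointer field directly.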