Re: 💥 PANICKED: Test report for kernel 5.9.0-rc3-020ad03.cki (block)

Jens Axboe <axboe@xxxxxxxxx> · Fri, 4 Sep 2020 09:06:50 -0600

On 9/3/20 10:24 PM, Ming Lei wrote:
> On Thu, Sep 03, 2020 at 09:37:40PM -0600, Jens Axboe wrote:
>> On 9/3/20 9:22 PM, Ming Lei wrote:
>>> It is an MD bug: percpu_ref_exit() may be called on a ref that was never
>>> initialized via percpu_ref_init(), and the following patch can fix the
>>> issue:
>>
>> I really (REALLY) think this should be handled by percpu_ref_exit(), if
> 
> OK, we can do that by returning immediately from percpu_ref_exit() if
> percpu_count_ptr(ref) is 0 just like before.

Yep, that's going to be a must. Also see the recent syzbot report, which is
the same issue, just in the core block parts instead.
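A minimal userspace sketch of the early return described above, assuming a simplified model: struct fake_ref, fake_count_ptr(), fake_ref_init(), fake_ref_exit() and FAKE_REF_FLAG_MASK are hypothetical stand-ins, not the kernel's struct percpu_ref or its real helpers. As with the real percpu_count_ptr, the low bits of the word are assumed to carry mode flags, so the "is it initialized?" test masks them off first.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model, not the kernel API: two low bits of the word
 * are reserved for mode flags, like the real percpu_count_ptr. */
#define FAKE_REF_FLAG_MASK ((uintptr_t)3)

struct fake_ref {
	uintptr_t percpu_count_ptr; /* stays 0 until fake_ref_init() succeeds */
};

/* Mask off the flag bits; NULL means the counter was never allocated. */
static unsigned long *fake_count_ptr(const struct fake_ref *ref)
{
	return (unsigned long *)(ref->percpu_count_ptr & ~FAKE_REF_FLAG_MASK);
}

static int fake_ref_init(struct fake_ref *ref)
{
	unsigned long *counter = calloc(1, sizeof(*counter));

	if (!counter)
		return -1;
	/* Heap memory is aligned, so the low flag bits are free to use. */
	assert(((uintptr_t)counter & FAKE_REF_FLAG_MASK) == 0);
	ref->percpu_count_ptr = (uintptr_t)counter;
	return 0;
}

static void fake_ref_exit(struct fake_ref *ref)
{
	unsigned long *counter = fake_count_ptr(ref);

	/* The early return: a ref that was never initialized (or was
	 * already exited) has nothing to free, so do nothing. */
	if (!counter)
		return;
	free(counter);
	/* Drop the pointer but keep any flags, so a second exit is a no-op. */
	ref->percpu_count_ptr &= FAKE_REF_FLAG_MASK;
}
```

Putting the check inside the exit path, rather than in each caller, is what lets a zero-filled, never-initialized ref be torn down safely without every user having to know about percpu_count_ptr.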

>> it worked before. Otherwise you're just setting yourself up for a world
>> of pain with other users, and we'll be fixing this fallout for a while.
>> I don't want to carry that. So let's just make it do the right thing,
>> needing to do this:
>>
>>> +       if (mddev->writes_pending.percpu_count_ptr)
>>> +               percpu_ref_exit(&mddev->writes_pending);
>>
>> is really nasty.
> 
> Yeah, it is, and so is mddev_init_writes_pending():
> 
>         if (mddev->writes_pending.percpu_count_ptr)
>                 return 0;
>         if (percpu_ref_init(&mddev->writes_pending, no_op,
>                             PERCPU_REF_ALLOW_REINIT, GFP_KERNEL) < 0)
>                 return -ENOMEM;

Indeed, that's another eyesore... No users should need to know about
these internals. Maybe add a percpu_ref_inited() or something to test
for it, at least that'd allow us to clean up these bad use cases after
the fact.
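A sketch of how a percpu_ref_inited()-style helper could tidy up both MD call sites, again in a hypothetical userspace model: fake_ref_inited(), struct fake_mddev and fake_mddev_init_writes_pending() are illustrative names only, and a real kernel helper, if one is added, may look different.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model, not the kernel API. Low bits hold mode flags. */
#define FAKE_REF_FLAG_MASK ((uintptr_t)3)

struct fake_ref {
	uintptr_t percpu_count_ptr; /* 0 until fake_ref_init() succeeds */
};

/* Hypothetical helper in the spirit of the suggested percpu_ref_inited():
 * callers ask the API instead of reading percpu_count_ptr themselves. */
static bool fake_ref_inited(const struct fake_ref *ref)
{
	return (ref->percpu_count_ptr & ~FAKE_REF_FLAG_MASK) != 0;
}

static int fake_ref_init(struct fake_ref *ref)
{
	unsigned long *counter = calloc(1, sizeof(*counter));

	if (!counter)
		return -ENOMEM;
	/* Heap memory is aligned, so the low flag bits are free to use. */
	assert(((uintptr_t)counter & FAKE_REF_FLAG_MASK) == 0);
	ref->percpu_count_ptr = (uintptr_t)counter;
	return 0;
}

/* Safe on a zeroed ref, so teardown paths need no guard of their own. */
static void fake_ref_exit(struct fake_ref *ref)
{
	if (!fake_ref_inited(ref))
		return;
	free((void *)(ref->percpu_count_ptr & ~FAKE_REF_FLAG_MASK));
	ref->percpu_count_ptr = 0;
}

/* Hypothetical stand-in for struct mddev, reduced to one field. */
struct fake_mddev {
	struct fake_ref writes_pending;
};

/* Same shape as the quoted mddev_init_writes_pending(): lazy, one-time
 * init, but the "already set up?" test goes through the helper. */
static int fake_mddev_init_writes_pending(struct fake_mddev *mddev)
{
	if (fake_ref_inited(&mddev->writes_pending))
		return 0;
	return fake_ref_init(&mddev->writes_pending);
}
```

With such a helper, the lazy init keeps its check but no longer reaches into percpu_count_ptr, and teardown can call the exit function unconditionally instead of carrying a writes_pending.percpu_count_ptr guard.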

-- 
Jens Axboe