Re: WARNING in percpu_ref_kill_and_confirm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Apr 22, 2019 at 7:48 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Apr 22, 2019 at 9:38 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
> >
> > With the mutex change in, I can trigger it in a second or so. Just ran
> > the reproducer with that change reverted, and I'm not seeing any badness.
> > So I do wonder if the bisect results are accurate?
>
> Looking at the syzbot report, it's syzbot being confused.
>
> The actual WARNING in percpu_ref_kill_and_confirm() only happens with
> recent kernels.
>
> But then syzbot mixes it up with a completely different bug:
>
>    crash: BUG: MAX_STACK_TRACE_ENTRIES too low!
>    BUG: MAX_STACK_TRACE_ENTRIES too low!
>
> and for some reason decides that *that* bug is the same thing entirely.
>
> So yeah, I think the simple percpu_ref_is_dying() check is sufficient,
> and that the syzbot bisection is completely bogus.

Using crashed/not-crashed predicate gives better results overall. More
than half kernel bugs have different manifestations due to different
reasons. And even if we can say for sure that we see a different bug,
we still don't know if the original bug is also there or not. See the
following threads for details:
https://groups.google.com/d/msg/syzkaller-bugs/nFeC8-UG1gg/y6gUEsvAAgAJ
https://groups.google.com/d/msg/syzkaller/sR8aAXaWEF4/tTWYRgvmAwAJ

Unrelated crashes is the most common cause of incorrect bisection
results (66%). To enable better bisection we would need to integrate
some meaningful precommit testing into kernel development process
(would be tremendously useful for other reasons too). E.g. this "BUG:
MAX_STACK_TRACE_ENTRIES too low!" is this:
https://syzkaller.appspot.com/bug?id=dbd70f0407487a061d2d46fdc6bccc94b95ce3c0
and the reproducer is simply opening /dev/infiniband/rdma_cm or
/dev/vhci or something equally simple with LOCKDEP enabled. None of
this was done in a testing environment for several weeks. And then it
took another month to propagate the fix through all distributed kernel
trees. For all that time simple programs crash and bisection can't be
done and we are spending time here...



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux