Re: KASAN: slab-out-of-bounds Read in bacpy

Dmitry Vyukov <dvyukov@xxxxxxxxxx> · Tue, 19 Mar 2019 14:27:32 +0100

On Sun, Mar 17, 2019 at 9:41 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sun, Mar 17, 2019 at 10:12 AM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >
> > Please see https://github.com/google/syzkaller/blob/master/docs/syzbot.md#bisection
> > it should answer all of your questions. It does 2 and more.
> > And in this case it seems to be working as intended bisecting it to a
> > release tag.
>
> No, it's definitely not working as intended.
>
> You can see it in the bisect log - you don't actually have a single
> "git bisect bad" outside of the initial one that you start bisecting
> with. That's a pretty good sign of bisection being completely broken.
> Yes, it can happen in theory, but in general with a good bisection,
> you should see about as many "good" results as "bad".
>
> I bet that what's going on is that your initial "let's test every
> release" uses a _different_ process than the actual bisection itself
> does.
>
> So if I were you, I'd look at what syzbot does differently during
> bisection vs what it does for that initial "test each release". For
> example, does it do "make clean" in between each build in one case,
> but not the other? Does it do "make oldconfig" vs a fixed config
> generated from scratch every time? Because the fact that you first
> tested 4.10 bad using the "test each release", and then when you do
> bisection, the very commit *before* 4.10 is good (the only difference
> being the EXTRAVERSION and the tag) shows that something went wrong.

Well, this is intended behavior for some definition of intended.
The root cause of what happened here is that syzbot has to disable
CONFIG_USBIP_VHCI_HCD/CONFIG_BT_HCIVHCI when it crosses the v4.10
boundary. This fixes boot on the release; otherwise no bisection would
succeed at all.
It just so happened that this particular bug depends on exactly these
configs and was introduced before v4.10. So it was bisected to
v4.10. And in this sense it is working as intended.
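
To illustrate the effect, here is a toy model in Python. It is not
syzbot code: the linear commit numbering, the function names and the
rule "configs are disabled for every commit older than the boundary"
are all assumptions made for the sketch. It only shows why a bug that
needs a disabled config ends up bisected to the boundary itself.

```python
def reproduces(commit, bug_introduced, boundary):
    """Toy model: does the crash reproduce at this commit?

    Assumption: the config the bug depends on is disabled for every
    commit older than `boundary` (standing in for the v4.10 tag).
    """
    config_enabled = commit >= boundary
    return commit >= bug_introduced and config_enabled


def bisect(good, bad, is_bad):
    """Plain binary search for the first bad commit in (good, bad]."""
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad


# Hypothetical history of 100 commits: the bug lands at commit 20,
# the release boundary where configs get disabled is commit 60.
BUG, BOUNDARY = 20, 60

with_disabling = bisect(0, 99, lambda c: reproduces(c, BUG, BOUNDARY))
without_disabling = bisect(0, 99, lambda c: c >= BUG)
```

In this model the search with config disabling lands on the boundary
commit rather than on the commit that introduced the bug, and every
step below the boundary reports "good".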

How would you define intended bisection behavior for the situation
when the kernel is build/boot/test broken most of the time, even on
releases and even on recent releases? ;)
I guess the 100% fair answer is "the bug happens as far as we could
test (which is not too far)". And that's what I did initially, but the
result was way less useful than what we have now.

This and other details of the process are described here:
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#bisection
This was the first attempt at giving more transparency into the process.

I see 2 potential improvements:
1. (simpler) noting in the bisection log things like disabled configs,
cherry-picked fixes and other things necessary to repair the kernel.
2. (harder) trying to figure out that the bug actually depends on the
disabled config.
I've added this to https://github.com/google/syzkaller/issues/1051
But for (2) I would first like to see that this is a common enough
problem rather than a one-off thing, because it's easier to say than
to implement reliably, and this can affect bugs completely
unrelated to the disabled configs due to unavoidable kernel crash
flakes (and then somebody will need to explain what happened to all
people asking).
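
For (1), something along these lines might be enough. This is only a
sketch: the TestStep type, its fields and the log layout are made up
for illustration and are not what syzbot prints today.

```python
from dataclasses import dataclass, field


@dataclass
class TestStep:
    """One tested commit in a bisection (hypothetical structure)."""
    commit: str
    verdict: str  # "good", "bad" or "skip"
    disabled_configs: list = field(default_factory=list)
    cherry_picks: list = field(default_factory=list)


def format_step(step):
    """Render a log entry that makes kernel repairs visible."""
    lines = [f"testing commit {step.commit}"]
    if step.disabled_configs:
        lines.append("  disabled configs: " + ", ".join(step.disabled_configs))
    if step.cherry_picks:
        lines.append("  cherry-picked fixes: " + ", ".join(step.cherry_picks))
    lines.append(f"  result: {step.verdict}")
    return "\n".join(lines)


# Placeholder commit id; the config names are the ones from this thread.
entry = format_step(
    TestStep("abc123", "good", ["CONFIG_USBIP_VHCI_HCD", "CONFIG_BT_HCIVHCI"])
)
```

With the repairs listed next to each verdict, a reader can at least
see that "good" below a boundary was measured on a different config.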

And obviously doing some real testing before merging each commit into
any kernel tree would help tremendously with bisection long term ;)

Even v5.0 is boot broken if I try to enable more configs. So we will
need to disable more configs in bisection in the future as we onboard
them to syzbot. The points in time at which we currently need to
disable various configs suspiciously resemble when they were added to
the syzbot config...