Re: [PATCH] KVM: X86: Fix scan ioapic use-before-initialization

Dmitry Vyukov <dvyukov@xxxxxxxxxx> · Wed, 2 Jan 2019 15:08:29 +0100

On Fri, Dec 28, 2018 at 10:09 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Dec 28, 2018 at 1:43 AM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >
> > > Nobody reads the kernel mailing list directly - there's just too much traffic.
> >
> > As a result, bug reports and patches get lost, and this is bad. It
> > would be useful to stop it from happening, and there are known ways
> > to do this.
>
> Well, let me be a bit more specific: you will find that people read
> the very _targeted_ mailing lists, because they not only tend to be
> more specific to some particular interest, but also aren't the flood
> of hundreds of emails a day.
>
> And don't get me wrong: I'm not saying that lkml is useless. Not at
> all. It's just that it's really more of an archival model than a
> "people read it" - so you send your emails to a group of people, and
> then you cc lkml so that when that group gets expanded people can be
> pointed at the whole thread. Or, obviously, so that commit messages
> etc can point to discussion.
>
> But that does mean that any lkml cc shouldn't be expected to cause a
> reaction in itself. It's about other things.
>
> > syzbot not doing bisection is not the root cause of this
>
> Root cause? No. But if you do bisection, it means that you can now
> target things much better. So then it's not lkml and "random
> collection of maintainers", but a much more targeted group.
>
> And that targeted group also ends up being a lot more receptive to it.
>
> Again, look at the raw syzbot email and the email by Wanpeng Li. Yes,
> the syzbot email did bring in a reasonable set of people just based on
> the oops (I think it did "get_maintainer" on kvm_ioapic_scan_entry()).
> But Wanpeng ended up sending it to the *particular* people who were
> directly responsible.
>
> > 2. syzbot reports are not worse than average human reports, frequently better.
>
> No, they really aren't.
>
> They are better in a *technical* sense, but they are also very much
> obviously automated, which makes the target people take them much less
> seriously.
>
> When you see lots of syzbot emails, and there are lots of more or less
> random recipients that may or may not be correct, what's the natural
> reaction to that?
>
> Look up "bystander effect".
>
> > 3. Bisection is useful, but not important in most cases.
>
> No.
>
> Exactly because of the problem syzbot has. It's too scatter-shot.
> People clearly ignore it, because people feel it's not _their_ issue.
>
> The advantage of bisection is that it makes the problem much more
> specific. Right now, you'll find that many developers ignore syzbot
> simply because it's not worth their time to chase down whether it's
> even their problem.
>
> See what I'm saying?
>
> It's the whole "data vs information" issue. Particularly when cc'ing
> maintainers, who get hundreds of emails a day, you need to convince
> them that this email is _relevant_.

I see what you are saying and I agree that bisection results will make
reports better in some cases. But I mean a more general problem.

Say you reported a bug and happened to miss the single right person in
CC, for whatever reason; it can happen, right? With the current process
it is a coin flip whether your report gets routed to the right person
or lost. And it's not that you personally care a lot about this
particular bug; you just happened to notice it and wanted to be a good
samaritan. So you will not keep track of it on a post-it note on your
monitor and won't ping later. But the bug can be bad and either cause
security problems later, or reach a release and break things in the
field and then require 1000x more work to port the fix to all
downstream forks.

Or, we heavily rely on end users for testing. End users are not kernel
developers and can't generally be expected to do pre-triage and proper
routing. Losing these valuable reports is bad because only a small
fraction of users report anything to projects, and it can also affect
user trust: if you see that your reports are not acted on, you don't
report next time.

Even if we take syzbot, it won't be able to bisect all the time for
multiple reasons:
 - some bugs don't have reproducers (but are still very real and
sometimes manageable to fix)
 - the kernel is sometimes build/boot broken for prolonged periods
 - some old bugs are bisected to introduction of the debugging tool
that detects the bug
 - some crashes can be too flaky for reliable bisection
 - some reproducers won't work on older kernels, yet the bug is there
 - ...
So it will be nice to have bisection results when they are
available, but it does not feel like it should be the only guarantee
of a bug report not being lost.

Moreover, you can see in the examples I referenced above that they
were delivered to the right people, but then still lost because there
is nothing in the kernel development process that would prevent losses.

Moreover, relying on a small set of private emails generally creates
problems wrt bus-factor and vacations. It would be useful if anybody
could see what the open bugs for the rdma_cm subsystem are at any point
in time.