Re: [BUG] Guest OSes die simultaneously (bisected)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 04, 2024 at 03:59:52PM +0100, Paolo Bonzini wrote:
> On Thu, Jan 4, 2024 at 3:58 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > On Wed, Jan 03, 2024 at 05:00:35PM -0800, Paul E. McKenney wrote:
> > > On Wed, Jan 03, 2024 at 04:24:06PM -0800, Sean Christopherson wrote:
> > > > On Wed, Jan 03, 2024, Paul E. McKenney wrote:
> > > > > On Wed, Jan 03, 2024 at 02:22:23PM -0800, Paul E. McKenney wrote:
> > > > > > Hello!
> > > > > >
> > > > > > Since some time between v5.19 and v6.4, long-running rcutorture tests
> > > > > > would (rarely but intolerably often) have all guests on a given host die
> > > > > > simultaneously with something like an instruction fault or a segmentation
> > > > > > violation.
> > > > > >
> > > > > > Each bisection step required 20 hosts running 10 hours each, and
> > > > > > this eventually fingered commit c59a1f106f5c ("KVM: x86/pmu: Add
> > > > > > IA32_PEBS_ENABLE MSR emulation for extended PEBS").  Although this commit
> > > > > > is certainly messing with things that could possibly cause all manner
> > > > > > of mischief, I don't immediately see a smoking gun.  Except that the
> > > > > > commit prior to this one is rock solid.
> > > > > > Just to make things a bit more exciting, bisection in mainline proved
> > > > > > to be problematic due to bugs of various kinds that hid this one.  I was
> > > > > > therefore forced to bisect among the commits backported to the internal
> > > > > > v5.19-based kernel, which fingered the backported version of the patch
> > > > > > called out above.
> > > > >
> > > > > Ah, and so why do I believe that this is a problem in mainline rather
> > > > > than just (say) a backporting mistake?
> > > > >
> > > > > Because this issue was first located in v6.4, which already has this
> > > > > commit included.
> > > > >
> > > > >                                                   Thanx, Paul
> > > > >
> > > > > > Please note that this is not (yet) an emergency.  I will just continue
> > > > > > to run rcutorture on v5.19-based hypervisors in the meantime.
> > > > > >
> > > > > > Any suggestions for debugging or fixing?
> > > >
> > > > This looks suspect:
> > > >
> > > > +       u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable;
> > > > +       int global_ctrl, pebs_enable;
> > > >
> > > > -       arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
> > > > -       arr[0].host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
> > > > -       arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask;
> > > > -       arr[0].guest &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable);
> > > > -       *nr = 1;
> > > > +       *nr = 0;
> > > > +       global_ctrl = (*nr)++;
> > > > +       arr[global_ctrl] = (struct perf_guest_switch_msr){
> > > > +               .msr = MSR_CORE_PERF_GLOBAL_CTRL,
> > > > +               .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
> > > > +               .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
> > > > +       };
> > > >
> > > >
> > > > IIUC (always a big if with this code), the intent is that the guest's version of
> > > > PERF_GLOBAL_CTRL gets bits that are (a) not exclusive to the host and (b) not
> > > > being used for PEBS.  (b) is necessary because PEBS generates records in memory
> > > > using virtual addresses, i.e. the CPU will write to memory using a virtual address
> > > > that is valid for the host but not the guest.  And so PMU counters that are
> > > > configured to generate PEBS records need to be disabled while running the guest.
> > > >
> > > > Before that commit, the logic was:
> > > >
> > > >   guest[PERF_GLOBAL_CTRL] = ctrl & ~host;
> > > >   guest[PERF_GLOBAL_CTRL] &= ~pebs;
> > > >
> > > > But after, it's now:
> > > >
> > > >   guest[PERF_GLOBAL_CTRL] = ctrl & (~host | ~pebs);
> > > >
> > > > I.e. the kernel is enabled counters in the guest that are not host-only OR not
> > > > PEBS.  E.g. if only counter 0 is in use, it's using PEBS, but it's not exclusive
> > > > to the host, then the new code will yield (truncated to a single byte for sanity)
> > > >
> > > >   1 = 1 & (0xf | 0xe)
> > > >
> > > > and thus keep counter 0 enabled, whereas the old code would yield
> > > >
> > > >   1 = 1 & 0xf
> > > >   0 = 1 & 0xe
> > > >
> > > > A bit of a shot in the dark and completed untested, but I think this is the correct
> > > > fix?
> > >
> > > I am firing off some tests, and either way, thank you very much!!!
> >
> > Woo-hoo!!!  ;-)
> >
> > Tested-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> >
> > Will you be sending a proper patch, or would you prefer that I do so?
> > In the latter case, I would need your Signed-off-by.
> 
> I will fast track this one to Linus.

Thank you, Paolo!

One additional request, if I may be so bold...

Although I am happy to have been able to locate the commit (and even
happier that Sean spotted the problem and that you quickly pushed the
fix to mainline!), chasing this consumed a lot of time and systems over
an embarrassingly large number of months.  As in I first spotted this
bug in late July.  Despite a number of increasingly complex attempts,
bisection became feasible only after the buggy commit was backported to
our internal v5.19 code base.  :-(

My (completely random) guess is that there is some rare combination
of events that causes this code to fail.  If so, is it feasible to
construct a test that makes this rare combination of events less rare,
so that similar future bugs are caught more quickly?

And please understand that I am not casting shade on those who wrote,
reviewed, and committed that buggy commit.  As in I freely confess that
I had to stare at Sean's fix for a few minutes before I figured out what
was going on.  Instead, the point I am trying to make is that carefully
constructed tests can serve as tireless and accurate code reviewers.
This won't ever replace actual code review, but my experience indicates
that it will help find more bugs more quickly and more easily.

							Thanx, Paul

> Paolo
> 
> > And again, thank you very much!!!
> >
> >                                                         Thanx, Paul
> >
> > > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > > > index a08f794a0e79..92d5a3464cb2 100644
> > > > --- a/arch/x86/events/intel/core.c
> > > > +++ b/arch/x86/events/intel/core.c
> > > > @@ -4056,7 +4056,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
> > > >         arr[global_ctrl] = (struct perf_guest_switch_msr){
> > > >                 .msr = MSR_CORE_PERF_GLOBAL_CTRL,
> > > >                 .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
> > > > -               .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
> > > > +               .guest = intel_ctrl & ~(cpuc->intel_ctrl_host_mask | pebs_mask),
> > > >         };
> > > >
> > > >         if (!x86_pmu.pebs)
> > > >
> >
> 




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux