Re: Unable to start VM with 5.10-rc3

Hi Zdenek,

I'm working on reproducing the issue. I don't have access to a CPU
without EPT, but I turned EPT off on a Skylake machine and I think I
reproduced it, although I wasn't able to confirm that from the logs.

If you were operating without EPT I assume the guest was in non-paging
mode to get into direct_page_fault in the first place. I would still
have expected the root HPA to be valid unless...

Ah, if you're operating with PAE, then the root HPA will be valid but
will not have a shadow page associated with it, as it is set to
__pa(vcpu->arch.mmu->pae_root) in mmu_alloc_direct_roots.
In that case, I can see why we get a NULL pointer dereference in
is_tdp_mmu_root.
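
As a rough illustration of the failure mode described above, here is a
user-space sketch, not the kernel code and not the actual patch. The
types, the lookup table, lookup_shadow_page() and
is_tdp_mmu_root_checked() are all hypothetical stand-ins; only the field
names tdp_mmu_page and root_count are meant to echo the kernel's shadow
page struct. It models a root HPA that is valid but has no shadow page
behind it (the pae_root case) and shows the two checks that would avoid
the dereference.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t hpa_t;

/* Assumption: an all-ones HPA marks "no root", as a stand-in sentinel. */
#define INVALID_PAGE (~(hpa_t)0)

/* Simplified model of a shadow page; only the fields used here. */
struct kvm_mmu_page {
	bool tdp_mmu_page;
	int root_count;
};

/* Hypothetical HPA -> shadow page mapping used only by this sketch. */
struct root_entry {
	hpa_t hpa;
	struct kvm_mmu_page *sp; /* NULL models a root such as pae_root */
};

/* Return the shadow page for hpa, or NULL if none is associated. */
static struct kvm_mmu_page *lookup_shadow_page(const struct root_entry *tbl,
					       size_t n, hpa_t hpa)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].hpa == hpa)
			return tbl[i].sp;
	}
	return NULL;
}

/* Guarded check: reject an invalid HPA and a missing shadow page
 * before touching any field of the shadow page. */
static bool is_tdp_mmu_root_checked(const struct root_entry *tbl, size_t n,
				    hpa_t hpa)
{
	struct kvm_mmu_page *sp;

	if (hpa == INVALID_PAGE)
		return false;

	sp = lookup_shadow_page(tbl, n, hpa);
	if (sp == NULL)
		return false; /* valid HPA, but no shadow page behind it */

	return sp->tdp_mmu_page && sp->root_count > 0;
}
```

Without the second check, the PAE case would read sp->tdp_mmu_page
through a NULL pointer, which matches the symptom reported in dmesg.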

I will send out a patch that should fix this if the issue is as
described above. I don't have hardware to test this on, but if you
don't mind applying the patch and checking it, that would be awesome.

Ben

On Wed, Nov 11, 2020 at 3:09 AM Zdenek Kaspar <zkaspar82@xxxxxxxxx> wrote:
>
> Hi, I'm sure my bisect has nothing to do with KVM,
> because it was a quick shot between -rc1 and the previous release.
>
> This old CPU doesn't have EPT (see attached file)
>
> ./run_tests.sh
> FAIL apic-split (timeout; duration=90s)
> FAIL ioapic-split (timeout; duration=90s)
> FAIL apic (timeout; duration=30)
> ... ^C
> a few "RIP is_tdp_mmu_root" traces observed in dmesg
>
> Z.
>
> On Tue, 10 Nov 2020 17:13:21 -0800
> Ben Gardon <bgardon@xxxxxxxxxx> wrote:
>
> > Hi Zdenek,
> >
> > That crash is most likely the result of a missing check for an invalid
> > root HPA or NULL shadow page in is_tdp_mmu_root, which could have
> > prevented the NULL pointer dereference.
> > However, I'm not sure how a vCPU got to that point in the page fault
> > handler with a bad EPT root page.
> >
> > I see VMX in your list of flags, is your machine 64 bit with EPT or
> > some other configuration?
> >
> > I'm surprised you are finding your machine unable to boot for
> > bisecting. Do you know if it's crashing in the same spot or somewhere
> > else? I wouldn't expect the KVM page fault handler to run as part of
> > boot.
> >
> > I will send out a patch first thing tomorrow morning (PST) to WARN
> > instead of crashing with a NULL pointer dereference. Are you able to
> > reproduce the issue with any KVM selftest?
> >
> > Ben
> >
> >
> > On Tue, Nov 10, 2020 at 7:24 AM Zdenek Kaspar <zkaspar82@xxxxxxxxx>
> > wrote:
> > >
> > > Hi,
> > >
> > > the attached file is the result from today's linux-master (with
> > > fixes for 5.10-rc4) when I try to start a VM on an older machine:
> > >
> > > model name      : Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
> > > flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > > pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe
> > > syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl
> > > cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16
> > > xtpr pdcm lahf_lm pti tpr_shadow dtherm
> > > vmx flags       : tsc_offset vtpr
> > >
> > > I did a quick check with 5.9 (distro kernel) and it works,
> > > but VM performance seems extremely impacted. 5.8 works fine.
> > >
> > > Back to the 5.10 issue: it has been a problem since 5.10-rc1 and I
> > > have had no luck bisecting (the machine doesn't boot).
> > >
> > > TIA, Z.
>


