On Fri, 2022-03-25 at 19:19 +0100, Paolo Bonzini wrote:
> On 3/3/22 16:41, David Woodhouse wrote:
> > This series adds event channel acceleration for Xen guests. In particular
> > it allows guest vCPUs to send each other IPIs without having to bounce
> > all the way out to the userspace VMM in order to do so. Likewise, the
> > Xen singleshot timer is added, and a version of SCHEDOP_poll. Those
> > major features are based on Joao and Boris' patches from 2019.
> >
> > Cleaning up the event delivery into the vcpu_info involved using the new
> > gfn_to_pfn_cache for that, and that means I ended up doing so for *all*
> > the places the guest can have a pvclock.
> >
> > v0: Proof-of-concept RFC
> >
> > v1:
> >  • Drop the runstate fix which is merged now.
> >  • Add Sean's gfn_to_pfn_cache API change at the start of the series.
> >  • Add KVM self tests
> >  • Minor bug fixes
> >
> > v2:
> >  • Drop dirty handling from gfn_to_pfn_cache
> >  • Fix !CONFIG_KVM_XEN build and duplicate call to kvm_xen_init_vcpu()
> >
> > v3:
> >  • Add KVM_XEN_EVTCHN_RESET to clear all outbound ports.
> >  • Clean up a stray #if 1 in a part of the test case that was once
> >    being recalcitrant.
> >  • Check kvm_xen_has_pending_events() in kvm_vcpu_has_events() and *not*
> >    kvm_xen_has_pending_timer() which is checked from elsewhere.
> >  • Fix warnings noted by the kernel test robot <lkp@xxxxxxxxx>:
> >    • Make kvm_xen_init_timer() static.
> >    • Make timer delta calculation use an explicit s64 to fix 32-bit build.
>
> I've seen this:
>
> [1790637.031490] BUG: Bad page state in process qemu-kvm  pfn:03401
> [1790637.037503] page:0000000077fc41af refcount:0 mapcount:1
> mapping:0000000000000000 index:0x7f4ab7e01 pfn:0x3401
> [1790637.047592] head:0000000032101bf5 order:9 compound_mapcount:1
> compound_pincount:0
> [1790637.055250] anon flags:
> 0xfffffc009000e(referenced|uptodate|dirty|head|swapbacked|node=0|zone=1|lastcpupid=0x1fffff)
> [1790637.065949] raw: 000fffffc0000000 ffffda4b800d0001 0000000000000903
> dead000000000200
> [1790637.073869] raw: 0000000000000100 0000000000000000 00000000ffffffff
> 0000000000000000
> [1790637.081791] head: 000fffffc009000e dead000000000100
> dead000000000122 ffffa0636279fb01
> [1790637.089797] head: 00000007f4ab7e00 0000000000000000
> 00000000ffffffff 0000000000000000
> [1790637.097795] page dumped because: nonzero compound_mapcount
> [1790637.103455] Modules linked in: kvm_intel(OE) kvm(OE) overlay tun
> tls ib_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd
> grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common
> isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal
> intel_powerclamp coretemp ipmi_ssif iTCO_wdt intel_pmc_bxt irqbypass
> iTCO_vendor_support acpi_ipmi rapl dell_smbios ipmi_si mei_me
> intel_cstate dcdbas ipmi_devintf i2c_i801 intel_uncore
> dell_wmi_descriptor wmi_bmof mei lpc_ich intel_pch_thermal i2c_smbus
> ipmi_msghandler acpi_power_meter xfs crct10dif_pclmul i40e crc32_pclmul
> crc32c_intel megaraid_sas ghash_clmulni_intel tg3 mgag200 wmi fuse [last
> unloaded: kvm]
> [1790637.162636] CPU: 12 PID: 3056318 Comm: qemu-kvm Kdump: loaded
> Tainted: G W IOE --------- --- 5.16.0-0.rc6.41.fc36.x86_64 #1
> [1790637.174878] Hardware name: Dell Inc. PowerEdge R440/08CYF7, BIOS
> 1.6.11 11/20/2018
> [1790637.182618] Call Trace:
> [1790637.185246]  <TASK>
> [1790637.187524]  dump_stack_lvl+0x48/0x5e
> [1790637.191373]  bad_page.cold+0x63/0x94
> [1790637.195123]  free_tail_pages_check+0xbb/0x110
> [1790637.199656]  free_pcp_prepare+0x270/0x310
> [1790637.203843]  free_unref_page+0x1d/0x120
> [1790637.207856]  kvm_gfn_to_pfn_cache_refresh+0x2c2/0x400 [kvm]
> [1790637.213662]  kvm_setup_guest_pvclock+0x4b/0x180 [kvm]
> [1790637.218913]  kvm_guest_time_update+0x26d/0x330 [kvm]
> [1790637.224080]  vcpu_enter_guest+0x31c/0x1390 [kvm]
> [1790637.228908]  kvm_arch_vcpu_ioctl_run+0x132/0x830 [kvm]
> [1790637.234254]  kvm_vcpu_ioctl+0x270/0x680 [kvm]
>
> followed by other badness with the same call stack:
>
> [1790637.376127] page dumped because:
> VM_BUG_ON_PAGE(page_ref_count(page) == 0)
>
> I am absolutely not sure that this series is the culprit in any way, but
> anyway I'll try to reproduce (it happened at the end of a RHEL7.2
> installation) and let you know. If not, it is something that already
> made its way to Linus.

Hrm.... could it be a double/multiple free? That free will come from
__release_gpc(), which is called at the end of
kvm_gfn_to_pfn_cache_refresh() and which releases the *old* PFN.

How could we get there without... oh... could it be this?

--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -176,6 +176,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	gpc->uhva = gfn_to_hva_memslot(gpc->memslot, gfn);
 
 	if (kvm_is_error_hva(gpc->uhva)) {
+		gpc->pfn = KVM_PFN_ERR_FAULT;
 		ret = -EFAULT;
 		goto out;
 	}
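
If that theory is right, the sequence is: the -EFAULT path leaves the
stale PFN in gpc->pfn, the tail of the refresh releases it as the old
PFN, and then the *next* refresh picks up the same stale value as its
old PFN and releases it a second time. Here's a minimal userspace
sketch of that sequence — entirely my own mock, not the kernel code;
page_refcount, put_page_mock() and refresh_mock() are simplified
stand-ins for the real machinery in virt/kvm/pfncache.c:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define KVM_PFN_ERR_FAULT	((unsigned long)-1)

/* Stand-in for the struct page refcount of the cached page. */
static int page_refcount = 1;

static void put_page_mock(unsigned long pfn)
{
	if (pfn == KVM_PFN_ERR_FAULT)
		return;		/* error PFNs are never released */
	if (--page_refcount < 0)
		printf("BUG: Bad page state, pfn:%lx freed twice\n", pfn);
}

struct gfn_to_pfn_cache {
	unsigned long pfn;
};

/* Mirrors the tail of kvm_gfn_to_pfn_cache_refresh(): drop the old PFN. */
static void release_gpc_mock(unsigned long old_pfn)
{
	put_page_mock(old_pfn);
}

static int refresh_mock(struct gfn_to_pfn_cache *gpc, bool hva_is_error)
{
	unsigned long old_pfn = gpc->pfn;
	int ret = 0;

	if (hva_is_error) {
		/*
		 * Buggy version: gpc->pfn is left holding the stale PFN.
		 * The proposed fix inserts "gpc->pfn = KVM_PFN_ERR_FAULT;"
		 * here so the next refresh cannot release it again.
		 */
		ret = -EFAULT;
		goto out;
	}
	gpc->pfn = 0x42;	/* pretend we mapped a fresh page */
out:
	release_gpc_mock(old_pfn);
	return ret;
}

int main(void)
{
	struct gfn_to_pfn_cache gpc = { .pfn = 0x42 };

	refresh_mock(&gpc, true);	/* releases pfn 0x42: refcount 1 -> 0 */
	refresh_mock(&gpc, true);	/* releases the same stale pfn again */
	printf("final refcount: %d\n", page_refcount);
	return 0;
}

With the one-liner applied, the second call sees old_pfn ==
KVM_PFN_ERR_FAULT and the release becomes a no-op, so the double free
goes away — which would account for both of the splats above.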