Re: [PATCH] KVM: ARM: update the VMID generation logic

Marc Zyngier <marc.zyngier@xxxxxxx> · Fri, 30 Mar 2018 11:48:37 +0100

On Fri, 30 Mar 2018 17:52:07 +0800
Shannon Zhao <zhaoshenglong@xxxxxxxxxx> wrote:

> 
> 
> On 2018/3/30 17:01, Marc Zyngier wrote:
> > On Fri, 30 Mar 2018 09:56:10 +0800
> > Shannon Zhao <zhaoshenglong@xxxxxxxxxx> wrote:
> > 
> >> On 2018/3/30 0:48, Marc Zyngier wrote:
> >>> On Thu, 29 Mar 2018 16:27:58 +0100,
> >>> Mark Rutland wrote:  
> >>>>
> >>>> On Thu, Mar 29, 2018 at 11:00:24PM +0800, Shannon Zhao wrote:  
> >>>>> From: zhaoshenglong <zhaoshenglong@xxxxxxxxxx>
> >>>>>
> >>>>> Currently the VMID for a VM is allocated in the VCPU entry/exit
> >>>>> context and is updated when kvm_next_vmid wraps around. This forces
> >>>>> the existing VMs to exit from the guest and flush the TLB and icache.
> >>>>>
> >>>>> Also, while a platform with an 8-bit VMID supports 255 VMs, it can
> >>>>> create more than 255 VMs, and if we create e.g. 256 VMs, some VMs will
> >>>>> hit page faults since at some moment two VMs have the same VMID.
> >>>>
> >>>> Have you seen this happen?
> >>>>  
> >> Yes, we've started 256 VMs on D05. We saw kernel page faults in some guests.
> > 
> > What kind of fault? Kernel configuration? Can you please share some
> > traces with us? What is the workload? What happens if all the guests are
> > running on the same NUMA node?
> > 
> > We need all the information we can get.
> > 
> None of the 256 VMs runs any special workload. The testcase just starts
> 256 VMs and then shuts them down. We found that several VMs fail to
> shut down because the guest kernel crashes, whereas if we start only
> 255 VMs, it works well.
> 
> We didn't run the testcase that pins all VMs to the same NUMA node. I'll
> try.
> 
> The fault is
> [ 2204.633871] Unable to handle kernel NULL pointer dereference at virtual address 00000008
> [ 2204.633875] Unable to handle kernel paging request at virtual address a57f4a9095032
> 
> Please see the attachment for the detailed log.

Thanks. It looks pretty ugly indeed.

Can you please share your host kernel config (and version number -- I
really hope the host is something more recent than the 4.1.44 stuff you
run as a guest...)?

For the record, I'm currently running 5 concurrent Debian installs,
each with 2 vcpus, on a 4 CPU system artificially configured to have
only 2 bits of VMID (and thus at most 3 running VMs at any given time),
a setup that is quite similar to what you're doing, only on a smaller
scale.

It is pretty slow (as you'd expect), but so far I haven't seen any
issue.
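
As a rough illustration of the arithmetic above, here is a toy model (not
the kernel code) of generation-based VMID allocation. It assumes VMID 0 is
reserved for the host, so N bits give 2^N - 1 usable VMIDs, and that
running out of VMIDs starts a new generation in which every VM must pick up
a fresh VMID. The class and field names are made up for the sketch, and the
rollover counter merely stands in for "force all guests to exit and flush
the TLB and icache".

```python
class VmidAllocator:
    """Toy model of generation-based VMID allocation (hypothetical names)."""

    def __init__(self, vmid_bits):
        self.vmid_bits = vmid_bits
        self.generation = 1  # bumped each time the VMID space is exhausted
        self.next_vmid = 1   # assumption: VMID 0 is reserved for the host
        self.rollovers = 0   # stands in for "force exit + flush TLB/icache"

    def max_vms(self):
        # Usable VMIDs: everything except the reserved VMID 0.
        return (1 << self.vmid_bits) - 1

    def assign(self, vm):
        """Return a VMID for `vm` (a dict) that is valid in the current generation."""
        # A VM that already holds a VMID from the current generation keeps it.
        if vm.get("gen") == self.generation:
            return vm["vmid"]

        # VMID space exhausted: start a new generation. All VMIDs handed out
        # in older generations become stale at this point.
        if self.next_vmid > self.max_vms():
            self.generation += 1
            self.next_vmid = 1
            self.rollovers += 1

        vm["gen"] = self.generation
        vm["vmid"] = self.next_vmid
        self.next_vmid += 1
        return vm["vmid"]


# Five VMs on a system configured with 2 bits of VMID.
alloc = VmidAllocator(vmid_bits=2)
vms = [{} for _ in range(5)]
for vm in vms:
    alloc.assign(vm)
```

In this model two VMs never hold the same VMID within the current
generation; a VM with a stale generation simply gets a new VMID the next
time it is scheduled, at the cost of one more rollover once the space runs
out again.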

Thanks,

	M.
-- 
Without deviation from the norm, progress is not possible.