On Fri, 30 Mar 2018 21:42:04 +0800
Shannon Zhao <zhaoshenglong@xxxxxxxxxx> wrote:

> On 2018/3/30 18:48, Marc Zyngier wrote:
> > On Fri, 30 Mar 2018 17:52:07 +0800
> > Shannon Zhao <zhaoshenglong@xxxxxxxxxx> wrote:
> >
> >> On 2018/3/30 17:01, Marc Zyngier wrote:
> >>> On Fri, 30 Mar 2018 09:56:10 +0800
> >>> Shannon Zhao <zhaoshenglong@xxxxxxxxxx> wrote:
> >>>
> >>>> On 2018/3/30 0:48, Marc Zyngier wrote:
> >>>>> On Thu, 29 Mar 2018 16:27:58 +0100,
> >>>>> Mark Rutland wrote:
> >>>>>>
> >>>>>> On Thu, Mar 29, 2018 at 11:00:24PM +0800, Shannon Zhao wrote:
> >>>>>>> From: zhaoshenglong <zhaoshenglong@xxxxxxxxxx>
> >>>>>>>
> >>>>>>> Currently the VMID for a VM is allocated in the VCPU entry/exit
> >>>>>>> context and is updated when kvm_next_vmid wraps around, which
> >>>>>>> forces the existing VMs to exit the guest and flush the TLB and
> >>>>>>> icache.
> >>>>>>>
> >>>>>>> Also, while a platform with 8-bit VMIDs only supports 255 VMs,
> >>>>>>> more than 255 VMs can be created; if we create e.g. 256 VMs, some
> >>>>>>> VMs will hit page faults, since at some point two VMs end up with
> >>>>>>> the same VMID.
> >>>>>>
> >>>>>> Have you seen this happen?
> >>>>>>
> >>>> Yes, we've started 256 VMs on D05. We saw kernel page faults in some
> >>>> guests.
> >>>
> >>> What kind of fault? Kernel configuration? Can you please share some
> >>> traces with us? What is the workload? What happens if all the guests
> >>> are running on the same NUMA node?
> >>>
> >>> We need all the information we can get.
> >>>
> >> All 256 VMs run without any special workload. The test case just
> >> starts 256 VMs and then shuts them down. We found that several VMs
> >> would not shut down because the guest kernel crashed, while if we
> >> only start 255 VMs it works well.
> >>
> >> We didn't run the test case that pins all VMs to the same NUMA node.
> >> I'll try.
> >>
> >> The fault is
> >> [ 2204.633871] Unable to handle kernel NULL pointer dereference at
> >> virtual address 00000008
> >> [ 2204.633875] Unable to handle kernel paging request at virtual
> >> address a57f4a9095032
> >>
> >> Please see the attachment for the detailed log.
> >
> > Thanks. It looks pretty ugly indeed.
> >
> > Can you please share your host kernel config (and version number -- I
> > really hope the host is something more recent than the 4.1.44 stuff
> > you run as a guest...)?
> >
> We do run a 4.1.44 host kernel, but with a more recent KVM module (at
> least 4.14), since we backport upstream KVM ARM patches to our kernel
> tree.

Can you please reproduce it with a mainline kernel? I'm not even going
to try to reproduce this issue on a kernel that has been that heavily
hacked.

> See the attachment for the kernel config.
>
> > For the record, I'm currently running 5 concurrent Debian installs,
> > each with 2 vcpus, on a 4 CPU system artificially configured to have
> > only 2 bits of VMID (and thus at most 3 running VMs at any given
> > time), a setup that is quite similar to what you're doing, only on a
> > smaller scale.
> >
> > It is pretty slow (as you'd expect), but so far I haven't seen any
> > issues.
> >
> Could you try to shut down all the VMs at the same time? The issue we
> encountered happened at the shutdown step.

Halted the VMs just fine, no issue.

	M.

-- 
Without deviation from the norm, progress is not possible.
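
[Context for readers following the thread: a rough sketch of the
generation-based VMID allocation scheme under discussion. This is a
simplified, standalone C model loosely patterned on KVM/ARM's
update_vttbr() path, not the actual kernel code; the names update_vmid(),
need_new_vmid(), VMID_BITS and the "VMID 0 is reserved" detail are
illustrative assumptions. Each VM carries a (vmid, generation) pair and
keeps its VMID as long as its generation matches the global one; when
kvm_next_vmid wraps, the generation is bumped, all VMs are forced out of
the guest, TLBs and icache are flushed, and every VM picks up a fresh
VMID on its next entry. With 8-bit VMIDs only 255 values are usable per
generation, which is why 256 concurrent VMs keep forcing rollovers.]

/*
 * Simplified model of generation-based VMID allocation.
 * Hypothetical sketch only -- not the KVM/ARM implementation.
 */
#include <stdint.h>
#include <stdio.h>

#define VMID_BITS 8
#define VMID_MASK ((1u << VMID_BITS) - 1)

struct vm {
	uint32_t vmid;      /* hardware VMID currently assigned */
	uint64_t vmid_gen;  /* generation that VMID was allocated in */
};

static uint64_t kvm_vmid_gen = 1;  /* global generation counter */
static uint32_t kvm_next_vmid = 1; /* next VMID to hand out; 0 reserved */

/* Does this VM need a fresh VMID before it may enter the guest? */
static int need_new_vmid(const struct vm *vm)
{
	return vm->vmid_gen != kvm_vmid_gen;
}

/* Called on every guest entry, in the spirit of update_vttbr(). */
static void update_vmid(struct vm *vm)
{
	if (!need_new_vmid(vm))
		return;

	if (kvm_next_vmid == 0) {
		/*
		 * Ran out of VMIDs in this generation: bump the
		 * generation. In a real hypervisor this is the expensive
		 * part -- force all VMs to exit and invalidate the
		 * TLBs/icache so no stale translations tagged with
		 * recycled VMIDs survive.
		 */
		kvm_vmid_gen++;
		kvm_next_vmid = 1;
		printf("rollover: generation %llu, flush TLBs\n",
		       (unsigned long long)kvm_vmid_gen);
	}

	vm->vmid_gen = kvm_vmid_gen;
	vm->vmid = kvm_next_vmid++;
	kvm_next_vmid &= VMID_MASK;
}

int main(void)
{
	static struct vm vms[300];
	int i;

	/* 300 VMs each entering the guest once forces one rollover. */
	for (i = 0; i < 300; i++)
		update_vmid(&vms[i]);

	printf("vm[0]:   vmid=%u gen=%llu\n", vms[0].vmid,
	       (unsigned long long)vms[0].vmid_gen);
	printf("vm[255]: vmid=%u gen=%llu\n", vms[255].vmid,
	       (unsigned long long)vms[255].vmid_gen);
	return 0;
}

In this model vm[0] and vm[255] both end up with vmid=1, but in different
generations; the generation check plus the forced exit and flush at
rollover is what is supposed to keep two runnable VMs from ever sharing a
VMID, which is exactly the property Shannon suspects is being violated
once more than 255 VMs are live.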