Re: [PATCH v3 00/22] Improve scalability of KVM + userfaultfd live migration via annotated memory faults.

On Thu, May 4, 2023 at 12:09 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Wed, May 03, 2023 at 07:45:28PM -0400, Peter Xu wrote:
> > On Wed, May 03, 2023 at 02:42:35PM -0700, Sean Christopherson wrote:
> > > On Wed, May 03, 2023, Peter Xu wrote:
> > > > Oops, bounced back from the list..
> > > >
> > > > Forward with no attachment this time - I assume the information in the
> > > > paragraphs is still enough even without the flamegraphs.
> > >
> > > The flamegraphs are definitely useful beyond what is captured here.  Not sure
> > > how to get them accepted on the list though.
> >
> > Trying again with google drive:
> >
> > single uffd:
> > https://drive.google.com/file/d/1bYVYefIRRkW8oViRbYv_HyX5Zf81p3Jl/view
> >
> > 32 uffds:
> > https://drive.google.com/file/d/1T19yTEKKhbjU9G2FpANIvArSC61mqqtp/view
> >
> > >
> > > > > From what I got there, vmx_vcpu_load() shows up more prominently than the
> > > > > spinlocks. I think that's the TLB flush broadcast.
> > >
> > > No, it's KVM dealing with the vCPU being migrated to a different pCPU.  The
> > > smp_call_function_single() that shows up is from loaded_vmcs_clear() and is
> > > triggered when KVM needs to VMCLEAR the VMCS on the _previous_ pCPU (yay for the
> > > VMCS caches not being coherent).
> > >
> > > Task migration can also trigger IBPB (if mitigations are enabled), and also does
> > > an "all contexts" INVEPT, i.e. flushes all TLB entries for KVM's MMU.
> > >
> > > Can you try 1:1 pinning of vCPUs to pCPUs?  That _should_ eliminate the
> > > vmx_vcpu_load_vmcs() hotspot, and for large VMs is likely representative of a real
> > > world configuration.
> >
> > Yes, it does go away:
> >
> > https://drive.google.com/file/d/1ZFhWnWjoU33Lxy43jTYnKFuluo4zZArm/view
> >
> > With only the vcpu threads pinned (again, across 40 hardware cores/threads):
> >
> > ./demand_paging_test -b 512M -u MINOR -s shmem -v 32 -c 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32
> >
> > It seems to me that for some reason the scheduler ate more than I expected.
> > Maybe tomorrow I can try two more things:

I pulled in your patch adding the -c flag, and confirmed that it
doesn't seem to make a huge difference to the selftest's
numbers/scalability. The per-vCPU paging rate actually seems a bit
lower, going 117-103-77-55-18-9k for 1-32 vCPUs.
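
For context, the 1:1 vCPU-to-pCPU pinning being discussed boils down to
something like the sketch below. This is not the selftest's actual code;
the vcpu_threads[] and pcpu_list[] names are placeholders, and the pCPU
list stands in for something like the -c 1,2,...,32 argument above.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin each vCPU thread to its own physical CPU (illustration only). */
static void pin_vcpu_threads(pthread_t *vcpu_threads, int nr_vcpus,
                             const int *pcpu_list)
{
        for (int i = 0; i < nr_vcpus; i++) {
                cpu_set_t set;

                CPU_ZERO(&set);
                CPU_SET(pcpu_list[i], &set);
                pthread_setaffinity_np(vcpu_threads[i], sizeof(set), &set);
        }
}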

> >   - Do CPU isolation, and
> >   - pin the reader threads too (or just leave the readers on housekeeping cores)
>
> I gave it a shot by isolating 32 cores and splitting them into two groups, 16 for
> uffd threads and 16 for vcpu threads.  I got similar results and don't
> see much change.
>
> I think it's possible it's just reaching the limit of my host since it only
> has 40 cores anyway.  Throughput never goes above 350K faults/sec overall.
>
> I assume this might not be the case for Anish if he has a much larger host,
> so a similar test can be carried out there to see how it goes.  I think the
> idea is to make sure the vcpu load overhead during sched-in is ruled out, then
> see whether it can keep scaling with more cores.

Peter, I'm afraid that isolating cores and splitting them into groups
is new to me. Do you mind explaining exactly what you did here?
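
(My rough guess, in case it helps pin down the question: the relevant
cores isolated from the scheduler, e.g. via an isolcpus= boot parameter
(my assumption), and then two disjoint affinity masks applied to the two
thread groups, roughly as in the sketch below. The CPU ranges and thread
arrays are placeholders; please correct me if that's not what you meant.)

/* Requires _GNU_SOURCE, <pthread.h>, <sched.h> as in the earlier sketch.
 * Split 32 isolated cores into two groups of 16, one for the uffd reader
 * threads and one for the vCPU threads (illustration only).
 */
static void split_pin(pthread_t *readers, pthread_t *vcpus, int nr_each)
{
        cpu_set_t reader_set, vcpu_set;
        int cpu, i;

        CPU_ZERO(&reader_set);
        CPU_ZERO(&vcpu_set);
        for (cpu = 8; cpu < 24; cpu++)          /* 16 cores for readers */
                CPU_SET(cpu, &reader_set);
        for (cpu = 24; cpu < 40; cpu++)         /* 16 cores for vCPUs */
                CPU_SET(cpu, &vcpu_set);

        for (i = 0; i < nr_each; i++) {
                pthread_setaffinity_np(readers[i], sizeof(reader_set), &reader_set);
                pthread_setaffinity_np(vcpus[i], sizeof(vcpu_set), &vcpu_set);
        }
}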

Also, I finally got some of my own perf traces for the selftest: [1]
shows what happens with 32 vCPUs faulting on a single uffd with 32
reader threads, with the contention clearly being a huge issue, and
[2] shows the effect of demand paging through memory faults on that
configuration. Unfortunately the export-to-SVG functionality on our
internal tool seems broken, so I could only grab PNGs :(

[1] https://drive.google.com/file/d/1YWiZTjb2FPmqj0tkbk4cuH0Oq8l65nsU/view?usp=drivesdk
[2] https://drive.google.com/file/d/1P76_6SSAHpLxNgDAErSwRmXBLkuDeFoA/view?usp=drivesdk
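
For anyone following along, the contention in [1] comes from all 32
reader threads pulling faults out of one userfaultfd: each reader sits
in a loop along the lines of the sketch below (my own illustration, not
the selftest's code, assuming 4K pages and MINOR faults on shmem as in
the command line above), so they all dequeue and resolve faults through
that single fd, which is presumably where the contention shows up.

#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* One uffd reader thread: dequeue a fault, then resolve it.  With MINOR
 * faults the page contents already exist in the shmem file, so the
 * resolution is UFFDIO_CONTINUE (install the PTE) rather than UFFDIO_COPY.
 */
static void *uffd_reader(void *arg)
{
        int uffd = *(int *)arg;
        struct uffd_msg msg;

        for (;;) {
                /* All readers block in read() on the same fd. */
                if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
                        continue;
                if (msg.event != UFFD_EVENT_PAGEFAULT)
                        continue;

                struct uffdio_continue cont = {
                        .range = {
                                .start = msg.arg.pagefault.address & ~(4096UL - 1),
                                .len = 4096,
                        },
                };
                ioctl(uffd, UFFDIO_CONTINUE, &cont);
        }
        return NULL;
}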



