Re: How to determine the backing host physical memory for a given guest?


 



On 05/09/2012 08:46 AM, Avi Kivity wrote:
On 05/09/2012 04:05 PM, Chegu Vinod wrote:
Hello,

On an 8-socket Westmere host I am attempting to run a single guest and
characterize the virtualization overhead for a system-intensive
workload (AIM7-high_systime) as the size of the guest scales (10way/64G,
20way/128G, ... 80way/512G).

To compare the native vs. guest runs, I have been using "numactl" to
control the CPU node and memory node bindings for the qemu instance.
For larger guest sizes I end up binding across multiple localities,
e.g. for a 40-way guest:

numactl --cpunodebind=0,1,2,3  --membind=0,1,2,3  \
qemu-system-x86_64 -smp 40 -m 262144 \
<....>

I understand that actual mappings from a guest virtual address to host physical
address could change.

Is there a way to determine [at a given instant] which host NUMA node is
providing the backing physical memory for the active guest's kernel and
also for the apps actively running in the guest?

I am guessing there is a better way (some tool available?) than just
diff'ing the per-node memory usage from the before and after output of
"numactl --hardware" on the host.


Not sure if that's what you want, but there's Documentation/vm/pagemap.txt.
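
Documentation/vm/pagemap.txt describes a per-process file that maps each
virtual page to its backing page frame. A rough sketch of looking up a single
entry, run as root; the pid and the guest-RAM virtual address below are
placeholders you would take from /proc/<pid>/maps:

pid=12345                 # hypothetical qemu pid
vaddr=0x7f0000000000      # hypothetical start of the guest RAM mapping
# each pagemap entry is 8 bytes, indexed by virtual page number
dd if=/proc/$pid/pagemap bs=8 skip=$(( vaddr / 4096 )) count=1 2>/dev/null | od -An -tx8
# bit 63 of the entry = page present, bits 0-54 = page frame number
# (see pagemap.txt for the full bit layout)

Mapping the resulting PFN back to a host NUMA node is then a separate step,
e.g. by comparing it against the per-node start_pfn values shown in
/proc/zoneinfo.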


You can look at /proc/<pid>/numa_maps to see all the mappings for the qemu process. There should be one really large mapping for the guest memory, and that line includes a list of dirty page counts, potentially one for each NUMA node. This will tell you how much memory comes from each node, but not specifically which page is mapped where.
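
A quick way to pull that out of numa_maps, as a rough sketch that assumes a
single qemu-system-x86_64 process on the host:

pid=$(pgrep -f qemu-system-x86_64)
# print the line for the largest anonymous mapping (normally the guest RAM);
# the trailing N<node>=<pages> fields are the per-node page counts
awk '/anon=/ { n = $0; sub(/.*anon=/, "", n); sub(/ .*/, "", n);
               if (n+0 > max) { max = n+0; line = $0 } }
     END { print line }' /proc/$pid/numa_maps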

Keep in mind that with the numactl invocation you are using, you will likely not get the benefits of the NUMA enhancements in the Linux kernel, in either the guest or the host. There are a couple of reasons:

(1) Your guest does not have a NUMA topology defined (based on what I see from the qemu command above), so it will not do anything special based on the host topology. Also, things that are normally broken down per NUMA node, such as some spin-locks and sched-domains, become system-wide/flat. This is a big deal for the scheduler and for things like kmem allocation; with a single 80-way VM and no NUMA topology, you will likely see massive spin-lock contention on some workloads.

(2) Once the VM does have a NUMA topology (via qemu -numa), one still cannot manually set a mempolicy for the portion of VM memory that represents each NUMA node in the VM (or have this done automatically with something like autoNUMA). Therefore, it is difficult to forcefully map each VM node's memory to the corresponding host node.
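
For reference, a guest NUMA topology can be described on the qemu command line
with -numa; a rough sketch for the 40-way/256G case above, where the cpu
ranges and the even per-node memory split are purely illustrative:

qemu-system-x86_64 -smp 40 -m 262144 \
    -numa node,cpus=0-9,mem=65536 \
    -numa node,cpus=10-19,mem=65536 \
    -numa node,cpus=20-29,mem=65536 \
    -numa node,cpus=30-39,mem=65536 \
    <....>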

There are some things you can do to mitigate this. Definitely define the VM to match the NUMA topology found on the host; that will at least allow good scaling wrt locks and the scheduler in the guest.

As for getting memory placement close (a page in VM node x actually residing in host node x), you have to rely on vcpu pinning plus the guest NUMA topology, combined with the default mempolicy in the guest and host. As pages are faulted in by the guest, the hope is that the vcpu which did the faulting is running in the right node (guest and host), that the guest OS mempolicy ensures the page is allocated in the guest-local node, and that the allocation causes a fault in qemu, which is -also- running on the -host- node x. The vcpu pinning is critical to get qemu to fault that memory in on the correct node. Make sure you do not use numactl for any of this. I would suggest using libvirt and defining the vcpu pinning and the NUMA topology in the XML.
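
As one illustration of the pinning side (the domain XML is still the
recommended place to express this), vcpu pinning can also be applied at
runtime with virsh. A rough sketch that assumes a domain named "guest40" and a
host where CPUs 0-39 are laid out in node order, so a 1:1 vcpu-to-cpu mapping
keeps each guest node on one host node:

# "guest40" and the 1:1 cpu numbering are assumptions about this setup
for v in $(seq 0 39); do
    virsh vcpupin guest40 $v $v
done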

-Andrew Theurer


