On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
> Resending series, after fixing some coding style issues. Does anybody have
> any feedback about this proposal?
>
> Changes v1 -> v2:
>  - Coding style fixes
>
> Original cover letter:
>
> I was investigating whether there are any mechanisms that allow manual
> pinning of guest RAM to specific host NUMA nodes, in the case of
> multi-node KVM guests, and noticed that -mem-path could be used for that,
> except that it currently removes any files it creates (using mkstemp())
> immediately, and as a result does not allow numactl to be used on the
> backing files. These patches add a -keep-mem-path-files option to make
> QEMU create the files inside -mem-path with more predictable names, and
> not remove them after creation.
>
> Some previous discussions about the subject, for reference:
>  - Message-ID: <1281534738-8310-1-git-send-email-andre.przywara@xxxxxxx>
>    http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
>  - Message-ID: <4C7D7C2A.7000205@xxxxxxxxxxxxx>
>    http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
>
> A more recent thread can be found at:
>  - Message-ID: <20111029184502.GH11038@xxxxxxxxxx>
>    http://article.gmane.org/gmane.comp.emulators.qemu/123001
>
> Note that this is just a mechanism to facilitate manual static binding
> using numactl on hugetlbfs later, for optimization. This may be
> especially useful for single large multi-node guest use cases (and, of
> course, has to be used with care).
>
> I don't know if it is a good idea to use the memory range names as a
> publicly-visible interface. Another option may be to use a single file
> instead, and mmap different regions inside the same file for each memory
> region. I am open to comments and suggestions.
>
> Example (untested) usage to bind each half of the RAM of a guest manually
> to a different NUMA node:
>
> $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
>     -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
>     -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
> $ numactl --offset=1G --length=1G --membind=1 --file /mnt/hugetlbfs/FOO/pc.ram
> $ numactl --offset=0 --length=1G --membind=2 --file /mnt/hugetlbfs/FOO/pc.ram

I'd suggest that instead of making the memory file name into a public ABI
that QEMU needs to maintain, QEMU could expose the info via a monitor
command, e.g.:

  $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
      -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
      -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
      -monitor stdio
  (qemu) info mem-nodes
  node0: file=/proc/self/fd/3, offset=0G, length=1G
  node1: file=/proc/self/fd/3, offset=1G, length=1G

This example takes advantage of the fact that on Linux you can still access
a deleted file via /proc/self/fd/NNN, which, AFAICT, would avoid the need
for --keep-mem-path-files.

By returning the info via a monitor command you also avoid hardcoding the
use of a single file for all of memory, and you avoid hardcoding the fact
that QEMU stores the nodes in contiguous order inside the file. For
example, QEMU could easily return data like this:

  $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
      -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
      -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
      -monitor stdio
  (qemu) info mem-nodes
  node0: file=/proc/self/fd/3, offset=0G, length=1G
  node1: file=/proc/self/fd/4, offset=0G, length=1G

or more ingenious options.

Regards,
Daniel

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
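[Editorial note: the /proc/self/fd trick Daniel relies on can be sketched with a
quick shell demo. This is a minimal, QEMU-independent illustration; the file
name and contents are arbitrary, and it assumes a Linux /proc filesystem.]

```shell
# Demonstrate that an unlinked file stays readable through /proc/self/fd
# for as long as some file descriptor to it remains open (Linux-only).
tmp=$(mktemp)
echo "guest ram" > "$tmp"
exec 3< "$tmp"         # keep fd 3 open on the file
rm "$tmp"              # unlink it, like QEMU's mkstemp()-then-unlink pattern
cat /proc/self/fd/3    # prints "guest ram": data still reachable via the fd
exec 3<&-              # close the descriptor; the storage is now freed
```

This is exactly why a monitor command reporting fd-based paths would let a tool
like numactl operate on the backing file even though QEMU deleted it at startup.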