Re: [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)

On Mon, Jul 02, 2012 at 07:56:58PM +0100, Daniel P. Berrange wrote:
> On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
> > Resending the series after fixing some coding style issues. Does anybody have
> > any feedback on this proposal?
> > 
> > Changes v1 -> v2:
> >  - Coding style fixes
> > 
> > Original cover letter:
> > 
> > I was investigating whether there is any mechanism that allows manual pinning
> > of guest RAM to specific host NUMA nodes, in the case of multi-node KVM guests,
> > and noticed that -mem-path could be used for that, except that it currently
> > removes any files it creates (using mkstemp()) immediately, so numactl cannot
> > be used on the backing files. This series adds a -keep-mem-path-files option
> > to make QEMU create the files inside -mem-path with more predictable names,
> > and not remove them after creation.
> > 
> > Some previous discussions about the subject, for reference:
> >  - Message-ID: <1281534738-8310-1-git-send-email-andre.przywara@xxxxxxx>
> >    http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
> >  - Message-ID: <4C7D7C2A.7000205@xxxxxxxxxxxxx>
> >    http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
> > 
> > A more recent thread can be found at:
> >  - Message-ID: <20111029184502.GH11038@xxxxxxxxxx>
> >    http://article.gmane.org/gmane.comp.emulators.qemu/123001
> > 
> > Note that this is just a mechanism to facilitate manual static binding using
> > numactl on hugetlbfs later, for optimization. This may be especially useful for
> > use cases involving a single large multi-node guest (and, of course, has to be
> > used with care).
> > 
> > I don't know if it is a good idea to use the memory range names as a
> > publicly-visible interface. Another option may be to use a single file instead,
> > and mmap different regions inside the same file for each memory region. I am
> > open to comments and suggestions.
> > 
> > Example (untested) usage to manually bind each half of a guest's RAM to a
> > different NUMA node:
> > 
> >  $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
> >    -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
> >    -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
> >  $ numactl --offset=1G --length=1G --membind=1 --file /mnt/hugetlbfs/FOO/pc.ram
> >  $ numactl --offset=0  --length=1G --membind=2 --file /mnt/hugetlbfs/FOO/pc.ram
> 
> I'd suggest that instead of making the memory file name into a
> public ABI that QEMU needs to maintain, QEMU could expose the info
> via a monitor command, e.g.:
> 
>    $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
>      -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
>      -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
>      -monitor stdio
>    (qemu) info mem-nodes
>     node0: file=/proc/self/fd/3, offset=0G, length=1G
>     node1: file=/proc/self/fd/3, offset=1G, length=1G
> 
> This example takes advantage of the fact that on Linux, you can
> still access a deleted file via /proc/self/fd/NNN, which AFAICT
> would avoid the need for a -keep-mem-path-files option.

I like the suggestion.

But other processes still need to be able to open those files if we want
to do anything useful with them. In that case, I guess it's better to
let QEMU itself build a "/proc/<getpid()>/fd/<fd>" string instead of
reporting "/proc/self" and forcing the client to find out the right
PID?
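
A minimal sketch of the /proc trick discussed above, assuming Linux and a
POSIX shell. The temporary file is only a stand-in for a -mem-path backing
file and fd 3 is an arbitrary choice; none of this is QEMU code. It shows
that a second process can reach the unlinked file only through the owner's
PID, not through /proc/self:

```shell
#!/bin/sh
# Sketch (Linux only): a file that has been unlinked stays reachable through
# /proc/<pid>/fd/<fd> for as long as the owning process keeps it open.

backing=$(mktemp)            # stand-in for a file created under -mem-path
exec 3<>"$backing"           # keep it open on fd 3, as the owner would
rm -f "$backing"             # unlink it right away

printf 'guest-ram\n' >&3     # the unlinked file is still writable via the fd

# Path that a different process can open. "$$" is this shell's PID; with
# /proc/self the reader would be looking at its own descriptors instead.
fd_path="/proc/$$/fd/3"

# cat runs as a separate process, so this read uses the PID-based path.
contents=$(cat "$fd_path")
echo "$fd_path -> $contents"

exec 3>&-                    # closing the last descriptor releases the file
```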

Anyway, even if we want to avoid file-descriptor and /proc tricks, we
can still use the interface you suggest. Then we wouldn't need to make
any filename assumptions: the filenames could be completely random, as
they would be reported by the new monitor command.
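
As a rough illustration of how a client might consume such a command: the
sketch below assumes the line format from the "info mem-nodes" example
quoted in this thread, which is only a proposal and not an existing QEMU
command. The helper name numactl_cmd_for and PID 1234 are made up for the
example, and the script only prints numactl command lines without running
them:

```shell
#!/bin/sh
# Hypothetical: "info mem-nodes" is only proposed in this thread, and the
# line format parsed here is copied from the example output, not from QEMU.

# Print the numactl command that would bind one reported range.
#   $1: one line of the proposed "info mem-nodes" output
#   $2: host NUMA node to bind that range to
numactl_cmd_for() {
    printf '%s\n' "$1" | sed -n \
        "s|^ *node[0-9]*: file=\([^,]*\), offset=\([^,]*\), length=\(.*\)|numactl --offset=\2 --length=\3 --membind=$2 --file \1|p"
}

# PID 1234 is a placeholder for whatever path QEMU would report.
numactl_cmd_for 'node0: file=/proc/1234/fd/3, offset=0G, length=1G' 0
numactl_cmd_for 'node1: file=/proc/1234/fd/3, offset=1G, length=1G' 1
```

Since the client takes the file, offset, and length from the reported
lines, it would not depend on how QEMU names or lays out the backing files.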

> 
> By returning info via a monitor command you also avoid hardcoding
> the use of a single file for all of memory. You also avoid hardcoding
> the fact that QEMU stores the nodes in contiguous order inside the
> file. E.g. QEMU could easily return data like this:
> 
> 
>    $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
>      -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
>      -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
>      -monitor stdio
>    (qemu) info mem-nodes
>     node0: file=/proc/self/fd/3, offset=0G, length=1G
>     node1: file=/proc/self/fd/4, offset=0G, length=1G
> 
> or more ingenious options

Sounds good.

-- 
Eduardo