On Thu, May 12, 2016 at 04:00:29PM +0000, Mooney, Sean K wrote:
> > > Today it is possible to use libvirt to spawn a VM without hugepage
> > > memory and a file descriptor backed memdev via the use of the
> > > qemu:commandline element.
> > >
> > > <qemu:commandline>
> > >   <qemu:arg value='-object'/>
> > >   <qemu:arg value='memory-backend-file,id=mem,size=1024M,mem-path=/var/lib/libvirt/qemu,share=on'/>
> > >   <qemu:arg value='-numa'/>
> > >   <qemu:arg value='node,memdev=mem'/>
> > >   <qemu:arg value='-mem-prealloc'/>
> > > </qemu:commandline>
> > >
> > > I created a proof of concept patch to Nova to demonstrate that this
> > > works; however, to support this use case in Nova a new XML element
> > > is required.
> > > https://review.openstack.org/#/c/309565/1
> > >
> > > I would like to propose the introduction of a new subelement of the
> > > memoryBacking element to request file descriptor backed memory:
> > >
> > > <memoryBacking>
> > >   <filedescriptor size_mb="1024" path="/var/lib/libvirt/qemu"
> > >                   prealloc="true" shared="on"/>
> > > </memoryBacking>
> >
> > Specifying a size is not required - we already know how big memory
> > must be for the guest.
> >
> > We already have a memAccess='shared' attribute against the <numa>
> > element that is used to determine if the underlying memory should be
> > set up as shared. We could define a further element that lets us
> > control the memory access mode for guests without a NUMA topology
> > specified:
> >
> > <memoryBacking>
> >   <access mode="shared"/>
> > </memoryBacking>
> >
> [Mooney, Sean K] Hi, yes, the reason I added the shared attribute was
> to cater for the case of guests without a NUMA topology. For guests
> with a NUMA topology I agree that using memAccess='shared' on the cell
> is better, for consistency with hugepage memory.
>
> > For huge pages it seems we unconditionally pass -mem-prealloc.
> > I'm thinking we could perhaps make that configurable via an element
> >
> > <memoryBacking>
> >   <allocation mode="immediate|ondemand"/>
> > </memoryBacking>
> >
> > to control use of -mem-prealloc or not.
>
> [Mooney, Sean K] For the vhost-user case mem-prealloc is required,
> because you are basically doing DMA, so you really want the memory to
> be allocated. Generally though, from a libvirt point of view, I do
> think it makes sense for this to be configurable, to allow
> oversubscription of memory for higher density.
>
> > So all that remains is a way to request file based backing of RAM.
> > As with huge pages, I think we should hide the actual path from the
> > user. We should just use /dev/shm as the backing for non-hugepage
> > RAM. For this we could define something like
> >
> > <memoryBacking>
> >   <source type="file|anonymous"/>
> > </memoryBacking>
> >
> [Mooney, Sean K] For some reason when I used /dev/shm I could only
> boot one instance at a time. That was my first choice, but maybe we
> would have to create a file per instance under /dev/shm to make it
> work.

QEMU should create the file itself - it's no different to our use of
hugetlbfs in fact. Possibly you hit a limit on the amount of memory
allowed to be used via /dev/shm - IIRC the mount point is limited to
50% of RAM by default. If you use /var/lib/libvirt/ as the location you
get a real file backed by disk, so akin to putting the VM on swap IIUC!

> > Putting that all together, to get what you want we'd have
> >
> > <memoryBacking>
> >   <source type="file"/>
> >   <access mode="shared"/>
> >   <allocation mode="immediate"/>
> > </memoryBacking>
> >
> [Mooney, Sean K]
> Yes, this seems like it would be a clean way to address this use case.
> Can you gauge how small/large a change this would be? It's been a
> while since I worked with C directly, but if you could point me in the
> right direction in the libvirt codebase I would be happy to look at
> creating an RFC patch.
First there's defining the XML extensions - docs/schemas/domaincommon.rng
and src/conf/domain_conf.{c,h} need to be changed. Then there's wiring up
the QEMU XML -> ARGV conversion - src/qemu/qemu_command.c - and adding
test cases in tests/qemuxml2argvtest.c.

> From a Nova side, assuming libvirt was extended for this feature,
> should I open a blueprint to extend the existing guest memory backing
> support in parallel with the libvirt implementation, or wait until it
> is supported in libvirt to start the Nova discussion? In either case I
> think we agree that any support in Nova would depend on the libvirt
> support in order to be accepted in upstream Nova.

You're going to hit the deadline for approval of Newton specs in Nova
fairly soon, and unless the libvirt impl is done before then, I think
it is unlikely you'd get a spec approved. So by all means work on this
in parallel, but be realistic about the chances of approval in Nova for
this cycle.

Regards,
Daniel

-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

-- 
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
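As a starting point on the schema side, the grammar addition to docs/schemas/domaincommon.rng could look roughly like this - a hypothetical sketch using the element and value names proposed in this thread, not a committed schema:

```xml
<!-- Hypothetical RelaxNG sketch for new <memoryBacking> children;
     names follow the proposal in this thread. -->
<optional>
  <element name="source">
    <attribute name="type">
      <choice>
        <value>file</value>
        <value>anonymous</value>
      </choice>
    </attribute>
  </element>
</optional>
<optional>
  <element name="access">
    <attribute name="mode">
      <choice>
        <value>shared</value>
        <value>private</value>
      </choice>
    </attribute>
  </element>
</optional>
<optional>
  <element name="allocation">
    <attribute name="mode">
      <choice>
        <value>immediate</value>
        <value>ondemand</value>
      </choice>
    </attribute>
  </element>
</optional>
```

Each child is optional so existing domain XML without them stays valid; the matching parsing/formatting code would then live in src/conf/domain_conf.{c,h} as noted above.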