* Marc-André Lureau (marcandre.lureau@xxxxxxxxxx) wrote:
> Hi
>
> On Wed, Sep 19, 2018 at 5:58 PM Michal Privoznik <mprivozn@xxxxxxxxxx> wrote:
> >
> > On 09/19/2018 12:03 PM, Marc-André Lureau wrote:
> > > Hi
> > >
> > > On Wed, Sep 19, 2018 at 1:41 PM Michal Privoznik <mprivozn@xxxxxxxxxx> wrote:
> > >>
> > >> On 09/17/2018 03:14 PM, marcandre.lureau@xxxxxxxxxx wrote:
> > >>> From: Marc-André Lureau <marcandre.lureau@xxxxxxxxxx>
> > >>>
> > >>> Add a new memoryBacking source type "memfd", supported by QEMU (when
> > >>> the capability is available).
> > >>>
> > >>> A memfd is a specialized anonymous memory kind. As such, an anonymous
> > >>> source type could be automatically using a memfd. However, there are
> > >>> some complications when migrating from different memory backends in
> > >>> qemu (mainly due to the internal object naming at this point, but
> > >>> there could be more). For now, it is simpler and safer to simply
> > >>> introduce a new source type "memfd". Eventually, the "anonymous" type
> > >>> could learn to use memfd transparently in a separate change.
> > >>>
> > >>> The main benefits are that it doesn't need to create filesystem files,
> > >>> and it also enforces sealing, providing a bit more safety.
> > >>>
> > >>> Signed-off-by: Marc-André Lureau <marcandre.lureau@xxxxxxxxxx>
> > >>> ---
> > >>>  docs/formatdomain.html.in                     |  9 +--
> > >>>  docs/schemas/domaincommon.rng                 |  1 +
> > >>>  src/conf/domain_conf.c                        |  3 +-
> > >>>  src/conf/domain_conf.h                        |  1 +
> > >>>  src/qemu/qemu_command.c                       | 69 +++++++++++++------
> > >>>  src/qemu/qemu_domain.c                        | 12 +++-
> > >>>  .../memfd-memory-numa.x86_64-latest.args      | 34 +++++++++
> > >>>  tests/qemuxml2argvdata/memfd-memory-numa.xml  | 36 ++++++++++
> > >>>  tests/qemuxml2argvtest.c                      |  2 +
> > >>>  9 files changed, 140 insertions(+), 27 deletions(-)
> > >>>  create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
> > >>>  create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml
> > >>>
> > >>> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> > >>> index 1f12ab5b42..eeee1f6d40 100644
> > >>> --- a/docs/formatdomain.html.in
> > >>> +++ b/docs/formatdomain.html.in
> > >>> @@ -1099,7 +1099,7 @@
> > >>>  </hugepages>
> > >>>  <nosharepages/>
> > >>>  <locked/>
> > >>> - <source type="file|anonymous"/>
> > >>> + <source type="file|anonymous|memfd"/>
> > >>
> > >> I'm sorry but I do not think this is the way we should go. This
> > >> effectively avoids libvirt making the decision and exposes the backend
> > >> used directly. This puts unnecessary burden on mgmt applications because
> > >> they have to make yet another decision (track another domain attribute).
> > >>
> > >> IIUC, memfd is like memory-backend-file and -ram combined. It can do
> > >> hugepages or just plain malloc(). Therefore it should be our first
> > >> choice for freshly started domains. And only if qemu doesn't support it
> > >> we should fall back to either -file or -ram backends.
> > >
> > > memory-backend-memfd doesn't replace either -file or -ram though. It's
> > > a specialized anonymous memory kind, linux-only atm, and not widely
> > > available.
> >
> > Well, neither libvirt nor qemu really support hugepages on anything else
> > than linux.
> >
> > Nor it ever will? Because if we merge these patches and expose it in
> > domain XML, there is no turning back. We can't stop supporting it.
> >
> > >
> > > -file should be used for nvram or complex hugepage/numa setup for ex.
> >
> > How come? I can see .host-nodes and .policy attributes for -memfd
> > backend too. Sure, nvram is special, but for plain hugepages use case
> > -file and -memfd are interchangeable, aren't they?
>
> Sorry, I think I misunderstood the problem then. The qemu mbind()
> might do all the work.
>
> David, didn't you point out limitation of -memfd compared to -file for
> NUMA setup?

<thinks>
I think we came to the conclusion they're mostly the same, but with the
gotcha that it's harder to control allocation with memfd.
I think for example you can create a fixed size hugetlbfs mount and
put a set of VMs in it and now they're limited to that size. I think
you can do similar things with /dev/shm like mounts.

Dave

> >
> > -object memory-backend-memfd,id=ram-node0,\
> > hugetlb=yes,hugetlbsize=2097152,\
> > share=yes,size=15032385536,host-nodes=3,policy=preferred
> >
> > -object memory-backend-file,id=ram-node0,\
> > path=/path/to/2M/hugetlfs,\
> > size=15032385536,host-nodes=3,policy=preferred
> >
> >
> > And for -ram there is no difference from usage/libvirt POV.
> >
> > -object memory-backend-memfd,id=ram-node0,\
> > share=yes,size=15032385536,host-nodes=3,policy=preferred
> >
> > -object memory-backend-ram,id=ram-node0,\
> > size=15032385536,host-nodes=3,policy=preferred
> >
> >
> > >
> > > But it's legitimate that a VM user request memfd to be used.
> > >
> > > The point of this patch is not to say that we shouldn't try to use
> > > memfd when possible, but rather let the user request specifically
> > > memfd, for security reasons for example. If the setup cannot be
> > > satisfied with -memfd, the user should get an error.
> >
> > What security reasons do you have in mind?
>
> grow/shrink sealing (and avoiding somewhat hazardous file system operations).
>
> >
> > >
> > >>
> > >> This means we have to track what backend the domain was started with so
> > >> that we preserve that on migration (although, the fact that these
> > >> backends are not interchangeable makes me question 'backend' in their
> > >> name :-P). For that we can use status/migration XML as I suggested earlier.
> > >>
> > >> Once again, status XML is not editable by user [*] and is used solely by
> > >> libvirtd to store runtime information for a running domain (and backend
> > >> used falls into that category).
> > >
> > > Why not do this transparent memfd-usage in a separate series?
> >
> > Depends what we want libvirt to be. If we want it to be mere XML->qemu
> > cmd line generator, then we can expose all qemu settings as they are. If
> > we want it to have some logic built in (so that mgmt applications can
> > offload some decisions to it), then we can't expose all qemu settings.
> >
> > In my ideal world, I'd like to tell libvirt "I want a machine that uses
> > hugepages of this size" and let libvirt figure out the best command line
> > to fulfil my request (either use -file or -memfd or even -ram + -mem-path).
> >
> > On the other hand, I don't want to discourage you from posting patches,
> > so this is the point where I will no longer object. I pointed out my
> > objections enough :-)
>
> I see the benefit in using memfd whenever possible. But I also see a
> benefit in being able to request its usage explicitly. That's why I
> think the 2 approaches are compatible.
>
> Thanks!
--
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list