Hi

On Wed, Sep 19, 2018 at 5:58 PM Michal Privoznik <mprivozn@xxxxxxxxxx> wrote:
>
> On 09/19/2018 12:03 PM, Marc-André Lureau wrote:
> > Hi
> >
> > On Wed, Sep 19, 2018 at 1:41 PM Michal Privoznik <mprivozn@xxxxxxxxxx> wrote:
> >>
> >> On 09/17/2018 03:14 PM, marcandre.lureau@xxxxxxxxxx wrote:
> >>> From: Marc-André Lureau <marcandre.lureau@xxxxxxxxxx>
> >>>
> >>> Add a new memoryBacking source type "memfd", supported by QEMU (when
> >>> the capability is available).
> >>>
> >>> A memfd is a specialized anonymous memory kind. As such, an anonymous
> >>> source type could be automatically using a memfd. However, there are
> >>> some complications when migrating from different memory backends in
> >>> qemu (mainly due to the internal object naming at this point, but
> >>> there could be more). For now, it is simpler and safer to simply
> >>> introduce a new source type "memfd". Eventually, the "anonymous" type
> >>> could learn to use memfd transparently in a separate change.
> >>>
> >>> The main benefits are that it doesn't need to create filesystem files,
> >>> and it also enforces sealing, providing a bit more safety.
> >>>
> >>> Signed-off-by: Marc-André Lureau <marcandre.lureau@xxxxxxxxxx>
> >>> ---
> >>>  docs/formatdomain.html.in                     |  9 +--
> >>>  docs/schemas/domaincommon.rng                 |  1 +
> >>>  src/conf/domain_conf.c                        |  3 +-
> >>>  src/conf/domain_conf.h                        |  1 +
> >>>  src/qemu/qemu_command.c                       | 69 +++++++++++++------
> >>>  src/qemu/qemu_domain.c                        | 12 +++-
> >>>  .../memfd-memory-numa.x86_64-latest.args      | 34 +++++++++
> >>>  tests/qemuxml2argvdata/memfd-memory-numa.xml  | 36 ++++++++++
> >>>  tests/qemuxml2argvtest.c                      |  2 +
> >>>  9 files changed, 140 insertions(+), 27 deletions(-)
> >>>  create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
> >>>  create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml
> >>>
> >>> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> >>> index 1f12ab5b42..eeee1f6d40 100644
> >>> --- a/docs/formatdomain.html.in
> >>> +++ b/docs/formatdomain.html.in
> >>> @@ -1099,7 +1099,7 @@
> >>>      </hugepages>
> >>>      <nosharepages/>
> >>>      <locked/>
> >>> -    <source type="file|anonymous"/>
> >>> +    <source type="file|anonymous|memfd"/>
> >>
> >> I'm sorry but I do not think this is the way we should go. This
> >> effectively avoids libvirt making the decision and exposes the backend
> >> used directly. This puts unnecessary burden on mgmt applications because
> >> they have to make yet another decision (track another domain attribute).
> >>
> >> IIUC, memfd is like memory-backend-file and -ram combined. It can do
> >> hugepages or just plain malloc(). Therefore it should be our first
> >> choice for freshly started domains. And only if qemu doesn't support it
> >> we should fall back to either -file or -ram backends.
> >
> > memory-backend-memfd doesn't replace either -file or -ram though. It's
> > a specialized anonymous memory kind, linux-only atm, and not widely
> > available.
>
> Well, neither libvirt nor qemu really support hugepages on anything else
> than linux.
>
> Nor it ever will?
> Because if we merge these patches and expose it in
> domain XML, there is no turning back. We can't stop supporting it.
>
> >
> > -file should be used for nvram or complex hugepage/numa setup for ex.
>
> How come? I can see .host-nodes and .policy attributes for -memfd
> backend too. Sure, nvram is special, but for plain hugepages use case
> -file and -memfd are interchangeable, aren't they?

Sorry, I think I misunderstood the problem then. The qemu mbind() might
do all the work. David, didn't you point out a limitation of -memfd
compared to -file for NUMA setup?

>
> -object memory-backend-memfd,id=ram-node0,\
>     hugetlb=yes,hugetlbsize=2097152,\
>     share=yes,size=15032385536,host-nodes=3,policy=preferred
>
> -object memory-backend-file,id=ram-node0,\
>     path=/path/to/2M/hugetlfs,\
>     size=15032385536,host-nodes=3,policy=preferred
>
>
> And for -ram there is no difference from usage/libvirt POV.
>
> -object memory-backend-memfd,id=ram-node0,\
>     share=yes,size=15032385536,host-nodes=3,policy=preferred
>
> -object memory-backend-ram,id=ram-node0,\
>     size=15032385536,host-nodes=3,policy=preferred
>
>
> >
> > But it's legitimate that a VM user request memfd to be used.
> >
> > The point of this patch is not to say that we shouldn't try to use
> > memfd when possible, but rather let the user request specifically
> > memfd, for security reasons for example. If the setup cannot be
> > satisfied with -memfd, the user should get an error.
>
> What security reasons do you have in mind?

grow/shrink sealing (and avoiding somewhat hazardous file system
operations).

>
> >
> >>
> >> This means we have to track what backend the domain was started with so
> >> that we preserve that on migration (although, the fact that these
> >> backends are not interchangeable makes me question 'backend' in their
> >> name :-P). For that we can use status/migration XML as I suggested earlier.
> >>
> >> Once again, status XML is not editable by user [*] and is used solely by
> >> libvirtd to store runtime information for a running domain (and backend
> >> used falls into that category).
> >
> > Why not do this transparent memfd-usage in a separate series?
>
> Depends what we want libvirt to be. If we want it to be mere XML->qemu
> cmd line generator, then we can expose all qemu settings as they are. If
> we want it to have some logic built in (so that mgmt applications can
> offload some decisions to it), then we can't expose all qemu settings.
>
> In my ideal world, I'd like to tell libvirt "I want a machine that uses
> hugepages of this size" and let libvirt figure out the best command line
> to fulfil my request (either use -file or -memfd or even -ram + -mem-path).
>
> On the other hand, I don't want to discourage you from posting patches,
> so this is the point where I will no longer object. I pointed out my
> objections enough :-)

I see the benefit in using memfd whenever possible. But I also see a
benefit in being able to request its usage explicitly. That's why I
think the 2 approaches are compatible.

Thanks!

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
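
To illustrate how the two approaches discussed in this thread could coexist (prefer memfd transparently when nothing is requested, but fail hard when memfd is requested explicitly and cannot be used), here is a minimal sketch in C. It is not taken from the patch: every type and function name is hypothetical, and the rule that nvram can only be served by -file is an assumption based on the remarks above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; none are libvirt or QEMU APIs. */

typedef enum {
    SOURCE_UNSPECIFIED, /* user left the choice to the management layer */
    SOURCE_FILE,        /* explicit <source type="file"/> */
    SOURCE_MEMFD        /* explicit <source type="memfd"/> */
} source_type;

typedef enum {
    BACKEND_ERROR,      /* the request cannot be satisfied */
    BACKEND_RAM,        /* memory-backend-ram */
    BACKEND_FILE,       /* memory-backend-file */
    BACKEND_MEMFD       /* memory-backend-memfd */
} backend_type;

typedef struct {
    source_type source;
    bool hugepages;     /* guest memory should be hugepage backed */
    bool nvram;         /* assumption: only -file can serve this case */
} mem_request;

/* Pick a backend for a freshly started domain. */
backend_type choose_backend(const mem_request *req, bool qemu_has_memfd)
{
    assert(req != NULL);

    /* Explicit memfd request: honour it or fail, never fall back. */
    if (req->source == SOURCE_MEMFD)
        return (qemu_has_memfd && !req->nvram) ? BACKEND_MEMFD : BACKEND_ERROR;

    /* Explicit file request, or a setup that only -file covers here. */
    if (req->source == SOURCE_FILE || req->nvram)
        return BACKEND_FILE;

    /* No preference: try memfd first, then the older backends. */
    if (qemu_has_memfd)
        return BACKEND_MEMFD;
    return req->hugepages ? BACKEND_FILE : BACKEND_RAM;
}
```

The sketch deliberately leaves out migration: as noted above, a running domain would have to keep the backend it was started with, so a real implementation would need to record that choice rather than recompute it.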