Re: [PATCH v2 3/3] qemu: add memfd source type

Marc-André Lureau <marcandre.lureau@xxxxxxxxxx> · Wed, 19 Sep 2018 18:54:23 +0400

Hi

On Wed, Sep 19, 2018 at 5:58 PM Michal Privoznik <mprivozn@xxxxxxxxxx> wrote:
>
> On 09/19/2018 12:03 PM, Marc-André Lureau wrote:
> > Hi
> >
> > On Wed, Sep 19, 2018 at 1:41 PM Michal Privoznik <mprivozn@xxxxxxxxxx> wrote:
> >>
> >> On 09/17/2018 03:14 PM, marcandre.lureau@xxxxxxxxxx wrote:
> >>> From: Marc-André Lureau <marcandre.lureau@xxxxxxxxxx>
> >>>
> >>> Add a new memoryBacking source type "memfd", supported by QEMU (when
> >>> the capability is available).
> >>>
> >>> A memfd is a specialized anonymous memory kind. As such, an anonymous
> >>> source type could be automatically using a memfd. However, there are
> >>> some complications when migrating from different memory backends in
> >>> qemu (mainly due to the internal object naming at this point, but
> >>> there could be more). For now, it is simpler and safer to simply
> >>> introduce a new source type "memfd". Eventually, the "anonymous" type
> >>> could learn to use memfd transparently in a separate change.
> >>>
> >>> The main benefits are that it doesn't need to create filesystem files,
> >>> and it also enforces sealing, providing a bit more safety.
> >>>
> >>> Signed-off-by: Marc-André Lureau <marcandre.lureau@xxxxxxxxxx>
> >>> ---
> >>>  docs/formatdomain.html.in                     |  9 +--
> >>>  docs/schemas/domaincommon.rng                 |  1 +
> >>>  src/conf/domain_conf.c                        |  3 +-
> >>>  src/conf/domain_conf.h                        |  1 +
> >>>  src/qemu/qemu_command.c                       | 69 +++++++++++++------
> >>>  src/qemu/qemu_domain.c                        | 12 +++-
> >>>  .../memfd-memory-numa.x86_64-latest.args      | 34 +++++++++
> >>>  tests/qemuxml2argvdata/memfd-memory-numa.xml  | 36 ++++++++++
> >>>  tests/qemuxml2argvtest.c                      |  2 +
> >>>  9 files changed, 140 insertions(+), 27 deletions(-)
> >>>  create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
> >>>  create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml
> >>>
> >>> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> >>> index 1f12ab5b42..eeee1f6d40 100644
> >>> --- a/docs/formatdomain.html.in
> >>> +++ b/docs/formatdomain.html.in
> >>> @@ -1099,7 +1099,7 @@
> >>>      &lt;/hugepages&gt;
> >>>      &lt;nosharepages/&gt;
> >>>      &lt;locked/&gt;
> >>> -    &lt;source type="file|anonymous"/&gt;
> >>> +    &lt;source type="file|anonymous|memfd"/&gt;
> >>
> >> I'm sorry but I do not think this is the way we should go. This
> >> effectively avoids libvirt making the decision and exposes the backend
> >> used directly. This puts unnecessary burden on mgmt applications because
> >> they have to make yet another decision (track another domain attribute).
> >>
> >> IIUC, memfd is like memory-backend-file and -ram combined. It can do
> >> hugepages or just plain malloc(). Therefore it should be our first
> >> choice for freshly started domains. And only if qemu doesn't support it
> >> we should fall back to either -file or -ram backends.
> >
> > memory-backend-memfd doesn't replace either -file or -ram though. It's
> > a specialized anonymous memory kind, linux-only atm, and not widely
> > available.
>
> Well, neither libvirt nor qemu really supports hugepages on anything
> other than Linux.
>
> Nor will it ever? Because if we merge these patches and expose it in
> domain XML, there is no turning back. We can't stop supporting it.
>
> >
> > -file should be used for nvram or complex hugepage/numa setup for ex.
>
> How come? I can see .host-nodes and .policy attributes for -memfd
> backend too. Sure, nvram is special, but for plain hugepages use case
> -file and -memfd are interchangeable, aren't they?

Sorry, I think I misunderstood the problem then. The qemu mbind()
might do all the work.

David, didn't you point out a limitation of -memfd compared to -file for
NUMA setups?

>
> -object memory-backend-memfd,id=ram-node0,\
> hugetlb=yes,hugetlbsize=2097152,\
> share=yes,size=15032385536,host-nodes=3,policy=preferred
>
> -object memory-backend-file,id=ram-node0,\
> path=/path/to/2M/hugetlbfs,\
> size=15032385536,host-nodes=3,policy=preferred
>
>
> And for -ram there is no difference from usage/libvirt POV.
>
> -object memory-backend-memfd,id=ram-node0,\
> share=yes,size=15032385536,host-nodes=3,policy=preferred
>
> -object memory-backend-ram,id=ram-node0,\
> size=15032385536,host-nodes=3,policy=preferred
>
>
> >
> > But it's legitimate that a VM user requests memfd to be used.
> >
> > The point of this patch is not to say that we shouldn't try to use
> > memfd when possible, but rather let the user request specifically
> > memfd, for security reasons for example. If the setup cannot be
> > satisfied with -memfd, the user should get an error.
>
> What security reasons do you have in mind?

grow/shrink sealing (and avoiding somewhat hazardous file system operations).

>
> >
> >>
> >> This means we have to track what backend the domain was started with so
> >> that we preserve that on migration (although, the fact that these
> >> backends are not interchangeable makes me question 'backend' in their
> >> name :-P). For that we can use status/migration XML as I suggested earlier.
> >>
> >> Once again, status XML is not editable by user [*] and is used solely by
> >> libvirtd to store runtime information for a running domain (and backend
> >> used falls into that category).
> >
> > Why not do this transparent memfd-usage in a separate series?
>
> Depends what we want libvirt to be. If we want it to be mere XML->qemu
> cmd line generator, then we can expose all qemu settings as they are. If
> we want it to have some logic built in (so that mgmt applications can
> offload some decisions to it), then we can't expose all qemu settings.
>
> In my ideal world, I'd like to tell libvirt "I want a machine that uses
> hugepages of this size" and let libvirt figure out the best command line
> to fulfil my request (either use -file or -memfd or even -ram + -mem-path).
>
> On the other hand, I don't want to discourage you from posting patches,
> so this is the point where I will no longer object. I pointed out my
> objections enough :-)

I see the benefit in using memfd whenever possible. But I also see a
benefit in being able to request its usage explicitly. That's why I
think the two approaches are compatible.
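
Roughly, the combined policy could look like the sketch below. This is only
an illustration of the idea, not what the patch implements: pick_backend()
and its parameters are made-up names, and it deliberately ignores the
migration compatibility issue that makes the transparent part harder.

```python
def pick_backend(source, has_memfd, hugepages=False):
    """Return the qemu memory backend object type for a fresh domain.

    Hypothetical policy for illustration, not libvirt code.
    source:    "memfd", "file", "anonymous" or None (nothing requested)
    has_memfd: whether this qemu has the memfd backend capability
    hugepages: whether the domain asks for hugepage backing
    """
    if source == "memfd":
        # Explicit request: never fall back silently, report an error.
        if not has_memfd:
            raise ValueError("memfd requested but not supported by this qemu")
        return "memory-backend-memfd"
    if source == "file":
        return "memory-backend-file"
    # "anonymous" or nothing requested: prefer memfd when it is available.
    if has_memfd:
        return "memory-backend-memfd"
    return "memory-backend-file" if hugepages else "memory-backend-ram"
```

The explicit branch is what this series adds. The last three lines are the
transparent choice, which could come later without changing the XML.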

Thanks!

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list