Re: [RFC PATCH v2 33/47] userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Dec 21, 2022 at 5:32 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> On 12/21/22 17:10, Peter Xu wrote:
> > On Wed, Dec 21, 2022 at 01:39:39PM -0800, Mike Kravetz wrote:
> > > On 12/21/22 15:21, James Houghton wrote:
> > > > Thanks for bringing this up, Peter. I think the main reason was:
> > > > having separate UFFD_FEATUREs clearly indicates to userspace what is
> > > > and is not supported.
> > >
> > > IIRC, I think we wanted to initially limit the usage to the very
> > > specific use case (live migration).  The idea is that we could then
> > > expand usage as more use cases came to light.
> > >
> > > Another good thing is that userfaultfd has versioning built into the
> > > API.  Thus a user can determine if HGM is enabled in their running
> > > kernel.
> >
> > I don't worry much on this one, afaiu if we have any way to enable hgm then
> > the user can just try enabling it on a test vma, just like when an app
> > wants to detect whether a new madvise() is present on the current host OS.

That would be enough to test if HGM was merely present, but if
specific features like 4K UFFDIO_CONTINUEs or 4K UFFDIO_WRITEPROTECTs
were available. You could always check these by making a HugeTLB VMA
and setting it up correctly for userfaultfd/etc., but that's a little
messy.

> >
> > Besides, I'm wondering whether something like /sys/kernel/vm/hugepages/hgm
> > would work too.

I'm not opposed to this.

> >
> > >
> > > > For UFFDIO_WRITEPROTECT, a user could remap huge pages into smaller
> > > > pages by issuing a high-granularity UFFDIO_WRITEPROTECT. That isn't
> > > > allowed as of this patch series, but it could be allowed in the
> > > > future. To add support in the same way as this series, we would add
> > > > another feature, say UFFD_FEATURE_WP_HUGETLBFS_HGM. I agree that
> > > > having to add another feature isn't great; is this what you're
> > > > concerned about?
> > > >
> > > > Considering MADV_ENABLE_HUGETLB...
> > > > 1. If a user provides this, then the contract becomes: "the kernel may
> > > > allow UFFDIO_CONTINUE and UFFDIO_WRITEPROTECT for HugeTLB at
> > > > high-granularities, provided the support exists", but it becomes
> > > > unclear to userspace to know what's supported and what isn't.
> > > > 2. We would then need to keep track if a user explicitly enabled it,
> > > > or if it got enabled automatically in response to memory poison, for
> > > > example. Not a big problem, just a complication. (Otherwise, if HGM
> > > > got enabled for poison, suddenly userspace would be allowed to do
> > > > things it wasn't allowed to do before.)
> >
> > We could alternatively have two flags for each vma: (a) hgm_advised and (b)
> > hgm_enabled.  (a) always sets (b) but not vice versa.  We can limit poison
> > to set (b) only.  For this patchset, it can be all about (a).

My thoughts exactly. :)

> >
> > > > 3. This API makes sense for enabling HGM for something outside of
> > > > userfaultfd, like MADV_DONTNEED.
> > >
> > > I think #3 is key here.  Once we start applying HGM to things outside
> > > userfaultfd, then more thought will be required on APIs.  The API is
> > > somewhat limited by design until the basic functionality is in place.
> >
> > Mike, could you elaborate what's the major concern of having hgm used
> > outside uffd and live migration use cases?
> >
> > I feel like I miss something here.  I can understand we want to limit the
> > usage only when the user specifies using hgm because we want to keep the
> > old behavior intact.  However if we want another way to enable hgm it'll
> > still need one knob anyway even outside uffd, and I thought that'll service
> > the same purpose, or maybe not?
>
> I am not opposed to using hgm outside the use cases targeted by this series.
>
> It seems that when we were previously discussing the API we spent a bunch of
> time going around in circles trying to get the API correct.  That is expected
> as it is more difficult to take all users/uses/abuses of the API into account.
>
> Since the initial use case was fairly limited, it seemed like a good idea to
> limit the API to userfaultfd.  In this way we could focus on the underlying
> code/implementation and then expand as needed.  Of course, with an eye on
> anything that may be a limiting factor in the future.
>
> I was not aware of the uffd-wp use case, and am more than happy to discuss
> expanding the API.

So considering two API choices:

1. What we have now: UFFD_FEATURE_MINOR_HUGETLBFS_HGM for
UFFDIO_CONTINUE, and later UFFD_FEATURE_WP_HUGETLBFS_HGM for
UFFDIO_WRITEPROTECT. For MADV_DONTNEED, we could just suddenly start
allowing high-granularity choices (not sure if this is bad; we started
allowing it for HugeTLB recently with no other API change, AFAIA).

2. MADV_ENABLE_HGM or something similar. The changes to
UFFDIO_CONTINUE/UFFDIO_WRITEPROTECT/MADV_DONTNEED come automatically,
provided they are implemented.

I don't mind one way or the other. Peter, I assume you prefer #2.
Mike, what about you? If we decide on something other than #1, I'll
make the change before sending v1 out.

- James




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux