Re: [RFC PATCH v2 33/47] userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM

On 12/21/22 17:10, Peter Xu wrote:
> On Wed, Dec 21, 2022 at 01:39:39PM -0800, Mike Kravetz wrote:
> > On 12/21/22 15:21, James Houghton wrote:
> > > On Wed, Dec 21, 2022 at 2:23 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
> > > >
> > > > James,
> > > >
> > > > On Wed, Nov 16, 2022 at 03:30:00PM -0800, James Houghton wrote:
> > > > > On Wed, Nov 16, 2022 at 2:28 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Fri, Oct 21, 2022 at 04:36:49PM +0000, James Houghton wrote:
> > > > > > > Userspace must provide this new feature when it calls UFFDIO_API to
> > > > > > > enable HGM. Userspace can check if the feature exists in
> > > > > > > uffdio_api.features, and if it does not exist, the kernel does not
> > > > > > > support and therefore did not enable HGM.
> > > > > > >
> > > > > > > Signed-off-by: James Houghton <jthoughton@xxxxxxxxxx>
> > > > > >
> > > > > > It's still slightly a pity that this can only be enabled by an uffd context
> > > > > > plus a minor fault, so generic hugetlb users cannot directly leverage this.
> > > > >
> > > > > The idea here is that, for applications that can conceivably benefit
> > > > > from HGM, we have a mechanism for enabling it for that application. So
> > > > > this patch creates that mechanism for userfaultfd/UFFDIO_CONTINUE. I
> > > > > prefer this approach over something more general like MADV_ENABLE_HGM
> > > > > or something.
> > > >
> > > > Sorry to get back to this very late - I know this has been discussed since
> > > > the very early stage of the feature, but is there any reasoning behind it?
> > > >
> > > > When I start to think seriously on applying this to process snapshot with
> > > > uffd-wp I found that the minor mode trick won't easily play - normally
> > > > that's a case where all the pages were there mapped huge, but when the app
> > > > wants UFFDIO_WRITEPROTECT it may want to remap the huge pages into smaller
> > > > pages, probably some size that the user can specify.  It'll be non-trivial
> > > > to enable HGM during that phase using MINOR mode because in that case the
> > > > pages are all mapped.
> > > >
> > > > For the long term, I am still worried that the current interface is not
> > > > flexible enough.
> > > 
> > > Thanks for bringing this up, Peter. I think the main reason was:
> > > having separate UFFD_FEATUREs clearly indicates to userspace what is
> > > and is not supported.
> > 
> > IIRC, I think we wanted to initially limit the usage to the very
> > specific use case (live migration).  The idea is that we could then
> > expand usage as more use cases came to light.
> > 
> > Another good thing is that userfaultfd has versioning built into the
> > API.  Thus a user can determine if HGM is enabled in their running
> > kernel.
> 
> I don't worry much on this one, afaiu if we have any way to enable hgm then
> the user can just try enabling it on a test vma, just like when an app
> wants to detect whether a new madvise() is present on the current host OS.
> 
> Besides, I'm wondering whether something like /sys/kernel/vm/hugepages/hgm
> would work too.
> 
> > 
> > > For UFFDIO_WRITEPROTECT, a user could remap huge pages into smaller
> > > pages by issuing a high-granularity UFFDIO_WRITEPROTECT. That isn't
> > > allowed as of this patch series, but it could be allowed in the
> > > future. To add support in the same way as this series, we would add
> > > another feature, say UFFD_FEATURE_WP_HUGETLBFS_HGM. I agree that
> > > having to add another feature isn't great; is this what you're
> > > concerned about?
> > > 
> > > Considering MADV_ENABLE_HGM...
> > > 1. If a user provides this, then the contract becomes: "the kernel may
> > > allow UFFDIO_CONTINUE and UFFDIO_WRITEPROTECT for HugeTLB at
> > > high-granularities, provided the support exists", but it becomes
> > > unclear to userspace to know what's supported and what isn't.
> > > 2. We would then need to keep track if a user explicitly enabled it,
> > > or if it got enabled automatically in response to memory poison, for
> > > example. Not a big problem, just a complication. (Otherwise, if HGM
> > > got enabled for poison, suddenly userspace would be allowed to do
> > > things it wasn't allowed to do before.)
> 
> We could alternatively have two flags for each vma: (a) hgm_advised and (b)
> hgm_enabled.  (a) always sets (b) but not vice versa.  We can limit poison
> to set (b) only.  For this patchset, it can be all about (a).
> 
> > > 3. This API makes sense for enabling HGM for something outside of
> > > userfaultfd, like MADV_DONTNEED.
> > 
> > I think #3 is key here.  Once we start applying HGM to things outside
> > userfaultfd, then more thought will be required on APIs.  The API is
> > somewhat limited by design until the basic functionality is in place.
> 
> Mike, could you elaborate what's the major concern of having hgm used
> outside uffd and live migration use cases?
> 
> I feel like I'm missing something here.  I can understand that we want to
> limit the usage to when the user explicitly asks for hgm, because we want to
> keep the old behavior intact.  However, if we want another way to enable hgm
> it'll still need one knob anyway even outside uffd, and I thought that would
> serve the same purpose, or maybe not?

I am not opposed to using hgm outside the use cases targeted by this series.

When we were previously discussing the API, we spent a bunch of time going
around in circles trying to get it right.  That is expected, as it is
difficult to take all users/uses/abuses of an API into account.

Since the initial use case was fairly limited, it seemed like a good idea to
limit the API to userfaultfd.  In this way we could focus on the underlying
code/implementation and then expand as needed, keeping an eye on anything
that may become a limiting factor in the future.

I was not aware of the uffd-wp use case, and am more than happy to discuss
expanding the API.
-- 
Mike Kravetz
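
For reference, the check described in the quoted patch description (look at
uffdio_api.features after UFFDIO_API returns) can be sketched roughly as
below.  This is only an illustration under stated assumptions: the struct is
a local stand-in that mirrors the three fields of struct uffdio_api from
<linux/userfaultfd.h>, and the bit value for the proposed feature is a
placeholder invented for this sketch, not a value taken from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: placeholder bit position, chosen only for this sketch. */
#define SKETCH_FEATURE_MINOR_HUGETLBFS_HGM (UINT64_C(1) << 62)

/* Local stand-in mirroring the fields of struct uffdio_api. */
struct sketch_uffdio_api {
	uint64_t api;      /* API version requested by userspace */
	uint64_t features; /* in: requested features, out: reported features */
	uint64_t ioctls;   /* out: supported ioctls */
};

/* After UFFDIO_API has returned, report whether the HGM bit is present. */
static bool sketch_hgm_enabled(const struct sketch_uffdio_api *reply)
{
	return (reply->features & SKETCH_FEATURE_MINOR_HUGETLBFS_HGM) != 0;
}

/* Small usage example: one reply without the bit, one reply with it. */
static void sketch_feature_check_demo(void)
{
	struct sketch_uffdio_api reply = { 0, 0, 0 };

	assert(!sketch_hgm_enabled(&reply)); /* no HGM support reported */

	reply.features |= SKETCH_FEATURE_MINOR_HUGETLBFS_HGM;
	assert(sketch_hgm_enabled(&reply));  /* HGM reported */
}
```

A real caller would fill struct uffdio_api, issue the UFFDIO_API ioctl on the
userfaultfd descriptor, and only then inspect the features field.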

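The two-flag idea quoted above (hgm_advised always sets hgm_enabled, while
poison sets hgm_enabled only) might look something like the following.  This
is a hypothetical model, not code from the series: the struct and all function
names are invented for illustration, and the rule that userspace access is
gated on the advised flag is one reading of the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vma state; field names follow the quoted proposal. */
struct sketch_vma_hgm {
	bool hgm_advised; /* (a) userspace explicitly asked for HGM */
	bool hgm_enabled; /* (b) HGM is active for this vma */
};

/* Explicit userspace request: (a) always sets (b). */
static void sketch_hgm_advise(struct sketch_vma_hgm *vma)
{
	vma->hgm_advised = true;
	vma->hgm_enabled = true;
}

/* Memory poison path: sets (b) only, never (a). */
static void sketch_hgm_enable_for_poison(struct sketch_vma_hgm *vma)
{
	vma->hgm_enabled = true;
}

/* ASSUMPTION: high-granularity requests from userspace require (a). */
static bool sketch_user_may_use_hgm(const struct sketch_vma_hgm *vma)
{
	return vma->hgm_advised;
}

/* Usage example: poison alone must not widen what userspace may do. */
static void sketch_hgm_flags_demo(void)
{
	struct sketch_vma_hgm vma = { false, false };

	sketch_hgm_enable_for_poison(&vma);
	assert(vma.hgm_enabled && !sketch_user_may_use_hgm(&vma));

	sketch_hgm_advise(&vma);
	assert(vma.hgm_enabled && sketch_user_may_use_hgm(&vma));
}
```

Keeping the two bits separate addresses the concern raised in point 2 of the
quoted list: HGM turned on for poison does not suddenly let userspace do
things it was not allowed to do before.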


