On Thu, Jan 5, 2023 at 7:29 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 05.01.23 11:18, James Houghton wrote:
> > Issuing ioctl(MADV_SPLIT) on a HugeTLB address range will enable
> > HugeTLB HGM. MADV_SPLIT was chosen for the name so that this API can be
> > applied to non-HugeTLB memory in the future, if such an application is
> > to arise.
> >
> > MADV_SPLIT provides several API changes for some syscalls on HugeTLB
> > address ranges:
> > 1. UFFDIO_CONTINUE is allowed for MAP_SHARED VMAs at PAGE_SIZE
> >    alignment.
> > 2. read()ing a page fault event from a userfaultfd will yield a
> >    PAGE_SIZE-rounded address, instead of a huge-page-size-rounded
> >    address (unless UFFD_FEATURE_EXACT_ADDRESS is used).
> >
> > There is no way to disable the API changes that come with issuing
> > MADV_SPLIT. MADV_COLLAPSE can be used to collapse high-granularity page
> > table mappings that come from the extended functionality that comes with
> > using MADV_SPLIT.
> >
> > For post-copy live migration, the expected use-case is:
> > 1. mmap(MAP_SHARED, some_fd) primary mapping
> > 2. mmap(MAP_SHARED, some_fd) alias mapping
> > 3. MADV_SPLIT the primary mapping
> > 4. UFFDIO_REGISTER/etc. the primary mapping
> > 5. Copy memory contents into alias mapping and UFFDIO_CONTINUE the
> >    corresponding PAGE_SIZE sections in the primary mapping.
> >
> > More API changes may be added in the future.
> >
> > Signed-off-by: James Houghton <jthoughton@xxxxxxxxxx>
> > ---
> >  arch/alpha/include/uapi/asm/mman.h     |  2 ++
> >  arch/mips/include/uapi/asm/mman.h      |  2 ++
> >  arch/parisc/include/uapi/asm/mman.h    |  2 ++
> >  arch/xtensa/include/uapi/asm/mman.h    |  2 ++
> >  include/linux/hugetlb.h                |  2 ++
> >  include/uapi/asm-generic/mman-common.h |  2 ++
> >  mm/hugetlb.c                           |  3 +--
> >  mm/madvise.c                           | 26 ++++++++++++++++++++++++++
> >  8 files changed, 39 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
> > index 763929e814e9..7a26f3648b90 100644
> > --- a/arch/alpha/include/uapi/asm/mman.h
> > +++ b/arch/alpha/include/uapi/asm/mman.h
> > @@ -78,6 +78,8 @@
> >
> >  #define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */
> >
> > +#define MADV_SPLIT 26 /* Enable hugepage high-granularity APIs */
>
> I think we should make a split more generic, such that it also splits
> (pte-maps) a THP. Has that been discussed?

Thanks James / David.

MADV_SPLIT for THP has come up a few times. The first was during the
initial RFC about hugepage collapse in process context, as the natural
inverse operation required by a generic userspace-managed hugepage
daemon. The second, which is more immediately practical, is to avoid
stranding THPs on the deferred split queue (and thus still incurring
the memcg charge) for too long [1].

However, its exact semantics / API have yet to be discussed / fleshed
out (though I'm planning to do exactly this in the near term). Just as
James has co-opted MADV_COLLAPSE for hugetlb, we can co-opt MADV_SPLIT
for THP when the time comes, which I think makes a lot of sense.

Hopefully I can get my ducks in a row to start a discussion about this
imminently.

Best,
Zach

[1] https://lore.kernel.org/linux-mm/YZ9kUD5AG6inbUEg@xz-m1.local/

> --
> Thanks,
>
> David / dhildenb
>
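
For readers following the quoted commit message, here is a minimal sketch of
the rounding change it describes in item 2: the address read from a
userfaultfd is rounded down to PAGE_SIZE once MADV_SPLIT has been issued, and
to the huge page size otherwise. This is plain arithmetic, not kernel code.
The helper names (round_down_pow2, reported_fault_addr) and the split_enabled
flag are hypothetical and only for illustration, and the sketch ignores the
UFFD_FEATURE_EXACT_ADDRESS case mentioned in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a power-of-two alignment. */
static inline uint64_t round_down_pow2(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/*
 * Hypothetical helper (not a kernel function): the fault address a
 * userfaultfd reader would see for a fault at addr, per the quoted
 * description. split_enabled models "MADV_SPLIT was issued on the range".
 */
static inline uint64_t reported_fault_addr(uint64_t addr,
					   uint64_t page_size,
					   uint64_t huge_page_size,
					   int split_enabled)
{
	return round_down_pow2(addr,
			       split_enabled ? page_size : huge_page_size);
}
```

Assuming 4 KiB base pages and 2 MiB huge pages, a fault at 0x40201234 would
be reported as 0x40201000 with MADV_SPLIT and as 0x40200000 without it.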