On Thu, Sep 21, 2023 at 11:17 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 21.09.23 20:04, Suren Baghdasaryan wrote:
> > On Thu, Sep 14, 2023 at 6:45 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 14.09.23 20:43, David Hildenbrand wrote:
> >>> On 14.09.23 20:11, Matthew Wilcox wrote:
> >>>> On Thu, Sep 14, 2023 at 08:26:12AM -0700, Suren Baghdasaryan wrote:
> >>>>> +++ b/include/linux/userfaultfd_k.h
> >>>>> @@ -93,6 +93,23 @@ extern int mwriteprotect_range(struct mm_struct *dst_mm,
> >>>>>  extern long uffd_wp_range(struct vm_area_struct *vma,
> >>>>>                            unsigned long start, unsigned long len, bool enable_wp);
> >>>>>
> >>>>> +/* remap_pages */
> >>>>> +extern void double_pt_lock(spinlock_t *ptl1, spinlock_t *ptl2);
> >>>>> +extern void double_pt_unlock(spinlock_t *ptl1, spinlock_t *ptl2);
> >>>>> +extern ssize_t remap_pages(struct mm_struct *dst_mm,
> >>>>> +                           struct mm_struct *src_mm,
> >>>>> +                           unsigned long dst_start,
> >>>>> +                           unsigned long src_start,
> >>>>> +                           unsigned long len, __u64 flags);
> >>>>> +extern int remap_pages_huge_pmd(struct mm_struct *dst_mm,
> >>>>> +                                struct mm_struct *src_mm,
> >>>>> +                                pmd_t *dst_pmd, pmd_t *src_pmd,
> >>>>> +                                pmd_t dst_pmdval,
> >>>>> +                                struct vm_area_struct *dst_vma,
> >>>>> +                                struct vm_area_struct *src_vma,
> >>>>> +                                unsigned long dst_addr,
> >>>>> +                                unsigned long src_addr);
> >>>>
> >>>> Drop the 'extern' markers from function declarations.
> >>>>
> >>>>> +int remap_pages_huge_pmd(struct mm_struct *dst_mm,
> >>>>> +                         struct mm_struct *src_mm,
> >>>>> +                         pmd_t *dst_pmd, pmd_t *src_pmd,
> >>>>> +                         pmd_t dst_pmdval,
> >>>>> +                         struct vm_area_struct *dst_vma,
> >>>>> +                         struct vm_area_struct *src_vma,
> >>>>> +                         unsigned long dst_addr,
> >>>>> +                         unsigned long src_addr)
> >>>>> +{
> >>>>> +        pmd_t _dst_pmd, src_pmdval;
> >>>>> +        struct page *src_page;
> >>>>> +        struct anon_vma *src_anon_vma, *dst_anon_vma;
> >>>>> +        spinlock_t *src_ptl, *dst_ptl;
> >>>>> +        pgtable_t pgtable;
> >>>>> +        struct mmu_notifier_range range;
> >>>>> +
> >>>>> +        src_pmdval = *src_pmd;
> >>>>> +        src_ptl = pmd_lockptr(src_mm, src_pmd);
> >>>>> +
> >>>>> +        BUG_ON(!pmd_trans_huge(src_pmdval));
> >>>>> +        BUG_ON(!pmd_none(dst_pmdval));
> >>>>> +        BUG_ON(!spin_is_locked(src_ptl));
> >>>>> +        mmap_assert_locked(src_mm);
> >>>>> +        mmap_assert_locked(dst_mm);
> >>>>> +        BUG_ON(src_addr & ~HPAGE_PMD_MASK);
> >>>>> +        BUG_ON(dst_addr & ~HPAGE_PMD_MASK);
> >>>>> +
> >>>>> +        src_page = pmd_page(src_pmdval);
> >>>>> +        BUG_ON(!PageHead(src_page));
> >>>>> +        BUG_ON(!PageAnon(src_page));
> >>>>
> >>>> Better to add a src_folio = page_folio(src_page);
> >>>> and then folio_test_anon() here.
> >>>>
> >>>>> +        if (unlikely(page_mapcount(src_page) != 1)) {
> >>>>
> >>>> Brr, this is going to miss PTE mappings of this folio. I think you
> >>>> actually want folio_mapcount() instead, although it'd be more efficient
> >>>> to look at folio->_entire_mapcount == 1 and _nr_pages_mapped == 0.
> >>>> Not sure what a good name for that predicate would be.
> >>>
> >>> We have
> >>>
> >>>   * It only works on non shared anonymous pages because those can
> >>>   * be relocated without generating non linear anon_vmas in the rmap
> >>>   * code.
> >>>   *
> >>>   * It provides a zero copy mechanism to handle userspace page faults.
> >>>   * The source vma pages should have mapcount == 1, which can be
> >>>   * enforced by using madvise(MADV_DONTFORK) on src vma.
> >>>
> >>> Use PageAnonExclusive(). As long as KSM is not involved and you don't
> >>> use fork(), that flag should be good enough for that use case here.
> >>>
> >> ... and similarly don't do any of that swapcount stuff and only check if
> >> the swap pte is anon exclusive.
> >
> > I'm preparing v2 and this is the only part left for me to address but
> > I'm not clear how. David, could you please clarify how I should be
> > checking swap pte to be exclusive without swapcount?
>
> If you have a real swp pte (not a non-swap pte like migration entries)
> you should be able to just use pte_swp_exclusive().

Got it. Thanks!

>
> --
> Cheers,
>
> David / dhildenb
>
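For reference, a minimal sketch (not part of the quoted patch) of what the folio-based check suggested above could look like inside remap_pages_huge_pmd(). The src_folio local is introduced purely for illustration, and the -EBUSY returns stand in for whatever error handling the real series uses:

	struct folio *src_folio;

	src_page = pmd_page(src_pmdval);
	src_folio = page_folio(src_page);

	/* Folio-level replacement for the BUG_ON(!PageAnon(src_page)) check. */
	if (!folio_test_anon(src_folio))
		return -EBUSY;

	/*
	 * PageAnonExclusive() on the head page replaces the page_mapcount()
	 * test: it is only set while the anon page is exclusive to this
	 * process, so KSM pages and pages shared after fork() are rejected
	 * without walking any mapcounts.
	 */
	if (!PageAnonExclusive(src_page))
		return -EBUSY;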
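And for the non-present (swap) PTE case David answers at the end, a minimal sketch of the check he describes, assuming a hypothetical PTE-level helper elsewhere in the series where orig_src_pte has already been read under the page table lock (names and error values are illustrative, not from the patch):

	if (!pte_present(orig_src_pte)) {
		swp_entry_t entry = pte_to_swp_entry(orig_src_pte);

		/* Only real swap entries; migration/device entries don't qualify. */
		if (non_swap_entry(entry))
			return -EBUSY;

		/*
		 * The exclusivity information lives in the swap pte itself,
		 * so no swap count lookup is needed.
		 */
		if (!pte_swp_exclusive(orig_src_pte))
			return -EBUSY;
	}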