On Thu, Mar 11, 2021 at 11:22:06AM +0100, Thomas Hellström (Intel) wrote:
>
> On 3/1/21 3:09 PM, Daniel Vetter wrote:
> > On Mon, Mar 1, 2021 at 11:17 AM Christian König
> > <christian.koenig@xxxxxxx> wrote:
> > >
> > > On 01.03.21 at 10:21, Thomas Hellström (Intel) wrote:
> > > > On 3/1/21 10:05 AM, Daniel Vetter wrote:
> > > > > On Mon, Mar 01, 2021 at 09:39:53AM +0100, Thomas Hellström (Intel) wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On 3/1/21 9:28 AM, Daniel Vetter wrote:
> > > > > > > On Sat, Feb 27, 2021 at 9:06 AM Thomas Hellström (Intel)
> > > > > > > <thomas_os@xxxxxxxxxxxx> wrote:
> > > > > > > > On 2/26/21 2:28 PM, Daniel Vetter wrote:
> > > > > > > > > So I think it stops gup. But I haven't verified at all. Would be
> > > > > > > > > good if Christian can check this with some direct io to a buffer
> > > > > > > > > in system memory.
> > > > > > > > Hmm,
> > > > > > > >
> > > > > > > > Docs (again, vm_normal_page()) say:
> > > > > > > >
> > > > > > > >  * VM_MIXEDMAP mappings can likewise contain memory with or without "struct
> > > > > > > >  * page" backing, however the difference is that _all_ pages with a struct
> > > > > > > >  * page (that is, those where pfn_valid is true) are refcounted and considered
> > > > > > > >  * normal pages by the VM. The disadvantage is that pages are refcounted
> > > > > > > >  * (which can be slower and simply not an option for some PFNMAP users). The
> > > > > > > >  * advantage is that we don't have to follow the strict linearity rule of
> > > > > > > >  * PFNMAP mappings in order to support COWable mappings.
> > > > > > > >
> > > > > > > > but it's true that __vm_insert_mixed() ends up in the insert_pfn()
> > > > > > > > path, so the above isn't really true, which makes me wonder if, and
> > > > > > > > in that case why, there could any longer be a significant performance
> > > > > > > > difference between MIXEDMAP and PFNMAP.
> > > > > > > Yeah it's definitely confusing. I guess I'll hack up a patch and see
> > > > > > > what sticks.
> > > > > > >
> > > > > > > > BTW regarding the TTM hugeptes, I don't think we ever landed that
> > > > > > > > devmap hack, so they are (for the non-gup case) relying on
> > > > > > > > vma_is_special_huge(). For the gup case, I think the bug is still
> > > > > > > > there.
> > > > > > > Maybe there's another devmap hack, but the ttm_vm_insert functions do
> > > > > > > use PFN_DEV and all that. And I think that stops gup_fast from trying
> > > > > > > to find the underlying page.
> > > > > > > -Daniel
> > > > > > Hmm perhaps it might, but I don't think so. The fix I tried out was to
> > > > > > set PFN_DEV | PFN_MAP for huge PTEs, which causes pfn_devmap() to be
> > > > > > true, and then follow_devmap_pmd()->get_dev_pagemap() returns NULL and
> > > > > > gup_fast() backs off.
> > > > > >
> > > > > > In the end that would mean setting in stone that "if there is a huge
> > > > > > devmap page table entry for which we haven't registered any devmap
> > > > > > struct pages (get_dev_pagemap returns NULL), we should treat that as a
> > > > > > 'special' huge page table entry".
> > > > > >
> > > > > > From what I can tell, all code calling get_dev_pagemap() already does
> > > > > > that, it's just a question of getting it accepted and formalizing it.
> > > > > Oh I thought that's already how it works, since I didn't spot anything
> > > > > else that would block gup_fast from falling over. I guess we really
> > > > > would need some testcases to make sure direct i/o (that's the easiest
> > > > > to test) fails like we expect.
> > > > Yeah, IIRC the "| PFN_MAP" is the missing piece for TTM huge ptes.
> > > > Otherwise pmd_devmap() will not return true and since there is no
> > > > pmd_special() things break.
> > > Is that maybe the issue we have seen with amdgpu and huge pages?
> > Yeah, essentially when you have a hugepte inserted by ttm, and it
> > happens to point at system memory, then gup will work on that. And
> > create all kinds of havoc.
> >
> > > Apart from that I'm lost guys, that devmap and gup stuff is not
> > > something I have a good knowledge of apart from a one mile high view.
> > I'm not really better, hence it would be good to do a testcase and see.
> > This should provoke it:
> > - allocate a nicely aligned bo in system memory
> > - mmap, again nicely aligned to 2M
> > - do some direct io from a filesystem into that mmap, that should trigger gup
> > - before the gup completes, free the mmap and bo so that ttm recycles
> >   the pages, which should trip up on the elevated refcount. If you wait
> >   until the direct io is complete, then I think nothing bad can be
> >   observed.
> >
> > Ofc if your amdgpu+hugepte issue is something else, then maybe we have
> > another issue.
> >
> > Also usual caveat: I'm not an mm hacker either, so might be completely wrong.
> > -Daniel
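Btw in case someone wants to type up that testcase: completely untested
sketch of the direct io part from userspace. The bo mmap is stood in by an
anonymous mapping here since that part is driver-specific, and the scratch
file name/size are made up, so usual caveats apply:

/*
 * Sketch of the reproducer recipe above, illustrative only. A real test
 * would mmap a 2M-aligned ttm bo instead of the anonymous mapping, and
 * would issue the read asynchronously (aio/io_uring) so the mmap and bo
 * can be freed while the io is still in flight.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define SZ_2M (2UL << 20)

int main(void)
{
        /* stand-in for the 2M-aligned bo mmap */
        void *map = mmap(NULL, SZ_2M, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        /*
         * "scratch.bin" is a placeholder; it must exist, be >= 2M, and
         * live on a filesystem that supports O_DIRECT.
         */
        int fd = open("scratch.bin", O_RDONLY | O_DIRECT);

        if (map == MAP_FAILED || fd < 0)
                return 1;

        /*
         * O_DIRECT makes the fs pin the destination pages with gup; on a
         * ttm hugepte this is exactly the problematic path.
         */
        if (read(fd, map, SZ_2M) != SZ_2M)
                perror("read");

        /*
         * In the racy variant this teardown happens before the io
         * completes, so ttm recycles pages gup still holds.
         */
        munmap(map, SZ_2M);
        close(fd);
        return 0;
}

The interesting variant is the async read with the munmap racing it; the
synchronous version above only tells you whether gup succeeds at all.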
> So I did the following quick experiment on vmwgfx, and it turns out that
> with it, fast gup never succeeds. Without the "| PFN_MAP", it typically
> succeeds.
>
> I should probably craft an RFC formalizing this.

Yeah I think that would be good. Maybe even more formalized if we also
switch over to VM_PFNMAP, since afaiui these pte flags here only stop the
fast gup path. And slow gup can still peek through VM_MIXEDMAP. Or
something like that.

Otoh your description of when it only sometimes succeeds would indicate my
understanding of VM_PFNMAP vs VM_MIXEDMAP is wrong here.
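Completely untested hand-waving of what I mean with the VM_PFNMAP switch,
assuming the flag is still set in ttm_bo_mmap_vma_setup() and that the
fault handler still has both insert branches keyed off VM_MIXEDMAP, so
roughly just:

 /* in ttm_bo_mmap_vma_setup(), or wherever the flags end up nowadays */
-        vma->vm_flags |= VM_MIXEDMAP;
+        vma->vm_flags |= VM_PFNMAP;
         vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;

With VM_PFNMAP set, ttm_bo_vm_fault_reserved() should take the
vmf_insert_pfn_prot() branch and the ptes come out special, so slow gup
should bounce too. Whether that runs into the VM_PFNMAP + x86 PAT +
write-combine perf issue that iirc a comment there warns about is another
question.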
Christian, what's your take?
-Daniel

>
> /Thomas
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> index 6dc96cf66744..72b6fb17c984 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> @@ -195,6 +195,7 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault *vmf,
>          pfn_t pfnt;
>          struct ttm_tt *ttm = bo->ttm;
>          bool write = vmf->flags & FAULT_FLAG_WRITE;
> +        struct dev_pagemap *pagemap;
>
>          /* Fault should not cross bo boundary. */
>          page_offset &= ~(fault_page_size - 1);
> @@ -210,6 +211,17 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault *vmf,
>          if ((pfn & (fault_page_size - 1)) != 0)
>                  goto out_fallback;
>
> +        /*
> +         * Huge entries must be special, that is marking them as devmap
> +         * with no backing device map range. If there is a backing
> +         * range, don't insert a huge entry.
> +         */
> +        pagemap = get_dev_pagemap(pfn, NULL);
> +        if (pagemap) {
> +                put_dev_pagemap(pagemap);
> +                goto out_fallback;
> +        }
> +
>          /* Check that memory is contiguous. */
>          if (!bo->mem.bus.is_iomem) {
>                  for (i = 1; i < fault_page_size; ++i) {
> @@ -223,7 +235,7 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault *vmf,
>                  }
>          }
>
> -        pfnt = __pfn_to_pfn_t(pfn, PFN_DEV);
> +        pfnt = __pfn_to_pfn_t(pfn, PFN_DEV | PFN_MAP);
>          if (fault_page_size == (HPAGE_PMD_SIZE >> PAGE_SHIFT))
>                  ret = vmf_insert_pfn_pmd_prot(vmf, pfnt, pgprot, write);
>  #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> @@ -236,6 +248,21 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault *vmf,
>          if (ret != VM_FAULT_NOPAGE)
>                  goto out_fallback;
>
> +#if 1
> +        {
> +                int npages;
> +                struct page *page;
> +
> +                npages = get_user_pages_fast_only(vmf->address, 1, 0, &page);
> +                if (npages == 1) {
> +                        DRM_WARN("Fast gup succeeded. Bad.\n");
> +                        put_page(page);
> +                } else {
> +                        DRM_INFO("Fast gup failed. Good.\n");
> +                }
> +        }
> +#endif
> +
>          return VM_FAULT_NOPAGE;
>  out_fallback:
>          count_vm_event(THP_FAULT_FALLBACK);
>
>

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch