Re: [Linaro-mm-sig] [PATCH 1/2] dma-buf: Require VM_PFNMAP vma for mmap

Thomas Hellström (Intel) <thomas_os@xxxxxxxxxxxx> · Thu, 11 Mar 2021 11:22:06 +0100

On 3/1/21 3:09 PM, Daniel Vetter wrote:
On Mon, Mar 1, 2021 at 11:17 AM Christian König
<christian.koenig@xxxxxxx> wrote:


Am 01.03.21 um 10:21 schrieb Thomas Hellström (Intel):
On 3/1/21 10:05 AM, Daniel Vetter wrote:
On Mon, Mar 01, 2021 at 09:39:53AM +0100, Thomas Hellström (Intel)
wrote:
Hi,

On 3/1/21 9:28 AM, Daniel Vetter wrote:
On Sat, Feb 27, 2021 at 9:06 AM Thomas Hellström (Intel)
<thomas_os@xxxxxxxxxxxx> wrote:
On 2/26/21 2:28 PM, Daniel Vetter wrote:
So I think it stops gup. But I haven't verified at all. Would be
good
if Christian can check this with some direct io to a buffer in
system
memory.
Hmm,

Docs (again vm_normal_page() say)

     * VM_MIXEDMAP mappings can likewise contain memory with or
without "struct
     * page" backing, however the difference is that _all_ pages
with a struct
     * page (that is, those where pfn_valid is true) are refcounted
and
considered
     * normal pages by the VM. The disadvantage is that pages are
refcounted
     * (which can be slower and simply not an option for some PFNMAP
users). The
     * advantage is that we don't have to follow the strict
linearity rule of
     * PFNMAP mappings in order to support COWable mappings.

but it's true __vm_insert_mixed() ends up in the insert_pfn()
path, so
the above isn't really true, which makes me wonder if and in that
case
why there could any longer ever be a significant performance
difference
between MIXEDMAP and PFNMAP.
Yeah it's definitely confusing. I guess I'll hack up a patch and see
what sticks.

BTW regarding the TTM hugeptes, I don't think we ever landed that
devmap
hack, so they are (for the non-gup case) relying on
vma_is_special_huge(). For the gup case, I think the bug is still
there.
Maybe there's another devmap hack, but the ttm_vm_insert functions do
use PFN_DEV and all that. And I think that stops gup_fast from trying
to find the underlying page.
-Daniel
Hmm perhaps it might, but I don't think so. The fix I tried out was
to set

PFN_DEV | PFN_MAP for huge PTEs which causes pfn_devmap() to be
true, and
then

follow_devmap_pmd()->get_dev_pagemap() which returns NULL and
gup_fast()
backs off,

in the end that would mean setting in stone that "if there is a huge
devmap
page table entry for which we haven't registered any devmap struct
pages
(get_dev_pagemap returns NULL), we should treat that as a "special"
huge
page table entry".

  From what I can tell, all code calling get_dev_pagemap() already
does that,
it's just a question of getting it accepted and formalizing it.
Oh I thought that's already how it works, since I didn't spot anything
else that would block gup_fast from falling over. I guess really would
need some testcases to make sure direct i/o (that's the easiest to test)
fails like we expect.
Yeah, IIRC the "| PFN_MAP" is the missing piece for TTM huge ptes.
Otherwise pmd_devmap() will not return true and since there is no
pmd_special() things break.
Is that maybe the issue we have seen with amdgpu and huge pages?
Yeah, essentially when you have a hugepte inserted by ttm, and it
happens to point at system memory, then gup will work on that. And
create all kinds of havoc.

Apart from that I'm lost guys, that devmap and gup stuff is not
something I have a good knowledge of apart from a one mile high view.
I'm not really better, hence would be good to do a testcase and see.
This should provoke it:
- allocate nicely aligned bo in system memory
- mmap, again nicely aligned to 2M
- do some direct io from a filesystem into that mmap, that should trigger gup
- before the gup completes free the mmap and bo so that ttm recycles
the pages, which should trip up on the elevated refcount. If you wait
until the direct io is completely, then I think nothing bad can be
observed.

Ofc if your amdgpu+hugepte issue is something else, then maybe we have
another issue.

Also usual caveat: I'm not an mm hacker either, so might be completely wrong.
-Daniel

So I did the following quick experiment on vmwgfx, and it turns out that 
with it,
fast gup never succeeds. Without the "| PFN_MAP", it typically succeeds

I should probably craft an RFC formalizing this.

/Thomas

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c 
b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 6dc96cf66744..72b6fb17c984 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -195,6 +195,7 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct 
vm_fault *vmf,
        pfn_t pfnt;
        struct ttm_tt *ttm = bo->ttm;
        bool write = vmf->flags & FAULT_FLAG_WRITE;
+       struct dev_pagemap *pagemap;

        /* Fault should not cross bo boundary. */
        page_offset &= ~(fault_page_size - 1);
@@ -210,6 +211,17 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct 
vm_fault *vmf,
        if ((pfn & (fault_page_size - 1)) != 0)
                goto out_fallback;

+       /*
+        * Huge entries must be special, that is marking them as devmap
+        * with no backing device map range. If there is a backing
+        * range, Don't insert a huge entry.
+        */
+       pagemap = get_dev_pagemap(pfn, NULL);
+       if (pagemap) {
+               put_dev_pagemap(pagemap);
+               goto out_fallback;
+       }
+
        /* Check that memory is contiguous. */
        if (!bo->mem.bus.is_iomem) {
                for (i = 1; i < fault_page_size; ++i) {
@@ -223,7 +235,7 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct 
vm_fault *vmf,
                }
        }

-       pfnt = __pfn_to_pfn_t(pfn, PFN_DEV);
+       pfnt = __pfn_to_pfn_t(pfn, PFN_DEV | PFN_MAP);
        if (fault_page_size == (HPAGE_PMD_SIZE >> PAGE_SHIFT))
                ret = vmf_insert_pfn_pmd_prot(vmf, pfnt, pgprot, write);
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
@@ -236,6 +248,21 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct 
vm_fault *vmf,
        if (ret != VM_FAULT_NOPAGE)
                goto out_fallback;

+#if 1
+       {
+               int npages;
+               struct page *page;
+
+               npages = get_user_pages_fast_only(vmf->address, 1, 0, 
&page);
+               if (npages == 1) {
+                       DRM_WARN("Fast gup succeeded. Bad.\n");
+                       put_page(page);
+               } else {
+                       DRM_INFO("Fast gup failed. Good.\n");
+               }
+       }
+#endif
+
        return VM_FAULT_NOPAGE;
 out_fallback:
        count_vm_event(THP_FAULT_FALLBACK);





_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx