On Fri, Nov 06, 2020 at 11:27:59AM +0100, Daniel Vetter wrote:
> On Fri, Nov 6, 2020 at 11:01 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
> >
> > On Fri, Nov 6, 2020 at 5:08 AM John Hubbard <jhubbard@xxxxxxxxxx> wrote:
> > >
> > > On 11/5/20 4:49 AM, Jason Gunthorpe wrote:
> > > > On Thu, Nov 05, 2020 at 10:25:24AM +0100, Daniel Vetter wrote:
> > > >>> /*
> > > >>>  * If we can't determine whether or not a pte is special, then fail immediately
> > > >>>  * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not
> > > >>>  * to be special.
> > > >>>  *
> > > >>>  * For a futex to be placed on a THP tail page, get_futex_key requires a
> > > >>>  * get_user_pages_fast_only implementation that can pin pages. Thus it's still
> > > >>>  * useful to have gup_huge_pmd even if we can't operate on ptes.
> > > >>>  */
> > > >>
> > > >> We support hugepage faults in gpu drivers since recently, and I'm not
> > > >> seeing a pud_mkhugespecial anywhere. So not sure this works, but probably
> > > >> just me missing something again.
> > > >
> > > > It means ioremap can't create an IO page PUD, it has to be broken up.
> > > >
> > > > Does ioremap even create anything larger than PTEs?
> >
> > gpu drivers also tend to use vmf_insert_pfn* directly, so we can do
> > on-demand paging and move buffers around. From what I glanced for
> > lowest level we to the pte_mkspecial correctly (I think I convinced
> > myself that vm_insert_pfn does that), but for pud/pmd levels it seems
> > just yolo.
>
> So I dug around a bit more and ttm sets PFN_DEV | PFN_MAP to get past
> the various pft_t_devmap checks (see e.g. vmf_insert_pfn_pmd_prot()).
> x86-64 has ARCH_HAS_PTE_DEVMAP, and gup.c seems to handle these
> specially, but frankly I got totally lost in what this does.

The fact vmf_insert_pfn_pmd_prot() has all those BUG_ON's to prevent
putting VM_PFNMAP pages into the page tables seems like a big red
flag.

The comment seems to confirm what we are talking about here:

	/*
	 * If we had pmd_special, we could avoid all these restrictions,
	 * but we need to be consistent with PTEs and architectures that
	 * can't support a 'special' bit.
	 */

i.e. without the ability to mark special we can't block fast gup and
anyone who does O_DIRECT on these ranges will crash the kernel when it
tries to convert an IO page into a struct page.

Should be easy enough to directly test?

Putting non-struct page PTEs into a VMA without setting VM_PFNMAP just
seems horribly wrong to me.

Jason
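
For anyone wanting to try the O_DIRECT test suggested above, a minimal
userspace sketch might look like the following. It is only an
illustration under assumptions: direct_read_into() and
gup_io_mapping_test() are hypothetical helper names, the device node is
assumed to be directly mmap-able at offset 0 (real GPU drivers normally
hand out a per-buffer mmap offset through a driver-specific ioctl, which
is not shown), and whether the huge-page fast-gup path is actually
reached depends on the driver installing a PMD/PUD-sized mapping.

```c
#define _GNU_SOURCE		/* for O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read len bytes from the start of data_fd into buf, retrying on EINTR.
 * When data_fd was opened with O_DIRECT, the kernel has to pin the pages
 * backing buf, which is the gup path under discussion.
 * Returns the number of bytes read, or -1 with errno set.
 */
static ssize_t direct_read_into(int data_fd, void *buf, size_t len)
{
	ssize_t ret;

	do {
		ret = pread(data_fd, buf, len, 0);
	} while (ret < 0 && errno == EINTR);

	return ret;
}

/*
 * Hypothetical test driver: map len bytes of dev_path and use that
 * mapping as the destination of an O_DIRECT read from data_path.
 * Offset 0 is a placeholder (assumption), see the note above.
 * Returns 0 if the read succeeded, -1 on any failure.
 */
static int gup_io_mapping_test(const char *dev_path, const char *data_path,
			       size_t len)
{
	int dev_fd, data_fd, ret = -1;
	void *map;

	dev_fd = open(dev_path, O_RDWR);
	if (dev_fd < 0)
		return -1;

	data_fd = open(data_path, O_RDONLY | O_DIRECT);
	if (data_fd < 0)
		goto out_dev;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, dev_fd, 0);
	if (map == MAP_FAILED)
		goto out_data;

	if (direct_read_into(data_fd, map, len) < 0)
		perror("O_DIRECT read into device mapping");
	else
		ret = 0;

	munmap(map, len);
out_data:
	close(data_fd);
out_dev:
	close(dev_fd);
	return ret;
}
```

Calling gup_io_mapping_test() with a device node, a regular file on a
filesystem that supports O_DIRECT, and a length that is a multiple of
the huge page size would exercise the case in question. If gup rejects
the IO mapping, one would expect the read to fail cleanly (for example
with EFAULT) rather than take the kernel down.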