On Sat, Oct 3, 2020 at 1:31 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote: > > On Fri, Oct 02, 2020 at 08:16:48PM +0200, Daniel Vetter wrote: > > On Fri, Oct 2, 2020 at 8:06 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote: > > > On Fri, Oct 02, 2020 at 07:53:03PM +0200, Daniel Vetter wrote: > > > > For $reasons I've stumbled over this code and I'm not sure the change > > > > to the new gup functions in 55a650c35fea ("mm/gup: frame_vector: > > > > convert get_user_pages() --> pin_user_pages()") was entirely correct. > > > > > > > > This here is used for long term buffers (not just quick I/O) like > > > > RDMA, and John notes this in his patch. But I thought the rule for > > > > these is that they need to add FOLL_LONGTERM, which John's patch > > > > didn't do. > > > > > > > > There is already a dax specific check (added in b7f0554a56f2 ("mm: > > > > fail get_vaddr_frames() for filesystem-dax mappings")), so this seems > > > > like the prudent thing to do. > > > > > > > > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx> > > > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > > > > Cc: John Hubbard <jhubbard@xxxxxxxxxx> > > > > Cc: Jérôme Glisse <jglisse@xxxxxxxxxx> > > > > Cc: Jan Kara <jack@xxxxxxx> > > > > Cc: Dan Williams <dan.j.williams@xxxxxxxxx> > > > > Cc: linux-mm@xxxxxxxxx > > > > Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx > > > > Cc: linux-samsung-soc@xxxxxxxxxxxxxxx > > > > Cc: linux-media@xxxxxxxxxxxxxxx > > > > Hi all, > > > > > > > > I stumbled over this and figured typing this patch can't hurt. Really > > > > just to maybe learn a few things about how gup/pup is supposed to be > > > > used (we have a bit of that in drivers/gpu), this here isn't really > > > > ralated to anything I'm doing. > > > > > > FOLL_FORCE is a pretty big clue it should be FOLL_LONGTERM, IMHO > > > > Since you're here ... I've noticed that ib sets FOLL_FORCE when the ib > > verb access mode indicates possible writes. I'm not really clear on > > why FOLL_WRITE isn't enough any why you need to be able to write > > through a vma that's write protected currently. > > Ah, FOLL_FORCE | FOLL_WRITE means *read* confusingly enough, and the > only reason you'd want this version for read is if you are doing > longterm stuff. I wrote about this recently: > > https://lore.kernel.org/linux-mm/20200928235739.GU9916@xxxxxxxx/ Ah, so essentially it serves as a FOLL_GET_COW_ISSUES_OUT_OF_MY_WAY. I think documentation for this, and/or just automatically adding the flag set combination would be really good. But I see John is already on top of that it seems. Thanks for the explainer. > > > Since every driver does this wrong anything that uses this is creating > > > terrifying security issues. > > > > > > IMHO this whole API should be deleted :( > > > > Yeah that part I just tried to conveniently ignore. I guess this dates > > back to a time when ioremaps where at best fixed, and there wasn't > > anything like a gpu driver dynamically managing vram around, resulting > > in random entirely unrelated things possibly being mapped to that set > > of pfns. > > No, it was always wrong. Prior to GPU like cases the lifetime of the > PTE was tied to the vma and when the vma becomes free the driver can > move the things in the PTEs to 'free'. Easy to trigger use-after-free > issues and devices like RDMA have security contexts attached to these > PTEs so it becomes a serious security bug to do something like this. > > > The underlying follow_pfn is also used in other places within > > drivers/media, so this doesn't seem to be an accident, but actually > > intentional. > > Looking closely, there are very few users, most *seem* pointless, but > maybe there is a crazy reason? > > The sequence > get_vaddr_frames(); > frame_vector_to_pages(); > sg_alloc_table_from_pages(); > > Should be written > pin_user_pages_fast(FOLL_LONGTERM); > sg_alloc_table_from_pages() Ok, that takes care of exynos and habanalabs. I'll try and wack together a patch for exynos, that driver is a bit special - first arm soc driver and we merged it fully well aware that it's full of warts, just as a show to make it clear that drivers/gpu is also interested in small gpu drivers ... > There is some 'special' code in frame_vector_to_pages() that tries to > get a struct page for things from a VM_IO or VM_PFNMAP... > > Oh snap, that is *completely* broken! If the first VMA is IO|PFNMAP > then get_vaddr_frames() iterates over all VMAs in the range, of any > kind and extracts the PTEs then blindly references them! This means it > can be used to use after free normal RAM struct pages!! Gah! > > Wow. Okay. That has to go. > > So, I *think* we can assume there is no sane cases where > frame_vector_to_pages() succeeds but pin_user_pages() wasn't called. > > That means the users here: > - habanalabs: Hey Oded can you fix this up? > > - gpu/exynos: Daniel can you get someone there to stop using it? > > - media/videobuf via vb2_dma_sg_get_userptr() > > Should all be switched to the standard pin_user_pages sequence > above. > > That leaves the only interesting places as vb2_dc_get_userptr() and > vb2_vmalloc_get_userptr() which both completely fail to follow the > REQUIRED behavior in the function's comment about checking PTEs. It > just DMA maps them. Badly broken. > > Guessing this hackery is for some embedded P2P DMA transfer? Yeah, see also the follow_pfn trickery in videobuf_dma_contig_user_get(), I think this is fully intentional and userspace abi we can't break :-/ It's mildly better than just sharing phys_addr_t through userspace and blindly trusting it, which is what all the out-of-tree crap did, but only mildly. > After he three places above should use pin_user_pages_fast(), then > this whole broken API should be moved into videobuf2-memops.c and a > big fat "THIS DOESN'T WORK" stuck on it. > > videobuf2 should probably use P2P DMA buf for this instead. Yup this should be done with dma_buf instead, and v4l has that. But old uapi and all that. This is why I said we might need a new VM_DYNAMIC_PFNMAP or so, to make follow_pfn not resolve this in the case where the driver manages the underlying iomem range (or whatever it is) dynamically and moves buffer objects around, like drm drivers do. But I looked, and we've run out of vma->vm_flags :-( The other problem is that I also have no real working clue about all the VM_* flags and what they all mean, and whether drm drivers set the right ones in all cases (they probably don't, but oh well). Documentation for this stuff in headers is a bit thin at times. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch