Re: [PATCH 09/19] mm: New follow_pfnmap API

Peter Xu <peterx@xxxxxxxxxx> · Thu, 15 Aug 2024 14:52:56 -0400

On Thu, Aug 15, 2024 at 02:24:45PM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 15, 2024 at 01:21:01PM -0400, Peter Xu wrote:
> > > Why? Either the function only returns PFN map no-struct page things or
> > > it returns struct page stuff too, in which case why bother to check
> > > the VMA flags if the caller already has to be correct for struct page
> > > backed results?
> > > 
> > > This function is only safe to use under the proper locking, and under
> > > those rules it doesn't matter at all what the result is..
> > 
> > Do you mean we should drop the PFNMAP|IO check?
> 
> Yeah
> 
> >  I didn't see all the
> > callers to say that they won't rely on proper failing of !PFNMAP&&!IO vmas
> > to work alright.  So I assume we should definitely keep them around.
> 
> But as before, if we care about this we should be using vm_normal_page
> as that is sort of abusing the PFNMAP flags.

I can't say it's abusing..  Taking access_remote_vm() as example again, it
can go back as far as 2008 with Rik's commit here:

    commit 28b2ee20c7cba812b6f2ccf6d722cf86d00a84dc
    Author: Rik van Riel <riel@xxxxxxxxxx>
    Date:   Wed Jul 23 21:27:05 2008 -0700

    access_process_vm device memory infrastructure

So it starts with having GUP failing pfnmaps first for remote vm access, as
what we also do right now with check_vma_flags(), then this whole walker is
a remedy for that.

It isn't used at all for normal VMAs, unless it's a private pfnmap mapping
which should be extremely rare, or if it's IO+!PFNMAP, which is a world I
am not familiar with..

In all cases, I hope we can still leave this alone in the huge pfnmap
effort, as they do not yet to be closely relevant.  From that POV, this
patch as simple as "teach follow_pte() to know huge mappings", while it's
just that we can't modify on top when the old interface won't work when
stick with pte_t.  Most of the rest was inherited from follow_pte();
there're still some trivial changes elsewhere, but here on the vma flag
check we stick the same with old.

> 
> > >   Any physical address obtained through this API is only valid while
> > >   the @follow_pfnmap_args. Continuing to use the address after end(),
> > >   without some other means to synchronize with page table updates
> > >   will create a security bug.
> > 
> > Some misuse on wordings here (e.g. we don't return PA but PFN), and some
> > sentence doesn't seem to be complete.. but I think I get the "scary" part
> > of it.  How about this, appending the scary part to the end?
> > 
> >  * During the start() and end() calls, the results in @args will be valid
> >  * as proper locks will be held.  After the end() is called, all the fields
> >  * in @follow_pfnmap_args will be invalid to be further accessed.  Further
> >  * use of such information after end() may require proper synchronizations
> >  * by the caller with page table updates, otherwise it can create a
> >  * security bug.
> 
> I would specifically emphasis that the pfn may not be used after
> end. That is the primary mistake people have made.
> 
> They think it is a PFN so it is safe.

I understand your concern. It's just that it seems still legal to me to use
it as long as proper action is taken.

I hope "require proper synchronizations" would be the best way to phrase
this matter, but maybe you have even better suggestion to put this, which
I'm definitely open to that too.

> 
> > It sounds like we need some mmu notifiers when mapping the IOMMU pgtables,
> > as long as there's MMIO-region / P2P involved.  It'll make sure when
> > tearing down the BAR mappings, the devices will at least see the same view
> > as the processors.
> 
> I think the mmu notifiers can trigger too often for this to be
> practical for DMA :(

I guess the DMAs are fine as normally the notifier will be no-op, as long
as the BAR enable/disable happens rare.  But yeah, I see you point, and
that is probably a concern if those notifier needs to be kicked off and
walk a bunch of MMIO regions, even if 99% of the cases it'll do nothing.

Thanks,

-- 
Peter Xu