Re: [PATCH v2 2/3] mm/hmm: allow snapshot of the special zero page

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 22, 2019 at 03:05:18PM +0000, Jason Gunthorpe wrote:
> On Mon, Oct 21, 2019 at 10:45:49PM -0400, Jerome Glisse wrote:
> > On Mon, Oct 21, 2019 at 01:54:15PM -0700, Ralph Campbell wrote:
> > > 
> > > On 10/21/19 11:49 AM, Jerome Glisse wrote:
> > > > On Tue, Oct 15, 2019 at 01:48:13PM -0700, Ralph Campbell wrote:
> > > > > Allow hmm_range_fault() to return success (0) when the CPU pagetable
> > > > > entry points to the special shared zero page.
> > > > > The caller can then handle the zero page by possibly clearing device
> > > > > private memory instead of DMAing a zero page.
> > > > 
> > > > I do not understand why you are talking about DMA. GPU can work
> > > > on main memory and migrating to GPU memory is optional and should
> > > > not involve this function at all.
> > > 
> > > Good point. This is the device accessing the zero page over PCIe
> > > or another bus, not migrating a zero page to device private memory.
> > > I'll update the wording.
> > > 
> > > > > 
> > > > > Signed-off-by: Ralph Campbell <rcampbell@xxxxxxxxxx>
> > > > > Reviewed-by: Christoph Hellwig <hch@xxxxxx>
> > > > > Cc: "Jérôme Glisse" <jglisse@xxxxxxxxxx>
> > > > > Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> > > > 
> > > > NAK please keep semantic or change it fully. See the alternative
> > > > below.
> > > > 
> > > > >   mm/hmm.c | 4 +++-
> > > > >   1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/mm/hmm.c b/mm/hmm.c
> > > > > index 5df0dbf77e89..f62b119722a3 100644
> > > > > +++ b/mm/hmm.c
> > > > > @@ -530,7 +530,9 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
> > > > >   			return -EBUSY;
> > > > >   	} else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
> > > > >   		*pfn = range->values[HMM_PFN_SPECIAL];
> > > > > -		return -EFAULT;
> > > > > +		if (!is_zero_pfn(pte_pfn(pte)))
> > > > > +			return -EFAULT;
> > > > > +		return 0;
> > > > 
> > > > An acceptable change would be to turn the branch into:
> > > > 	} else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
> > > > 		if (!is_zero_pfn(pte_pfn(pte))) {
> > > > 			*pfn = range->values[HMM_PFN_SPECIAL];
> > > > 			return -EFAULT;
> > > > 		}
> > > > 		/* Fall-through for zero pfn (if write was needed the above
> > > > 		 * hmm_pte_need_faul() would had catched it).
> > > > 		 */
> > > > 	}
> > > > 
> > > 
> > > Except this will return the zero pfn with no indication that it is special
> > > (i.e., doesn't have a struct page).
> > 
> > That is fine, the device driver should not do anything with it ie
> > if the device driver wanted to write then the write fault test
> > would return true and it would fault.
> > 
> > Note that driver should not dereference the struct page.
> 
> Can this thing be dma mapped for read?
> 

Yes it can, the zero page is just a regular page (AFAIK on all
architecture). So device can dma map it for read only, there is
no reason to treat it any differently.

The HMM_PTE_SPECIAL is only (as documented in the header) for
pte insert with insert_pfn or insert_page ie pte inserted in
vma with MIXED or PFNMAP flag. While HMM catch those vma early
on and backof it can still race with some driver setting the vma
flag and installing special pte afterward hence why special pte
goes through this special path.

The zero page being a special pte is just an exception ie it
is the only special pte allowed in vma that do not have MIXED or
PFNMAP flag set.

Cheers,
Jérôme






[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux