On Wed, Aug 7, 2019 at 10:45 AM Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote: > > On Tue, Aug 06, 2019 at 07:05:42PM +0300, Christoph Hellwig wrote: > > There is only a single place where the pgmap is passed over a function > > call, so replace it with local variables in the places where we deal > > with the pgmap. > > > > Signed-off-by: Christoph Hellwig <hch@xxxxxx> > > mm/hmm.c | 62 ++++++++++++++++++++++++-------------------------------- > > 1 file changed, 27 insertions(+), 35 deletions(-) > > > > diff --git a/mm/hmm.c b/mm/hmm.c > > index 9a908902e4cc..d66fa29b42e0 100644 > > +++ b/mm/hmm.c > > @@ -278,7 +278,6 @@ EXPORT_SYMBOL(hmm_mirror_unregister); > > > > struct hmm_vma_walk { > > struct hmm_range *range; > > - struct dev_pagemap *pgmap; > > unsigned long last; > > unsigned int flags; > > }; > > @@ -475,6 +474,7 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > > struct hmm_vma_walk *hmm_vma_walk = walk->private; > > struct hmm_range *range = hmm_vma_walk->range; > > + struct dev_pagemap *pgmap = NULL; > > unsigned long pfn, npages, i; > > bool fault, write_fault; > > uint64_t cpu_flags; > > @@ -490,17 +490,14 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, > > pfn = pmd_pfn(pmd) + pte_index(addr); > > for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) { > > if (pmd_devmap(pmd)) { > > - hmm_vma_walk->pgmap = get_dev_pagemap(pfn, > > - hmm_vma_walk->pgmap); > > - if (unlikely(!hmm_vma_walk->pgmap)) > > + pgmap = get_dev_pagemap(pfn, pgmap); > > + if (unlikely(!pgmap)) > > return -EBUSY; > > Unrelated to this patch, but what is the point of getting checking > that the pgmap exists for the page and then immediately releasing it? > This code has this pattern in several places. > > It feels racy Agree, not sure what the intent is here. The only other reason call get_dev_pagemap() is to just check in general if the pfn is indeed owned by some ZONE_DEVICE instance, but if the intent is to make sure the device is still attached/enabled that check is invalidated at put_dev_pagemap(). If it's the former case, validating ZONE_DEVICE pfns, I imagine we can do something cheaper with a helper that is on the order of the same cost as pfn_valid(). I.e. replace PTE_DEVMAP with a mem_section flag or something similar. > > > } > > pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags; > > } > > - if (hmm_vma_walk->pgmap) { > > - put_dev_pagemap(hmm_vma_walk->pgmap); > > - hmm_vma_walk->pgmap = NULL; > > Putting the value in the hmm_vma_walk would have made some sense to me > if the pgmap was not set to NULL all over the place. Then the most > xa_loads would be eliminated, as I would expect the pgmap tends to be > mostly uniform for these use cases. > > Is there some reason the pgmap ref can't be held across > faulting/sleeping? ie like below. No restriction on holding refs over faulting / sleeping. > > Anyhow, I looked over this pretty carefully and the change looks > functionally OK, I just don't know why the code is like this in the > first place. > > Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx> > > diff --git a/mm/hmm.c b/mm/hmm.c > index 9a908902e4cc38..4e30128c23a505 100644 > --- a/mm/hmm.c > +++ b/mm/hmm.c > @@ -497,10 +497,6 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, > } > pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags; > } > - if (hmm_vma_walk->pgmap) { > - put_dev_pagemap(hmm_vma_walk->pgmap); > - hmm_vma_walk->pgmap = NULL; > - } > hmm_vma_walk->last = end; > return 0; > #else > @@ -604,10 +600,6 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, > return 0; > > fault: > - if (hmm_vma_walk->pgmap) { > - put_dev_pagemap(hmm_vma_walk->pgmap); > - hmm_vma_walk->pgmap = NULL; > - } > pte_unmap(ptep); > /* Fault any virtual address we were asked to fault */ > return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk); > @@ -690,16 +682,6 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, > return r; > } > } > - if (hmm_vma_walk->pgmap) { > - /* > - * We do put_dev_pagemap() here and not in hmm_vma_handle_pte() > - * so that we can leverage get_dev_pagemap() optimization which > - * will not re-take a reference on a pgmap if we already have > - * one. > - */ > - put_dev_pagemap(hmm_vma_walk->pgmap); > - hmm_vma_walk->pgmap = NULL; > - } > pte_unmap(ptep - 1); > > hmm_vma_walk->last = addr; > @@ -751,10 +733,6 @@ static int hmm_vma_walk_pud(pud_t *pudp, > pfns[i] = hmm_device_entry_from_pfn(range, pfn) | > cpu_flags; > } > - if (hmm_vma_walk->pgmap) { > - put_dev_pagemap(hmm_vma_walk->pgmap); > - hmm_vma_walk->pgmap = NULL; > - } > hmm_vma_walk->last = end; > return 0; > } > @@ -1026,6 +1004,14 @@ long hmm_range_fault(struct hmm_range *range, unsigned int flags) > /* Keep trying while the range is valid. */ > } while (ret == -EBUSY && range->valid); > > + /* > + * We do put_dev_pagemap() here so that we can leverage > + * get_dev_pagemap() optimization which will not re-take a > + * reference on a pgmap if we already have one. > + */ > + if (hmm_vma_walk->pgmap) > + put_dev_pagemap(hmm_vma_walk->pgmap); > + Seems ok, but only if the caller is guaranteeing that the range does not span outside of a single pagemap instance. If that guarantee is met why not just have the caller pass in a pinned pagemap? If that guarantee is not met, then I think we're back to your race concern.