On Tue, Sep 29, 2020 at 11:09:43PM +0300, Yishai Hadas wrote:
> On 9/29/2020 10:27 PM, Jason Gunthorpe wrote:
> > On Tue, Sep 22, 2020 at 11:21:01AM +0300, Leon Romanovsky wrote:
> >
> > > +		if (!*dma_addr) {
> > > +			*dma_addr = ib_dma_map_page(dev, page, 0,
> > > +					1 << umem_odp->page_shift,
> > > +					DMA_BIDIRECTIONAL);
> > > +			if (ib_dma_mapping_error(dev, *dma_addr)) {
> > > +				*dma_addr = 0;
> > > +				return -EFAULT;
> > > +			}
> > > +			umem_odp->npages++;
> > > +		}
> > > +
> > > +		*dma_addr |= access_mask;
> >
> > This does need some masking, the purpose of this is to update the
> > access flags in the case we hit a fault on a dma mapped thing. Looks
> > like this can happen on a read-only page becoming writable again
> > (wp_page_reuse() doesn't trigger notifiers)
> >
> > It should also have a comment to that effect.
> >
> > something like:
> >
> > if (*dma_addr) {
> > 	/*
> > 	 * If the page is already dma mapped it means it went through a
> > 	 * non-invalidating transition, like read-only to writable. Resync
> > 	 * the flags.
> > 	 */
> > 	*dma_addr = (*dma_addr & (~ODP_DMA_ADDR_MASK)) | access_mask;
>
> Did you mean
>
> *dma_addr = (*dma_addr & (ODP_DMA_ADDR_MASK)) | access_mask;

Probably

> flags. (see ODP_DMA_ADDR_MASK). Also, if we went through a
> read->write access without invalidation why do we need to mask at
> all ? the new access_mask should have the write access.

Feels like a good idea to be safe here

> > > +		WARN_ON(range.hmm_pfns[pfn_index] & HMM_PFN_ERROR);
> > > +		WARN_ON(!(range.hmm_pfns[pfn_index] & HMM_PFN_VALID));
> > > +		hmm_order = hmm_pfn_to_map_order(range.hmm_pfns[pfn_index]);
> > > +		/* If a hugepage was detected and ODP wasn't set for, the umem
> > > +		 * page_shift will be used, the opposite case is an error.
> > > +		 */
> > > +		if (hmm_order + PAGE_SHIFT < page_shift) {
> > > +			ret = -EINVAL;
> > > +			pr_debug("%s: un-expected hmm_order %d, page_shift %d\n",
> > > +				 __func__, hmm_order, page_shift);
> > > +			break;
> > > +		}
> >
> > I think this break should be a continue here. There is no reason not
> > to go to the next aligned PFN and try to sync as much as possible.
>
> This might happen if the application didn't honor the contract to use
> hugepages for the full range despite that it sets IB_ACCESS_HUGETLB,
> right ?

Yes

> Do we still need to sync as much as possible in that case ? I
> believe that we may consider return an error in this case to let
> application be aware of as was before this series.

We might be prefetching or something weird where it could make sense.

> > This should also
> >
> >    WARN_ON(umem_odp->dma_list[dma_index]);
> >
> > And all the pr_debugs around this code being touched should become
> > mlx5_ib_dbg
>
> We are in IB core, why mlx5_ib_debug ?

oops, dev_dbg

Jason