Re: [PATCH rdma-next v2 1/4] IB/core: Improve ODP to use hmm_range_fault()

Yishai Hadas <yishaih@xxxxxxxxxx> · Tue, 29 Sep 2020 23:09:43 +0300

On 9/29/2020 10:27 PM, Jason Gunthorpe wrote:
On Tue, Sep 22, 2020 at 11:21:01AM +0300, Leon Romanovsky wrote:

+	if (!*dma_addr) {
+		*dma_addr = ib_dma_map_page(dev, page, 0,
+				1 << umem_odp->page_shift,
+				DMA_BIDIRECTIONAL);
+		if (ib_dma_mapping_error(dev, *dma_addr)) {
+			*dma_addr = 0;
+			return -EFAULT;
+		}
+		umem_odp->npages++;
+	}
+
+	*dma_addr |= access_mask;
This does need some masking, the purpose of this is to update the
access flags in the case we hit a fault on a dma mapped thing. Looks
like this can happen on a read-only page becoming writable again
(wp_page_reuse() doesn't trigger notifiers)

It should also have a comment to that effect.

something like:

if (*dma_addr) {
     /*
      * If the page is already dma mapped it means it went through a
      * non-invalidating trasition, like read-only to writable. Resync the
      * flags.
      */
     *dma_addr = (*dma_addr & (~ODP_DMA_ADDR_MASK)) | access_mask;
Did you mean

*dma_addr = (*dma_addr & (ODP_DMA_ADDR_MASK)) | access_mask;

Otherwise we may lose the dma_addr itself and just have the access flags. (see ODP_DMA_ADDR_MASK).
Also, if we went through a read->write access without invalidation why do we need to mask at all ? the new access_mask should have the write access.

     return;
}

new_dma_addr = ib_dma_map_page()
[..]
*dma_addr = new_dma_addr | access_mask

+		WARN_ON(range.hmm_pfns[pfn_index] & HMM_PFN_ERROR);
+		WARN_ON(!(range.hmm_pfns[pfn_index] & HMM_PFN_VALID));
+		hmm_order = hmm_pfn_to_map_order(range.hmm_pfns[pfn_index]);
+		/* If a hugepage was detected and ODP wasn't set for, the umem
+		 * page_shift will be used, the opposite case is an error.
+		 */
+		if (hmm_order + PAGE_SHIFT < page_shift) {
+			ret = -EINVAL;
+			pr_debug("%s: un-expected hmm_order %d, page_shift %d\n",
+				 __func__, hmm_order, page_shift);
  			break;
  		}
I think this break should be a continue here. There is no reason not
to go to the next aligned PFN and try to sync as much as possible.

This might happen if the application didn't honor the contract to use 
hugepages for the full range despite that it sets IB_ACCESS_HUGETLB, right ?

Do we still need to sync as much as possible in that case ? I believe 
that we may consider return an error in this case to let application be 
aware of as was before this series.

This should also

   WARN_ON(umem_odp->dma_list[dma_index]);

And all the pr_debugs around this code being touched should become
mlx5_ib_dbg
We are in IB core, why mlx5_ib_debug ?

Jason