> -----Original Message-----
> From: Jason Gunthorpe <jgg@xxxxxxxx>
> Sent: Friday, October 16, 2020 11:58 AM
> To: Xiong, Jianxin <jianxin.xiong@xxxxxxxxx>
> Cc: linux-rdma@xxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx; Doug Ledford <dledford@xxxxxxxxxx>; Leon Romanovsky
> <leon@xxxxxxxxxx>; Sumit Semwal <sumit.semwal@xxxxxxxxxx>; Christian Koenig <christian.koenig@xxxxxxx>; Vetter, Daniel
> <daniel.vetter@xxxxxxxxx>
> Subject: Re: [PATCH v5 4/5] RDMA/mlx5: Support dma-buf based userspace memory region
>
> On Fri, Oct 16, 2020 at 06:40:01AM +0000, Xiong, Jianxin wrote:
> > > > +	if (!mr)
> > > > +		return -EINVAL;
> > > > +
> > > > +	return mlx5_ib_update_xlt(mr, 0, mr->npages, PAGE_SHIFT, flags);
> > > > +}
> > > > +
> > > > +static struct ib_umem_dmabuf_ops mlx5_ib_umem_dmabuf_ops = {
> > > > +	.init = mlx5_ib_umem_dmabuf_xlt_init,
> > > > +	.update = mlx5_ib_umem_dmabuf_xlt_update,
> > > > +	.invalidate = mlx5_ib_umem_dmabuf_xlt_invalidate,
> > > > +};
> > >
> > > I'm not really convinced these should be ops, this is usually a bad
> > > design pattern.
> > >
> > > Why do I need so much code to extract the sgl from the dma_buf? I
> > > would prefer the dma_buf layer simplify this, not by adding a wrapper
> > > around it in the IB core code...
> > >
> >
> > We just need a way to call a device specific function to update the
> > NIC's translation table. I considered three ways: (1) ops registered
> > with ib_umem_get_dmabuf; (2) a single function pointer registered with
> > ib_umem_get_dmabuf; (3) a method in 'struct ib_device'. Option (1) was
> > chosen here with no strong reason. We could consolidate the three
> > functions of the ops into one, but then we will need to define
> > commands or flags for different update operations.
>
> I'd rather the driver directly provide the dma_buf ops. Inserting
> layers that do nothing but call other layers is usually a bad idea. I
> didn't look carefully yet at how that would be arranged.

I can work along that direction.
One change I can see is that the umem_dmabuf structure will need to be
exposed to the device driver (currently it's private to the core).

>
> > > > +	ncont = npages;
> > > > +	order = ilog2(roundup_pow_of_two(ncont));
> > >
> > > We still need to deal with contiguity here, this ncont/npages is
> > > just obfuscation.
> >
> > Since the pages can move, we can't take advantage of contiguity here.
> > This handling is similar to the ODP case. The variables 'ncont' and
> > 'page_shift' here are not necessary. They are kept just for the sake
> > of signifying the semantics of the following functions that use them.
>
> Well, in this case we can manage it, and the performance boost is high
> enough we need to. The work on mlx5 to do it is a bit involved though.

Maybe as a future enhancement?

>
> > > > +	err = ib_umem_dmabuf_init_mapping(umem, mr);
> > > > +	if (err) {
> > > > +		dereg_mr(dev, mr);
> > > > +		return ERR_PTR(err);
> > > > +	}
> > >
> > > Did you test the page fault path at all? Looks like some xarray code
> > > is missing here, and this is also missing the related complex
> > > teardown logic.
> > >
> > > Does this mean you didn't test the pagefault_dmabuf_mr() at all?
> >
> > Thanks for the hint. I was unable to get the test runs reaching the
> > pagefault_dmabuf_mr() function. Now I have found the reason. Along the
> > path of all the page fault handlers, the array "odp_mkeys" is checked
> > against the mr key. Since the dmabuf mr keys are not in the list the
> > handler is never called.
> >
> > On the other hand, it seems that pagefault_dmabuf_mr() is not needed
> > at all. The pagefault is gracefully handled by retrying until the work
> > thread finished programming the NIC.
>
> This is a bug of some kind, pagefaults that can't find a mkey in the
> xarray should cause completion with error.
>
> Jason
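
For reference, here is a minimal userspace sketch of the trade-off between
option (1) and option (2) discussed above: a table of three callbacks versus
a single callback that takes a command. Every name in it (demo_umem_ops,
demo_single_cb, enum xlt_cmd, struct demo_mr) is hypothetical and chosen only
for illustration; none of it is taken from the patch or from the IB core, and
it does not model the "driver provides the dma_buf ops directly" approach
suggested in the review.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command codes; option (2) needs something like this. */
enum xlt_cmd { XLT_INIT, XLT_UPDATE, XLT_INVALIDATE };

/* Option (1): one callback per operation, grouped in an ops table. */
struct demo_umem_ops {
	int (*init)(void *priv);
	int (*update)(void *priv);
	int (*invalidate)(void *priv);
};

/* Option (2): a single callback selected by a command argument. */
typedef int (*demo_umem_cb)(void *priv, enum xlt_cmd cmd);

/* Toy stand-in for the NIC translation table state (an assumption). */
struct demo_mr {
	int mapped;	/* 1 while the table holds valid entries */
	int updates;	/* how many times the table was programmed */
};

static int demo_init(void *priv)
{
	struct demo_mr *mr = priv;

	mr->mapped = 0;
	mr->updates = 0;
	return 0;
}

static int demo_update(void *priv)
{
	struct demo_mr *mr = priv;

	mr->mapped = 1;
	mr->updates++;
	return 0;
}

static int demo_invalidate(void *priv)
{
	struct demo_mr *mr = priv;

	mr->mapped = 0;
	return 0;
}

/* Option (1) instance: the caller picks the member to invoke. */
static const struct demo_umem_ops demo_ops = {
	.init = demo_init,
	.update = demo_update,
	.invalidate = demo_invalidate,
};

/* Option (2) instance: the callee dispatches on the command. */
static int demo_single_cb(void *priv, enum xlt_cmd cmd)
{
	assert(priv != NULL);

	switch (cmd) {
	case XLT_INIT:
		return demo_init(priv);
	case XLT_UPDATE:
		return demo_update(priv);
	case XLT_INVALIDATE:
		return demo_invalidate(priv);
	}
	return -1;	/* unknown command */
}
```

Both forms drive the same three helpers, so the choice mainly moves the
dispatch: with the ops table the core selects a member, with the single
callback the driver switches on a command value that both sides must agree on.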