> -----Original Message----- > From: Jason Gunthorpe <jgg@xxxxxxxx> > Sent: Monday, October 05, 2020 9:33 AM > To: Xiong, Jianxin <jianxin.xiong@xxxxxxxxx> > Cc: linux-rdma@xxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx; Doug Ledford <dledford@xxxxxxxxxx>; Leon Romanovsky > <leon@xxxxxxxxxx>; Sumit Semwal <sumit.semwal@xxxxxxxxxx>; Christian Koenig <christian.koenig@xxxxxxx>; Vetter, Daniel > <daniel.vetter@xxxxxxxxx> > Subject: Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region > > On Mon, Oct 05, 2020 at 04:18:11PM +0000, Xiong, Jianxin wrote: > > > > The implementation in mlx5 will be much more understandable, it > > > would just do dma_buf_dynamic_attach() and program the XLT exactly the same as a normal umem. > > > > > > The move_notify() simply zap's the XLT and triggers a work to reload > > > it after the move. Locking is provided by the dma_resv_lock. Only a small disruption to the page fault handler is needed. > > > > We considered such scheme but didn't go that way due to the lack of > > notification when the move is done and thus the work wouldn't know > > when it can reload. > > Well, the work would block on the reservation lock and that indicates the move is done Got it. Will work on that. > > It would be nicer if the dma_buf could provide an op that things are ready to go though > > > Now I think it again, we could probably signal the reload in the page fault handler. > > This also works, with a performance cost > > > > > + dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL); > > > > + sgt = dma_buf_map_attachment(umem_dmabuf->attach, > > > > + DMA_BIDIRECTIONAL); > > > > + dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv); > > > > > > This doesn't look right, this lock has to be held up until the HW is > > > programmed > > > > The mapping remains valid until being invalidated again. There is a > > sequence number checking before programming the HW. > > It races, we could immediately trigger invalidation and then re-program the HW with this stale data. > > > > The use of atomic looks probably wrong as well. > > > > Do you mean umem_dmabuf->notifier_seq? Could you elaborate the concern? > > It only increments once per invalidation, that usually is racy. I will rework these parts. > > > > > + total_pages = ib_umem_odp_num_pages(umem_odp); > > > > + for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) { > > > > + addr = sg_dma_address(sg); > > > > + pages = sg_dma_len(sg) >> page_shift; > > > > + while (pages > 0 && k < total_pages) { > > > > + umem_odp->dma_list[k++] = addr | access_mask; > > > > + umem_odp->npages++; > > > > + addr += page_size; > > > > + pages--; > > > > > > This isn't fragmenting the sg into a page list properly, won't work > > > for unaligned things > > > > I thought the addresses are aligned, but will add explicit alignment here. > > I have no idea what comes out of dma_buf, I wouldn't make too many assumptions since it does have to pass through the IOMMU layer too > > > > And really we don't need the dma_list for this case, with a fixed > > > whole mapping DMA SGL a normal umem sgl is OK and the normal umem > > > XLT programming in mlx5 is fine. > > > > The dma_list is used by both "polulate_mtt()" and > > "mlx5_ib_invalidate_range", which are used for XLT programming and > > invalidating (zapping), respectively. > > Don't use those functions for the dma_buf flow. Ok. I think we can use mlx5_ib_update_xlt() directly for dma-buf case. > > Jason