On Mon, Oct 05, 2020 at 04:18:11PM +0000, Xiong, Jianxin wrote: > > The implementation in mlx5 will be much more understandable, it would just do dma_buf_dynamic_attach() and program the XLT exactly > > the same as a normal umem. > > > > The move_notify() simply zap's the XLT and triggers a work to reload it after the move. Locking is provided by the dma_resv_lock. Only a > > small disruption to the page fault handler is needed. > > We considered such scheme but didn't go that way due to the lack of > notification when the move is done and thus the work wouldn't know > when it can reload. Well, the work would block on the reservation lock and that indicates the move is done It would be nicer if the dma_buf could provide an op that things are ready to go though > Now I think it again, we could probably signal the reload in the page fault handler. This also works, with a performance cost > > > + dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL); > > > + sgt = dma_buf_map_attachment(umem_dmabuf->attach, > > > + DMA_BIDIRECTIONAL); > > > + dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv); > > > > This doesn't look right, this lock has to be held up until the HW is programmed > > The mapping remains valid until being invalidated again. There is a > sequence number checking before programming the HW. It races, we could immediately trigger invalidation and then re-program the HW with this stale data. > > The use of atomic looks probably wrong as well. > > Do you mean umem_dmabuf->notifier_seq? Could you elaborate the concern? It only increments once per invalidation, that usually is racy. > > > + total_pages = ib_umem_odp_num_pages(umem_odp); > > > + for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) { > > > + addr = sg_dma_address(sg); > > > + pages = sg_dma_len(sg) >> page_shift; > > > + while (pages > 0 && k < total_pages) { > > > + umem_odp->dma_list[k++] = addr | access_mask; > > > + umem_odp->npages++; > > > + addr += page_size; > > > + pages--; > > > > This isn't fragmenting the sg into a page list properly, won't work for unaligned things > > I thought the addresses are aligned, but will add explicit alignment here. I have no idea what comes out of dma_buf, I wouldn't make too many assumptions since it does have to pass through the IOMMU layer too > > And really we don't need the dma_list for this case, with a fixed > > whole mapping DMA SGL a normal umem sgl is OK and the normal umem > > XLT programming in mlx5 is fine. > > The dma_list is used by both "polulate_mtt()" and > "mlx5_ib_invalidate_range", which are used for XLT programming and > invalidating (zapping), respectively. Don't use those functions for the dma_buf flow. Jason