On Fri, Apr 05, 2024 at 04:42:14PM +0000, Zeng, Oak wrote:

> > > Above codes deal with a case where dma map is not needed. As I
> > > understand it, whether we need a dma map depends on the devices
> > > topology. For example, when device access host memory or another
> > > device's memory through pcie, we need dma mapping; if the connection
> > > b/t devices is xelink (similar to nvidia's nvlink), all device's
> > > memory can be in same address space, so no dma mapping is needed.
> >
> > Then you call dma_map_page to do your DMA side and you avoid it for
> > the DEVICE_PRIVATE side. SG list doesn't help this anyhow.
>
> When dma map is needed, we used dma_map_sgtable, a different flavor
> of the dma-map-page function.

I saw it; I am saying this should not be done. You cannot unmap bits
of an sgl mapping if an invalidation comes in.

> The reason we also used (mis-used) sg list for non-dma-map cases, is
> because we want to re-use some data structure. Our goal here is, for
> a hmm_range, build a list of device physical address (can be
> dma-mapped or not), which will be used later on to program the
> device page table. We re-used the sg list structure for the
> non-dma-map cases so those two cases can share the same page table
> programming codes. Since sg list was originally designed for
> dma-map, it does look like this is mis-used here.

Please don't use sg list at all for this.

> Need to mention, even for some DEVICE_PRIVATE memory, we also need
> dma-mapping. For example, if you have two devices connected to each
> other through PCIe, both devices memory are registered as
> DEVICE_PRIVATE to hmm.

Yes, but you don't ever dma map DEVICE_PRIVATE.

> I believe we need a dma-map when one device access another device's
> memory. Two devices' memory doesn't belong to same address space in
> this case. For modern GPU with xeLink/nvLink/XGMI, this is not
> needed.

Review my emails here:

https://lore.kernel.org/dri-devel/20240403125712.GA1744080@xxxxxxxxxx/

They explain how it should work.

> > A 1:1 SVA mapping is a special case of this where there is a single
> > GPU VMA that spans the entire process address space with a 1:1 VA (no
> > offset).
>
> From implementation perspective, we can have one device page table
> for one process for such 1:1 va mapping, but it is not necessary to
> have a single gpu vma. We can have many gpu vma each cover a segment
> of this address space.

This is not what I'm talking about. The GPU VMA is bound to a
specific MM VA; it should not be created on demand.

If you want to optimize invalidations for the full 1:1 SVA case, you
don't need something like a VMA; a simple bitmap dividing the address
space into, say, 1024 faulted-in chunks would be much cheaper than
dynamic VMA ranges.

Jason
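
P.S. To be concrete about the sgl point: the shape I have in mind is a
flat array that indexes 1:1 with the hmm_pfns array, filled with one
dma_map_page() per CPU page, so an invalidation can undo exactly the
entries it covers. Completely untested sketch; my_pt_entry,
my_map_range(), my_unmap_range() and my_devmem_addr() are made-up
names, not anything that exists:

#include <linux/dma-mapping.h>
#include <linux/hmm.h>
#include <linux/memremap.h>

/* Made up: however the driver turns a DEVICE_PRIVATE page into a
 * device/fabric-local address for its page table. */
static dma_addr_t my_devmem_addr(struct page *page);

/* One entry per page of the hmm_range; no struct scatterlist anywhere. */
struct my_pt_entry {
	dma_addr_t addr;	/* what gets written to the device page table */
	bool dma_mapped;	/* false for DEVICE_PRIVATE pages */
};

static int my_map_range(struct device *dev, struct hmm_range *range,
			struct my_pt_entry *ent)
{
	unsigned long npages = (range->end - range->start) >> PAGE_SHIFT;
	unsigned long i;

	for (i = 0; i < npages; i++) {
		struct page *page;

		if (!(range->hmm_pfns[i] & HMM_PFN_VALID))
			return -EFAULT;	/* error unwind elided in this sketch */

		page = hmm_pfn_to_page(range->hmm_pfns[i]);

		if (is_device_private_page(page)) {
			/* Device memory is never passed to the DMA API. */
			ent[i].addr = my_devmem_addr(page);
			ent[i].dma_mapped = false;
			continue;
		}

		ent[i].addr = dma_map_page(dev, page, 0, PAGE_SIZE,
					   DMA_BIDIRECTIONAL);
		if (dma_mapping_error(dev, ent[i].addr))
			return -EIO;
		ent[i].dma_mapped = true;
	}
	return 0;
}

/* An invalidation of a sub-range only unmaps the pages it covers,
 * which is exactly what a single sgl mapping cannot give you. */
static void my_unmap_range(struct device *dev, struct my_pt_entry *ent,
			   unsigned long first, unsigned long last)
{
	unsigned long i;

	for (i = first; i < last; i++)
		if (ent[i].dma_mapped)
			dma_unmap_page(dev, ent[i].addr, PAGE_SIZE,
				       DMA_BIDIRECTIONAL);
}

On error the caller just runs my_unmap_range() over whatever was
filled in; nothing here needs to know about sg_table at all.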
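
And for the 1:1 SVA case, the kind of bitmap I mean, again only an
illustrative sketch with made-up names (my_sva, MY_SVA_CHUNKS) and an
arbitrary chunk count:

#include <linux/bitmap.h>
#include <linux/kernel.h>

#define MY_SVA_CHUNKS 1024	/* arbitrary: whole VA space / 1024 */

struct my_sva {
	unsigned long chunk_size;		/* e.g. TASK_SIZE / MY_SVA_CHUNKS */
	DECLARE_BITMAP(faulted, MY_SVA_CHUNKS);	/* chunks the GPU ever faulted */
};

/* Called from the GPU fault handler after hmm_range_fault() succeeds. */
static void my_sva_mark_faulted(struct my_sva *sva, unsigned long addr)
{
	set_bit(addr / sva->chunk_size, sva->faulted);
}

/* Called from the mmu notifier: if nothing in [start, end) was ever
 * faulted in, the invalidation has nothing to tear down and can return
 * immediately, without consulting any range tree or per-VMA state. */
static bool my_sva_range_is_populated(struct my_sva *sva,
				      unsigned long start, unsigned long end)
{
	unsigned long first = start / sva->chunk_size;
	unsigned long last = min_t(unsigned long,
				   DIV_ROUND_UP(end, sva->chunk_size),
				   MY_SVA_CHUNKS);

	return find_next_bit(sva->faulted, last, first) < last;
}

In this sketch the bits are only ever set, so there is no clearing or
locking subtlety; the point is just that a fixed, coarse "was this
ever touched" structure is enough to skip the common invalidations,
with no dynamic GPU VMAs required.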