Re: [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table from hmm range

On Fri, Apr 05, 2024 at 04:42:14PM +0000, Zeng, Oak wrote:
> > > The code above deals with a case where a dma map is not needed. As I
> > > understand it, whether we need a dma map depends on the device
> > > topology. For example, when a device accesses host memory or another
> > > device's memory through PCIe, we need a dma mapping; if the connection
> > > between devices is xelink (similar to nvidia's nvlink), all devices'
> > > memory can be in the same address space, so no dma mapping is needed.
> > 
> > Then you call dma_map_page to do your DMA side and you avoid it for
> > the DEVICE_PRIVATE side. SG list doesn't help this anyhow.
> 
> When a dma map is needed, we use dma_map_sgtable, a different flavor
> of the dma_map_page function.

I saw; I am saying this should not be done. You cannot unmap bits of
an sgl mapping if an invalidation comes in.
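
Map each page individually instead, so that a mmu notifier
invalidation can unmap exactly the pages it covers and nothing more.
Rough, untested sketch; the per-range dma_addr_t array and the
function names are made up for illustration, not taken from the xe
patch:

#include <linux/dma-mapping.h>
#include <linux/hmm.h>

/* One dma_addr_t per page of the hmm_range; 0 means "not mapped". */
static int map_range_pages(struct device *dev, struct hmm_range *range,
			   dma_addr_t *dma_addrs, unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++) {
		struct page *page;

		if (!(range->hmm_pfns[i] & HMM_PFN_VALID)) {
			dma_addrs[i] = 0;
			continue;
		}
		page = hmm_pfn_to_page(range->hmm_pfns[i]);
		dma_addrs[i] = dma_map_page(dev, page, 0, PAGE_SIZE,
					    DMA_BIDIRECTIONAL);
		if (dma_mapping_error(dev, dma_addrs[i])) {
			dma_addrs[i] = 0;
			return -EFAULT;
		}
	}
	return 0;
}

/*
 * The invalidation callback can then unmap only the pages inside the
 * notifier's range, which a single dma_map_sgtable() mapping cannot do.
 */
static void unmap_range_pages(struct device *dev, dma_addr_t *dma_addrs,
			      unsigned long first, unsigned long npages)
{
	unsigned long i;

	for (i = first; i < first + npages; i++) {
		if (!dma_addrs[i])
			continue;
		dma_unmap_page(dev, dma_addrs[i], PAGE_SIZE,
			       DMA_BIDIRECTIONAL);
		dma_addrs[i] = 0;
	}
}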

> The reason we also used (mis-used) the sg list for the non-dma-map
> cases is that we wanted to re-use a data structure. Our goal here
> is, for an hmm_range, to build a list of device physical addresses
> (dma-mapped or not), which will be used later on to program the
> device page table. We re-used the sg list structure for the
> non-dma-map cases so the two cases can share the same page table
> programming code. Since the sg list was originally designed for
> dma-map, it does look like it is mis-used here.

Please don't use sg list at all for this.
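
If what you need is just one device-usable address per page to feed
the page table programming, a flat array carries that with none of
the scatterlist baggage, and both the dma-mapped and device-local
cases can share it. Purely illustrative; the entry struct and the
my_write_pte() helper are hypothetical, not the xe API:

/* Hypothetical per-page entry; both paths fill the same array. */
struct pt_prog_entry {
	u64 addr;		/* dma address or device-local address */
	bool device_local;	/* true: addr was not dma mapped */
};

/* Driver-specific PTE write, stubbed out here for illustration. */
void my_write_pte(void *vm, u64 va, u64 addr, bool device_local);

static void program_range(void *vm, u64 start_va,
			  const struct pt_prog_entry *entries,
			  unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++)
		my_write_pte(vm, start_va + i * PAGE_SIZE,
			     entries[i].addr, entries[i].device_local);
}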
 
> I need to mention that even for some DEVICE_PRIVATE memory we also
> need dma-mapping. For example, if you have two devices connected to
> each other through PCIe, both devices' memory is registered as
> DEVICE_PRIVATE with hmm.

Yes, but you don't ever dma map DEVICE_PRIVATE.
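
Meaning: when the hmm output hands you a DEVICE_PRIVATE page, you
translate it back to the device's own address instead of feeding it
to the DMA API. A sketch for the single-GPU case; struct my_dev and
its vram_base_pfn field are made up for illustration:

#include <linux/memremap.h>
#include <linux/dma-mapping.h>

struct my_dev {				/* hypothetical driver device */
	struct device *dev;
	unsigned long vram_base_pfn;	/* first pfn of its DEVICE_PRIVATE pagemap */
};

static int resolve_page_addr(struct my_dev *mydev, struct page *page,
			     u64 *addr, bool *device_local)
{
	if (is_device_private_page(page)) {
		/*
		 * Never dma map DEVICE_PRIVATE; the CPU pfn is only a
		 * placeholder, so recover the device-local address from
		 * the pfn offset into our own pagemap instead.
		 */
		*addr = (u64)(page_to_pfn(page) - mydev->vram_base_pfn)
			<< PAGE_SHIFT;
		*device_local = true;
		return 0;
	}

	/* Ordinary system memory goes through the DMA API as usual. */
	*addr = dma_map_page(mydev->dev, page, 0, PAGE_SIZE,
			     DMA_BIDIRECTIONAL);
	*device_local = false;
	return dma_mapping_error(mydev->dev, *addr) ? -EFAULT : 0;
}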

> I believe we need a dma-map when one device accesses another
> device's memory. The two devices' memory doesn't belong to the same
> address space in this case. For modern GPUs with xeLink/nvLink/XGMI,
> this is not needed.

Review my emails here:

https://lore.kernel.org/dri-devel/20240403125712.GA1744080@xxxxxxxxxx/

They explain how it should work.

> > A 1:1 SVA mapping is a special case of this where there is a single
> > GPU VMA that spans the entire process address space with a 1:1 VA (no
> > offset).
> 
> From an implementation perspective, we can have one device page
> table per process for such a 1:1 va mapping, but it is not necessary
> to have a single gpu vma. We can have many gpu vmas, each covering a
> segment of this address space.

This is not what I'm talking about. The GPU VMA is bound to a specific
MM VA; it should not be created on demand.

If you want to optimize invalidations for the full 1:1 SVA case, you
don't need something like a VMA; a simple bitmap reducing the address
space into, say, 1024 faulted-in chunks would be much cheaper than
dynamic VMA ranges.
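
Something as dumb as one bit per fixed-size chunk of the process VA
is enough: set a bit when the GPU faults something in a chunk, and
let the notifier skip ranges whose chunks were never touched.
Untested sketch; the names and the 48-bit VA assumption are
illustrative:

#include <linux/bitmap.h>
#include <linux/bitops.h>

#define SVA_NR_CHUNKS	1024
#define SVA_CHUNK_SHIFT	38	/* 48-bit VA / 1024 chunks = 256G per chunk */

struct sva_tracker {			/* hypothetical */
	DECLARE_BITMAP(faulted, SVA_NR_CHUNKS);
};

/* Called on the GPU fault path when pages are faulted in. */
static void sva_mark_faulted(struct sva_tracker *t, u64 addr)
{
	set_bit(addr >> SVA_CHUNK_SHIFT, t->faulted);
}

/* Called from the mmu notifier: skip chunks the GPU never touched. */
static bool sva_range_needs_invalidate(struct sva_tracker *t,
				       u64 start, u64 end)
{
	unsigned long first = start >> SVA_CHUNK_SHIFT;
	unsigned long last = (end - 1) >> SVA_CHUNK_SHIFT;

	return find_next_bit(t->faulted, last + 1, first) <= last;
}

That is 128 bytes of state for the whole address space, no tree
lookups and no allocation on the fault path.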

Jason


