On Tue, Apr 30, 2024 at 08:57:48PM +0200, Daniel Vetter wrote:
> On Tue, Apr 30, 2024 at 02:30:02PM -0300, Jason Gunthorpe wrote:
> > On Mon, Apr 29, 2024 at 10:25:48AM +0200, Thomas Hellström wrote:
> > >
> > > > Yes there is another common scheme where you bind a window of CPU to
> > > > a
> > > > window on the device and mirror a fixed range, but this is a quite
> > > > different thing. It is not SVA, it has a fixed range, and it is
> > > > probably bound to a single GPU VMA in a multi-VMA device page table.
> > >
> > > And this above here is exactly what we're implementing, and the GPU
> > > page-tables are populated using device faults. Regions (large) of the
> > > mirrored CPU mm need to coexist in the same GPU vm as traditional GPU
> > > buffer objects.
> >
> > Well, not really, if that was the case you'd have a single VMA over
> > the entire bound range, not dynamically create them.
> >
> > A single VMA that uses hmm_range_fault() to populate the VM is
> > completely logical.
> >
> > Having a hidden range of mm binding and then creating/destroying 2M
> > VMAs dynamicaly is the thing that doesn't make alot of sense.
>
> I only noticed this thread now but fyi I did dig around in the
> implementation and it's summarily an absolute no-go imo for multiple
> reasons. It starts with this approach of trying to mirror cpu vma (which I
> think originated from amdkfd) leading to all kinds of locking fun, and
> then it gets substantially worse when you dig into the details. :(

Why does the DRM side struggle so much with hmm_range_fault()? I would
have thought it should have a fairly straightforward and logical
connection to the GPU page table.

FWIW, it does make sense to have both a window and a full MM option
for hmm_range_fault(). ODP does both and it is fine.

> I think until something more solid shows up you can just ignore this. I do
> fully agree that for sva the main mirroring primitive needs to be page
> centric, so dma_map_sg.
                ^^^^^^^^^^
                dma_map_page

> There's a bit a question around how to make the
> necessary batching efficient and the locking/mmu_interval_notifier scale
> enough, but I had some long chats with Thomas and I think there's enough
> option to spawn pretty much any possible upstream consensus. So I'm not
> worried.

Sure, the new DMA API will bring some more considerations to this as
well.

ODP uses a 512M granule scheme and it seems to be OK. By far the worst
part of all this is the faulting performance. I have yet to hear any
complaints about mmu notifier performance.

> But first this needs to be page-centric in the fundamental mirroring
> approach.

Yes

Jason
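
[Editor's note: for readers unfamiliar with the mirroring loop being
discussed above, the following is a minimal sketch of the documented
hmm_range_fault() / mmu_interval_notifier pattern from
Documentation/mm/hmm.rst. struct my_gpu_vm, my_gpu_update_ptes() and
my_gpu_populate_range() are hypothetical placeholders, not code from
any real driver.]

#include <linux/hmm.h>
#include <linux/mmu_notifier.h>
#include <linux/mmap_lock.h>
#include <linux/sched/mm.h>
#include <linux/mutex.h>

struct my_gpu_vm {				/* hypothetical driver state */
	struct mmu_interval_notifier notifier;	/* covers the mirrored range */
	struct mutex lock;			/* serializes GPU PTE updates */
};

/* Hypothetical helper: writes pfns[] into the GPU page table. */
int my_gpu_update_ptes(struct my_gpu_vm *gvm, unsigned long start,
		       unsigned long end, unsigned long *pfns);

static int my_gpu_populate_range(struct my_gpu_vm *gvm,
				 unsigned long start, unsigned long end,
				 unsigned long *pfns)
{
	struct mm_struct *mm = gvm->notifier.mm;
	struct hmm_range range = {
		.notifier	= &gvm->notifier,
		.start		= start,
		.end		= end,
		.hmm_pfns	= pfns,
		.default_flags	= HMM_PFN_REQ_FAULT,
	};
	int ret;

	if (!mmget_not_zero(mm))
		return -EFAULT;

again:
	/* Snapshot the notifier sequence before faulting the CPU pages. */
	range.notifier_seq = mmu_interval_read_begin(&gvm->notifier);

	mmap_read_lock(mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(mm);
	if (ret) {
		if (ret == -EBUSY)
			goto again;
		goto out;
	}

	mutex_lock(&gvm->lock);
	/* An invalidation raced with the fault: throw the result away. */
	if (mmu_interval_read_retry(&gvm->notifier, range.notifier_seq)) {
		mutex_unlock(&gvm->lock);
		goto again;
	}

	/* pfns[] is stable while gvm->lock is held; mirror it to the GPU. */
	ret = my_gpu_update_ptes(gvm, start, end, pfns);
	mutex_unlock(&gvm->lock);
out:
	mmput(mm);
	return ret;
}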