On Tue, Apr 30, 2024 at 08:57:48PM +0200, Daniel Vetter wrote:
> On Tue, Apr 30, 2024 at 02:30:02PM -0300, Jason Gunthorpe wrote:
> > On Mon, Apr 29, 2024 at 10:25:48AM +0200, Thomas Hellström wrote:
> > >
> > > > Yes there is another common scheme where you bind a window of CPU to
> > > > a
> > > > window on the device and mirror a fixed range, but this is a quite
> > > > different thing. It is not SVA, it has a fixed range, and it is
> > > > probably bound to a single GPU VMA in a multi-VMA device page table.
> > >
> > > And this above here is exactly what we're implementing, and the GPU
> > > page-tables are populated using device faults. Regions (large) of the
> > > mirrored CPU mm need to coexist in the same GPU vm as traditional GPU
> > > buffer objects.
> >
> > Well, not really, if that was the case you'd have a single VMA over
> > the entire bound range, not dynamically create them.
> >
> > A single VMA that uses hmm_range_fault() to populate the VM is
> > completely logical.
> >
> > Having a hidden range of mm binding and then creating/destroying 2M
> > VMAs dynamicaly is the thing that doesn't make alot of sense.
>
> I only noticed this thread now but fyi I did dig around in the
> implementation and it's summarily an absolute no-go imo for multiple
> reasons. It starts with this approach of trying to mirror cpu vma (which I
> think originated from amdkfd) leading to all kinds of locking fun, and
> then it gets substantially worse when you dig into the details. :(

Why does the DRM side struggle so much with hmm_range_fault()? I would
have thought it should have a fairly straightforward and logical
connection to the GPU page table.

FWIW, it does make sense to have both a window and a full MM option
for hmm_range_fault(). ODP does both and it is fine.

> I think until something more solid shows up you can just ignore this. I do
> fully agree that for sva the main mirroring primitive needs to be page
> centric, so dma_map_sg.
                ^^^^^^^^^^
                dma_map_page

> There's a bit a question around how to make the
> necessary batching efficient and the locking/mmu_interval_notifier scale
> enough, but I had some long chats with Thomas and I think there's enough
> option to spawn pretty much any possible upstream consensus. So I'm not
> worried.

Sure, the new DMA API will bring some more considerations to this as
well.

ODP uses a 512M granule scheme and it seems to be OK. By far the worst
part of all this is the faulting performance. I have yet to hear any
complaints about mmu notifier performance.

> But first this needs to be page-centric in the fundamental mirroring
> approach.

Yes

Jason
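
[Editor's note: for readers unfamiliar with the mirroring loop being
discussed above, the following is a minimal sketch of the documented
hmm_range_fault() / mmu_interval_notifier pattern from
Documentation/mm/hmm.rst. struct my_gpu_vm, my_gpu_update_ptes() and
my_gpu_populate_range() are hypothetical placeholders, not code from
any real driver.]

#include <linux/hmm.h>
#include <linux/mmu_notifier.h>
#include <linux/mmap_lock.h>
#include <linux/sched/mm.h>
#include <linux/mutex.h>

struct my_gpu_vm {				/* hypothetical driver state */
	struct mmu_interval_notifier notifier;	/* covers the mirrored range */
	struct mutex lock;			/* serializes GPU PTE updates */
};

/* Hypothetical helper: writes pfns[] into the GPU page table. */
int my_gpu_update_ptes(struct my_gpu_vm *gvm, unsigned long start,
		       unsigned long end, unsigned long *pfns);

static int my_gpu_populate_range(struct my_gpu_vm *gvm,
				 unsigned long start, unsigned long end,
				 unsigned long *pfns)
{
	struct mm_struct *mm = gvm->notifier.mm;
	struct hmm_range range = {
		.notifier	= &gvm->notifier,
		.start		= start,
		.end		= end,
		.hmm_pfns	= pfns,
		.default_flags	= HMM_PFN_REQ_FAULT,
	};
	int ret;

	if (!mmget_not_zero(mm))
		return -EFAULT;

again:
	/* Snapshot the notifier sequence before faulting the CPU pages. */
	range.notifier_seq = mmu_interval_read_begin(&gvm->notifier);

	mmap_read_lock(mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(mm);
	if (ret) {
		if (ret == -EBUSY)
			goto again;
		goto out;
	}

	mutex_lock(&gvm->lock);
	/* An invalidation raced with the fault: throw the result away. */
	if (mmu_interval_read_retry(&gvm->notifier, range.notifier_seq)) {
		mutex_unlock(&gvm->lock);
		goto again;
	}

	/* pfns[] is stable while gvm->lock is held; mirror it to the GPU. */
	ret = my_gpu_update_ptes(gvm, start, end, pfns);
	mutex_unlock(&gvm->lock);
out:
	mmput(mm);
	return ret;
}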