On Wed, Apr 24, 2024 at 04:35:17PM +0000, Matthew Brost wrote:
> On Wed, Apr 24, 2024 at 10:57:54AM -0300, Jason Gunthorpe wrote:
> > On Wed, Apr 24, 2024 at 02:31:36AM +0000, Matthew Brost wrote:
> >
> > > AMD seems to register notifiers on demand for parts of the address space
> > > [1], I think Nvidia's open source driver does this too (can look this up
> > > if needed). We (Intel) also do this in Xe and the i915 for userptrs
> > > (explicitly binding a user address via IOCTL) too and it seems to work
> > > quite well.
> >
> > I always thought AMD's implementation of this stuff was bad..
>
> No comment on the quality of AMD's implementation.
>
> But in general the view among my team members is that registering
> notifiers on demand for sub ranges is an accepted practice.

Yes, but not at a 2M granularity, and not without sparsity. Do it on
something like an aligned 512M and it would be fairly reasonable (see
the first sketch at the end of this mail).

> You do not *need* some other data structure as you could always just
> walk the page tables, but in practice a data structure exists in a
> tree of sorts with the key being a VA range. The data structure has
> metadata about the mapping; all GPU drivers seem to have this.

What "metadata" is there for an SVA mapping? The entire page table is
an SVA.

> structure, along with pages returned from hmm_range_fault, is used to
> program the GPU PTEs.

Most likely pages returned from hmm_range_fault() can just be stored
directly in the page table's PTEs. I'd be surprised if you actually
need separate storage (see the second sketch below). (ignoring some of
the current issues with the DMA API)

> Again the allocation of this data structure happens *before* calling
> hmm_range_fault on the first GPU fault within an unmapped range.

The SVA page table and hmm_range_fault are tightly connected together;
if a vma is needed to make it work then it is not "before", it is part
of it.

Jason
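
For illustration, a minimal sketch of the coarse-grained registration
meant above, built on the mmu_interval_notifier API. GPU_CHUNK_SIZE,
struct gpu_chunk, gpu_chunk_invalidate() and the zap step are
hypothetical driver-side names; mmu_interval_notifier_insert(),
mmu_interval_set_seq() and the ops structure are the real kernel
interface:

  #include <linux/mmu_notifier.h>
  #include <linux/mutex.h>
  #include <linux/sizes.h>

  #define GPU_CHUNK_SIZE SZ_512M  /* hypothetical granularity */

  /* Hypothetical per-chunk driver tracking structure */
  struct gpu_chunk {
          struct mmu_interval_notifier notifier;
          struct mutex lock;      /* serializes faults vs. invalidation */
  };

  static bool gpu_chunk_invalidate(struct mmu_interval_notifier *mni,
                                   const struct mmu_notifier_range *range,
                                   unsigned long cur_seq)
  {
          struct gpu_chunk *chunk =
                  container_of(mni, struct gpu_chunk, notifier);

          if (!mmu_notifier_range_blockable(range))
                  return false;

          mutex_lock(&chunk->lock);
          /* Make concurrent fault handlers retry */
          mmu_interval_set_seq(mni, cur_seq);
          /* ... zap the overlapping GPU PTEs here ... */
          mutex_unlock(&chunk->lock);
          return true;
  }

  static const struct mmu_interval_notifier_ops gpu_chunk_ops = {
          .invalidate = gpu_chunk_invalidate,
  };

  /*
   * On the first GPU fault in an unregistered area, cover the whole
   * aligned 512M chunk instead of just the faulting 2M range.
   */
  static int gpu_register_chunk(struct gpu_chunk *chunk,
                                struct mm_struct *mm, unsigned long addr)
  {
          return mmu_interval_notifier_insert(&chunk->notifier, mm,
                                  ALIGN_DOWN(addr, GPU_CHUNK_SIZE),
                                  GPU_CHUNK_SIZE, &gpu_chunk_ops);
  }

One wide notifier like this absorbs many faults, and the chunk stays
sparse because only the faulted PTEs are ever populated.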
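
And a sketch of the second point, that hmm_range_fault() output can be
written straight into the GPU page table with no intermediate storage.
This follows the retry pattern documented in Documentation/mm/hmm.rst;
gpu_write_pte() and the gpu_chunk bits are again hypothetical:

  #include <linux/hmm.h>
  #include <linux/mm.h>

  /* pfns is caller-provided, one entry per PAGE_SIZE page in [start, end) */
  static int gpu_fault_range(struct gpu_chunk *chunk, struct mm_struct *mm,
                             unsigned long start, unsigned long end,
                             unsigned long *pfns)
  {
          struct hmm_range range = {
                  .notifier = &chunk->notifier,
                  .start = start,
                  .end = end,
                  .hmm_pfns = pfns,
                  .default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
          };
          unsigned long i, npages = (end - start) >> PAGE_SHIFT;
          int ret;

  again:
          range.notifier_seq = mmu_interval_read_begin(range.notifier);
          mmap_read_lock(mm);
          ret = hmm_range_fault(&range);
          mmap_read_unlock(mm);
          if (ret) {
                  if (ret == -EBUSY)
                          goto again;     /* raced with an invalidation */
                  return ret;
          }

          mutex_lock(&chunk->lock);
          if (mmu_interval_read_retry(range.notifier, range.notifier_seq)) {
                  mutex_unlock(&chunk->lock);
                  goto again;
          }

          /*
           * The faulted PFNs go straight into the GPU page table; no
           * separate page array outlives this function.  gpu_write_pte()
           * is a stand-in for the driver's PTE writer.
           */
          for (i = 0; i < npages; i++)
                  gpu_write_pte(start + (i << PAGE_SHIFT),
                                hmm_pfn_to_page(pfns[i]));
          mutex_unlock(&chunk->lock);
          return 0;
  }

Once the PTEs are written under chunk->lock the pfns array can be
thrown away; the GPU page table itself is the only record, which is
the point above.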