On Wed, Apr 24, 2024 at 01:44:11PM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 24, 2024 at 04:35:17PM +0000, Matthew Brost wrote:
> > On Wed, Apr 24, 2024 at 10:57:54AM -0300, Jason Gunthorpe wrote:
> > > On Wed, Apr 24, 2024 at 02:31:36AM +0000, Matthew Brost wrote:
> > > >
> > > > AMD seems to register notifiers on demand for parts of the address space
> > > > [1], I think Nvidia's open source driver does this too (can look this up
> > > > if needed). We (Intel) also do this in Xe and the i915 for userptrs
> > > > (explicitly binding a user address via IOCTL) too and it seems to work
> > > > quite well.
> > >
> > > I always thought AMD's implementation of this stuff was bad..
> >
> > No comment on the quality of AMD's implementation.
> >
> > But in general the view among my team members is that registering notifiers
> > on demand for sub ranges is an accepted practice.
>
> Yes, but not on a 2M granule, and not without sparsity. Do it on
> something like an aligned 512M and it would be fairly reasonable.
>
> > You do not *need* some other data structure as you could always just
> > walk the page tables but in practice a data structure exists in a tree
> > of sorts with the key being a VA range. The data structure has meta
> > data about the mapping, all GPU drivers seem to have this.
>
> What "meta data" is there for a SVA mapping? The entire page table is
> an SVA.
>

Whether we have allocated memory for GPU page tables in the range, whether
the range has been invalidated, the notifier seqno, what type of mapping
this is (SVA, BO, userptr, NULL)...

The "meta data" covers all types of mappings, not just SVA. SVA is a
specific class of the "meta data".

> > structure, along with pages returned from hmm_range_fault, are used to
> > program the GPU PTEs.
>
> Most likely pages returned from hmm_range_fault() can just be stored
> directly in the page table's PTEs. I'd be surprised if you actually
> need separate storage. (ignoring some of the current issues with the
> DMA API)
>

In theory that could work, but again, in practice this is not how it is
done. The "meta data" covers all the classes of mapping mentioned above.
Our PTE programming code needs to handle all the different requirements
of these specific classes in a single code path.

> > Again the allocation of this data structure happens *before* calling
> > hmm_range_fault on first GPU fault within unmapped range.
>
> The SVA page table and hmm_range_fault are tightly connected together,
> if a vma is needed to make it work then it is not "before", it is
> part of.
>

It is companion data for the GPU page table walk. See the explanation
above.

Matt

> Jason
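
For illustration only, the per-range "meta data" listed above (page table
memory allocated or not, invalidated or not, notifier seqno, mapping type)
could be sketched roughly as below. This is a minimal userspace sketch, not
code from Xe or any other driver: every name here (gpu_va_range,
gpu_map_type, the helper functions) is hypothetical, and the real structure
would live in a tree keyed by VA range rather than stand alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mapping classes, taken from the list in the mail. */
enum gpu_map_type {
	GPU_MAP_SVA,
	GPU_MAP_BO,
	GPU_MAP_USERPTR,
	GPU_MAP_NULL,
};

/* Hypothetical companion data for one VA range [start, end). */
struct gpu_va_range {
	uint64_t start;
	uint64_t end;
	enum gpu_map_type type;
	bool pt_allocated;          /* GPU page table memory exists for range */
	bool invalidated;           /* set when a notifier hits the range */
	unsigned long notifier_seq; /* seqno sampled when the range was bound */
};

static inline void gpu_va_range_init(struct gpu_va_range *r, uint64_t start,
				     uint64_t end, enum gpu_map_type type)
{
	assert(start < end);
	r->start = start;
	r->end = end;
	r->type = type;
	r->pt_allocated = false;
	r->invalidated = false;
	r->notifier_seq = 0;
}

/* Record that GPU PTEs were programmed while the notifier was at 'seq'. */
static inline void gpu_va_range_bind(struct gpu_va_range *r, unsigned long seq)
{
	r->pt_allocated = true;
	r->invalidated = false;
	r->notifier_seq = seq;
}

/* Mark the range invalidated if [start, end) overlaps it. */
static inline bool gpu_va_range_invalidate(struct gpu_va_range *r,
					   uint64_t start, uint64_t end)
{
	if (start >= r->end || end <= r->start)
		return false;
	r->invalidated = true;
	return true;
}

/* A GPU fault must (re)populate the range if any of these hold. */
static inline bool gpu_va_range_needs_fault(const struct gpu_va_range *r,
					    unsigned long cur_seq)
{
	return !r->pt_allocated || r->invalidated ||
	       r->notifier_seq != cur_seq;
}
```

The same structure serves every mapping class, so the code that programs
PTEs can take a single path and switch on 'type' only where the classes
differ.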