Hi, Matt.

On Fri, 2024-08-30 at 13:47 +0000, Matthew Brost wrote:
> On Fri, Aug 30, 2024 at 11:57:33AM +0200, Thomas Hellström wrote:
> > Hi, Matthew,
> > 
> > Agreed the below might not be important just now, but some ideas:
> > 
> > On Thu, 2024-08-29 at 20:56 +0000, Matthew Brost wrote:
> > > Issues with removing a SVM range:
> > > 
> > > - Xe bind code stores invalidation / present state in VMA, this
> > >   would need to be moved to the radix tree. I have a Jira open
> > >   for that work which I believe other developers are going to own.
> > 
> > Yeah, although we shouldn't *design* around xe bind-code and
> > page-table code shortcomings.
> 
> I'm thinking this one certainly should be fixed sooner rather than
> later, which would be helpful.
> 
> But let's also consider the case where we get a bunch of individual
> page invalidates serially for an entire range (I can't remember when
> this happens but I have seen it in my testing, will look into this
> more to figure out exactly when). If we invalidate 1 page at a time
> in the radix tree, each invalidation could potentially result in a
> TLB invalidation interaction with the hardware in cases where larger
> GPU pages are not being used. The TLB invalidation is going to be
> vastly slower than any CPU operation (e.g. RB search, radix tree
> walk). If we key on a range and invalidate the entire range at once
> on the first invalidation, this may end up being significantly
> faster.
> 
> Above is pure speculation though, a lot of what both of us are
> saying is... So another reason I'd like to get apps running to do
> profiling. It would be nice to make design decisions based on data,
> not speculation.

Well, nothing would stop you from adding a configurable invalidation
granularity, even with a radix-tree based approach. You'd just pad the
invalidation range to match the granularity.
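Something along these lines is what I'm picturing (completely untested
sketch; the my_svm_* names and fields below are made up for
illustration, not anything that exists in the drm_gpusvm series):

/* Untested sketch; invented names, not actual drm_gpusvm interfaces. */
static void my_svm_notifier_invalidate(struct my_svm *svm,
				       unsigned long start,
				       unsigned long end)
{
	/* Pad to the configured granularity (power of two, in bytes). */
	start = ALIGN_DOWN(start, svm->inval_granularity);
	end = ALIGN(end, svm->inval_granularity);

	/*
	 * Zap the gpu_ptes in the padded range. If a previous per-page
	 * callback already zapped everything in the range, skip the
	 * (expensive) TLB invalidation entirely.
	 */
	if (my_svm_zap_ptes(svm, start, end))
		my_svm_tlb_invalidate(svm, start, end);
}

That way a storm of single-page notifier callbacks over one range
collapses into at most one TLB invalidation per granule, and the
granularity is just a tunable rather than something baked into the
data structure.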
> > > 
> > > - Where would the dma mapping / device pages be stored?
> > >   - In the radix tree? What if ATS is enabled? We don't have a
> > >     driver owned radix tree. How do we reasonably connect a
> > >     driver owned radix tree to a common GPUSVM layer?
> > 
> > With ATS you mean IOMMU SVA, right? I think we could assume that
> > any user of this code also has a gpu page-table, since otherwise
> > they couldn't be using VRAM and a simpler solution would be in
> > place.
> 
> Fair point.
> 
> > But to that specific question, drm_gpusvm state would live in a
> > drm_gpusvm radix tree and driver-specific stuff in the driver tree.
> > A helper based approach would then call drm_gpusvm_unmap_dma(range),
> > whereas a middle layer would just traverse the tree and unmap.
> 
> Let me consider this. Open to all options.
> 
> > >   - In the notifier? What if the notifier is sparsely populated?
> > >     We would be wasting huge amounts of memory. What if the
> > >     notifier is configured to span the entire virtual address
> > >     space?
> > 
> > Let's assume you use a fake page-table like in xe_pt_walk.c as your
> > "radix tree", adapted to relevant page-sizes; sparsity is not a
> > problem.
> 
> Ok, makes sense I think.
> 
> > > - How does the garbage collector work? We can't allocate memory
> > >   in the notifier so we don't have anything to add to the garbage
> > >   collector. We can't directly modify page tables given you need
> > >   a lock in the path of reclaim.
> > 
> > The garbage collector would operate on the whole invalidated range.
> > In the case of xe, upon zapping under reclaim you mark the
> > individual page-table bos that are to be removed as "invalid", and
> > the garbage collector walks the range removing the "invalid"
> > entries. Subsequent (re-)binding avoids the "invalid" entries
> > (perhaps even helps removing them) and can thus race with the
> > garbage collector. Hence, any ranges implied by the page-table code
> > are eliminated.
> 
> This is pretty much what I came up with too if we didn't have a SVM
> range.
> 
> > > - How do we deal with fault storms (e.g. tons of faults hitting
> > >   the same SVM range in a row)? Without a SVM range there is no
> > >   way to know if a mapping is valid and the GPU page handler can
> > >   be short-circuited.
> > 
> > Perhaps look at the page-table tree and check whether the gpu_pte
> > causing the fault is valid.
> 
> Came up with the same thing.
> 
> > > - Do we have notifier seqno for every PTE?
> > 
> > I'd say no. With this approach it makes sense to have a wide
> > notifier. The seqno now only affects binding of new gpu_ptes, so
> > the problem with a wide notifier becomes that if invalidation
> > occurs to *any* part of the notifier while we're in the read
> > section during binding, we need to
> 
> I have avoided this with drm_gpusvm_range_pages_valid. This isn't
> just an optimization, it is actually required for the 2-tile case to
> be able to safely know when dma pages can be unmapped (i.e. you can't
> dma unmap pages if either tile has a valid mapping).

OK, I still need to read up on that.

Thanks,
Thomas

> 
> Matt
> 
> > rerun the binding. Adding more notifiers to mitigate that would be
> > to optimize faulting performance over core invalidation
> > performance, which Jason asked us to avoid.
> > 
> > /Thomas
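
For the fault-storm case above, the check I had in mind when
suggesting looking at the page-table tree is roughly the following
(sketch only, invented names; it assumes a walker in the style of
xe_pt_walk.c):

/* Sketch; invented names, assumes an xe_pt_walk.c-style walker. */
static int my_svm_handle_fault(struct my_svm *svm, unsigned long addr)
{
	/*
	 * Walk the GPU page-table tree for the faulting address. If
	 * the gpu_pte covering it is still valid, an earlier fault in
	 * the same range already (re)bound it, so short-circuit
	 * without taking the full fault path.
	 */
	if (my_svm_gpu_pte_valid(svm, addr))
		return 0;

	return my_svm_fault_and_bind(svm, addr);
}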