On Sat, May 04, 2024 at 11:03:03AM +1000, Dave Airlie wrote:
> > Let me know if this understanding is correct.
> >
> > Or what would you like to do in such situation?
> >
> > >
> > > Not sure how it is really a good idea.
> > >
> > > Adaptive locality of memory is still an unsolved problem in Linux,
> > > sadly.
> > >
> > > > > However, the migration stuff should really not be in the driver
> > > > > either. That should be core DRM logic to manage that. It is so
> > > > > convoluted and full of policy that all the drivers should be working
> > > > > in the same way.
> > > >
> > > > Completely agreed. Moving migration infrastructures to DRM is part
> > > > of our plan. We want to first prove of concept with xekmd driver,
> > > > then move helpers, infrastructures to DRM. Driver should be as easy
> > > > as implementation a few callback functions for device specific page
> > > > table programming and device migration, and calling some DRM common
> > > > functions during gpu page fault.
> > >
> > > You'd be better to start out this way so people can look at and
> > > understand the core code on its own merits.
> >
> > The two steps way were agreed with DRM maintainers, see here: https://lore.kernel.org/dri-devel/SA1PR11MB6991045CC69EC8E1C576A715925F2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/, bullet 4)
>
> After this discussion and the other cross-device HMM stuff I think we
> should probably push more for common up-front, I think doing this in a
> driver without considering the bigger picture might not end up
> extractable, and then I fear the developers will just move onto other
> things due to management pressure to land features over correctness.
>
> I think we have enough people on the list that can review this stuff,
> and even if the common code ends up being a little xe specific,
> iterating it will be easier outside the driver, as we can clearly
> demark what is inside and outside.

tldr; Yeah concurring.

I think like with the gpu vma stuff we should at least aim for the core
data structures, and more importantly, the locking design and how it
interacts with core mm services, to be common code. I read through
amdkfd and I think that one is warning enough that this area is one of
those cases where going with common code aggressively is much better.
Because it will be buggy in terrible "how do we get out of this design
corner again ever?" ways no matter what. But with common code there will
at least be all of dri-devel and hopefully some mm folks involved in
sorting things out.

In most other areas it's indeed better to explore the design space with
a few drivers before going with common code, at the cost of having some
really terrible driver code in upstream. But here the cost of some
really bad design in drivers is just too high imo.
-Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch