On Mon, May 06, 2024 at 03:04:15PM +0200, Daniel Vetter wrote:
> On Sat, May 04, 2024 at 11:03:03AM +1000, Dave Airlie wrote:
> > > Let me know if this understanding is correct.
> > >
> > > Or what would you like to do in such situation?
> > >
> > > >
> > > > Not sure how it is really a good idea.
> > > >
> > > > Adaptive locality of memory is still an unsolved problem in Linux,
> > > > sadly.
> > > >
> > > > > > However, the migration stuff should really not be in the driver
> > > > > > either. That should be core DRM logic to manage that. It is so
> > > > > > convoluted and full of policy that all the drivers should be working
> > > > > > in the same way.
> > > > >
> > > > > Completely agreed. Moving migration infrastructures to DRM is part
> > > > > of our plan. We want to first prove of concept with xekmd driver,
> > > > > then move helpers, infrastructures to DRM. Driver should be as easy
> > > > > as implementation a few callback functions for device specific page
> > > > > table programming and device migration, and calling some DRM common
> > > > > functions during gpu page fault.
> > > >
> > > > You'd be better to start out this way so people can look at and
> > > > understand the core code on its own merits.
> > >
> > > The two steps way were agreed with DRM maintainers, see here: https://lore.kernel.org/dri-devel/SA1PR11MB6991045CC69EC8E1C576A715925F2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/, bullet 4)
> >
> > After this discussion and the other cross-device HMM stuff I think we
> > should probably push more for common up-front, I think doing this in a
> > driver without considering the bigger picture might not end up
> > extractable, and then I fear the developers will just move onto other
> > things due to management pressure to land features over correctness.
> >
> > I think we have enough people on the list that can review this stuff,
> > and even if the common code ends up being a little xe specific,
> > iterating it will be easier outside the driver, as we can clearly
> > demark what is inside and outside.
>
> tldr; Yeah concurring.
>
> I think like with the gpu vma stuff we should at least aim for the core
> data structures, and more importantly, the locking design and how it
> interacts with core mm services to be common code.
>

I believe this is a reasonable request and hopefully, it should end up
being a pretty thin layer. drm_gpusvm? Have some ideas. Let's see what
we come up with.

Matt

> I read through amdkfd and I think that one is warning enough that this
> area is one of these cases where going with common code aggressively is
> much better. Because it will be buggy in terribly "how do we get out of
> this design corner again ever?" ways no matter what. But with common code
> there will at least be all of dri-devel and hopefully some mm folks
> involved in sorting things out.
>
> Most other areas it's indeed better to explore the design space with a few
> drivers before going with common code, at the cost of having some really
> terrible driver code in upstream. But here the cost of some really bad
> design in drivers is just too expensive imo.
> -Sima
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch