Re: [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table from hmm range

Matthew Brost <matthew.brost@xxxxxxxxx> · Mon, 6 May 2024 23:50:36 +0000

On Mon, May 06, 2024 at 03:04:15PM +0200, Daniel Vetter wrote:
> On Sat, May 04, 2024 at 11:03:03AM +1000, Dave Airlie wrote:
> > > Let me know if this understanding is correct.
> > >
> > > Or what would you like to do in such situation?
> > >
> > > >
> > > > Not sure how it is really a good idea.
> > > >
> > > > Adaptive locality of memory is still an unsolved problem in Linux,
> > > > sadly.
> > > >
> > > > > > However, the migration stuff should really not be in the driver
> > > > > > either. That should be core DRM logic to manage that. It is so
> > > > > > convoluted and full of policy that all the drivers should be working
> > > > > > in the same way.
> > > > >
> > > > > > Completely agreed. Moving the migration infrastructure to DRM is part
> > > > > > of our plan. We want to first do a proof of concept with the xekmd
> > > > > > driver, then move the helpers and infrastructure to DRM. A driver
> > > > > > should then only need to implement a few callback functions for
> > > > > > device-specific page table programming and device migration, and call
> > > > > > some common DRM functions during a GPU page fault.
> > > >
> > > > You'd be better to start out this way so people can look at and
> > > > understand the core code on its own merits.
> > >
> > > > The two-step approach was agreed with the DRM maintainers, see here:  https://lore.kernel.org/dri-devel/SA1PR11MB6991045CC69EC8E1C576A715925F2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/, bullet 4)
> > 
> > After this discussion and the other cross-device HMM stuff I think we
> > should probably push more for common code up front. I think doing this
> > in a driver without considering the bigger picture might not end up
> > extractable, and then I fear the developers will just move on to other
> > things due to management pressure to land features over correctness.
> > 
> > I think we have enough people on the list that can review this stuff,
> > and even if the common code ends up being a little xe specific,
> > iterating it will be easier outside the driver, as we can clearly
> > demark what is inside and outside.
> 
> tldr; Yeah concurring.
> 
> I think like with the gpu vma stuff we should at least aim for the core
> data structures, and more importantly, the locking design and how it
> interacts with core mm services to be common code.
> 

I believe this is a reasonable request and hopefully, it should end up
being a pretty thin layer. drm_gpusvm? Have some ideas. Let's see what
we come up with.

Matt

> I read through amdkfd and I think that one is warning enough that this
> area is one of these cases where going with common code aggressively is
> much better. Because it will be buggy in terribly "how do we get out of
> this design corner again ever?" ways no matter what. But with common code
> there will at least be all of dri-devel and hopefully some mm folks
> involved in sorting things out.
> 
> In most other areas it's indeed better to explore the design space with a few
> drivers before going with common code, at the cost of having some really
> terrible driver code in upstream. But here the cost of some really bad
> design in drivers is just too expensive imo.
> -Sima
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch