Re: [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table from hmm range

On Fri, May 03, 2024 at 02:43:19PM +0000, Zeng, Oak wrote:
> > > 2.
> > > Then call hmm_range_fault a second time
> > > Setting the hmm_range start/end only to cover valid pfns
> > > With all valid pfns, set the REQ_FAULT flag
> > 
> > Why would you do this? The first call already did the faults you needed
> > and returned all the easy pfns that don't require faulting.
> 
> But we have a use case where we want to fault in pages other than the
> page which contains the GPU fault address, e.g., the user malloc'ed or
> mmap'ed an 8MiB buffer, and the CPU never touched this buffer before
> the GPU accessed it. Let's say the GPU access caused a GPU page fault
> at the 2MiB offset. The first hmm_range_fault would only fault in the
> page at the 2MiB offset, because in the first call we only set
> REQ_FAULT on the pfn at the 2MiB offset.

Honestly, that doesn't make a lot of sense to me, but if you really
want that you should add some new flag and have hmm_range_fault do
this kind of speculative faulting. I think you will end up
significantly over-faulting.
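
FWIW, if the goal is just to populate the whole window, you can
already ask for that in a single pass by putting HMM_PFN_REQ_FAULT in
default_flags for the entire range. Roughly like this (a sketch only,
locking/retry abbreviated and the function name made up):

#include <linux/hmm.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>

/* Sketch: fault in and mirror [start, end) in one hmm_range_fault() pass. */
static int svm_populate_window(struct mm_struct *mm,
                               struct mmu_interval_notifier *notifier,
                               unsigned long start, unsigned long end,
                               unsigned long *pfns, void *owner)
{
        struct hmm_range range = {
                .notifier = notifier,
                .start = start,
                .end = end,
                .hmm_pfns = pfns,
                /* REQ_FAULT on every pfn -> one call faults the whole window */
                .default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
                .dev_private_owner = owner,
        };
        int ret;

retry:
        range.notifier_seq = mmu_interval_read_begin(notifier);
        mmap_read_lock(mm);
        ret = hmm_range_fault(&range);
        mmap_read_unlock(mm);
        if (ret == -EBUSY)
                goto retry;
        if (ret)
                return ret;

        /*
         * The caller would take its page table lock and check
         * mmu_interval_read_retry() before programming the GPU.
         */
        return 0;
}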

It also doesn't make sense to do faulting in hmm prefetch if you are
going to do migration to force the fault anyhow.


> > > Basically use hmm_range_fault to figure out the valid address range
> > > in the first round; then really fault (e.g., trigger a cpu fault to
> > > allocate system pages) in the second call to hmm_range_fault.
> > 
> > You don't fault on prefetch. Prefetch is about mirroring already
> > populated pages; it should not be causing new faults.
> 
> Maybe we are using different wording here. We have this scenario that
> we call prefetch, or whatever you want to call it:
>
> On a GPU page fault at an address A, we want to map an address range
> (e.g., 2MiB, or whatever size depending on settings) around address A
> into the GPU page table. The range around A could have no backing
> pages when the GPU page fault happens. We want to populate the whole
> 2MiB range. We call it prefetch because most of the pages in this
> range have not been accessed by the GPU yet, but we expect the GPU to
> access them soon.

This isn't prefetch, that is prefaulting.
 
> You mentioned "already populated pages". Who populated those pages
> then? Was it a CPU access that populated them? If the CPU accesses
> those pages first, it is true the pages can already be populated.

Yes, I would think that is a pretty common case

> But it is also a valid use case for the GPU to access an address
> before the CPU does, so there are no "already populated pages" at the
> time of the GPU page fault. Please let us know what the picture in
> your head is. We seem to picture it completely differently.

And sure, this could happen too, but I feel like it is an application
issue if it is not prefaulting the buffers it knows the GPU is going
to touch.

Again, our experiments have shown that taking the fault path is so
slow that sane applications must explicitly prefault and prefetch as
much as possible to avoid the faults in the first place.

I'm not sure I fully agree there is a real need to aggressively
optimize the faulting path like you are describing when it shouldn't
really be used in a performant application :\

> 2) decide a migration window per the migration granularity setting
> (e.g., 2MiB), inside the CPU VMA. If the CPU VMA is smaller than the
> migration granularity, the migration window is the whole CPU VMA
> range; otherwise, only part of the VMA range is migrated.
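
If I'm reading that right, you mean roughly this (my own paraphrase,
made-up function name, not your actual code):

#include <linux/align.h>
#include <linux/minmax.h>
#include <linux/mm.h>
#include <linux/sizes.h>

/* Paraphrase of the scheme above: 2MiB window clamped to the CPU VMA. */
static void pick_migration_window(struct vm_area_struct *vma,
                                  unsigned long fault_addr,
                                  unsigned long granularity, /* e.g. SZ_2M */
                                  unsigned long *start, unsigned long *end)
{
        *start = ALIGN_DOWN(fault_addr, granularity);
        *end = *start + granularity;

        /* A VMA smaller than the granularity is migrated in full. */
        *start = max(*start, vma->vm_start);
        *end = min(*end, vma->vm_end);
}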

Seems rather arbitrary to me. You are quite likely to capture some
memory that is CPU memory and cause thrashing. As I said before, in
common cases the heap will be large single VMAs, so this kind of
scheme is just going to fault a whole bunch of unrelated malloc
objects over to the GPU.

Not sure it is really a good idea.

Adaptive locality of memory is still an unsolved problem in Linux,
sadly.

> > However, the migration stuff should really not be in the driver
> > either. Core DRM logic should be managing that. It is so
> > convoluted and full of policy that all the drivers should be working
> > in the same way.
> 
> Completely agreed. Moving the migration infrastructure to DRM is part
> of our plan. We want to first prove the concept with the xekmd driver,
> then move the helpers and infrastructure to DRM. The driver should be
> as simple as implementing a few callback functions for device-specific
> page table programming and device migration, and calling some DRM
> common functions during GPU page fault handling.

You'd be better off starting out this way so people can look at and
understand the core code on its own merits.
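
For example, something vaguely shaped like this (entirely made-up
names, just to show where a driver/core boundary could sit):

/* Hypothetical only -- none of these names exist today. */
struct drm_svm;
struct page;

struct drm_svm_ops {
        /* device-specific page table programming for a mirrored range */
        int (*bind_range)(struct drm_svm *svm, unsigned long start,
                          unsigned long end, const unsigned long *pfns);
        void (*unbind_range)(struct drm_svm *svm, unsigned long start,
                             unsigned long end);
        /* device-specific copy engine for system <-> device migration */
        int (*copy_pages)(struct drm_svm *svm, struct page **src,
                          struct page **dst, unsigned long npages);
};

/* Common entry point a driver would call from its GPU fault handler. */
int drm_svm_handle_fault(struct drm_svm *svm, unsigned long fault_addr);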

Jason


