RE: [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table from hmm range

"Zeng, Oak" <oak.zeng@xxxxxxxxx> · Thu, 2 May 2024 19:25:50 +0000

Hi Jason,

I tried to understand how you supposed us to use hmm range fault... it seems you want us to call hmm range fault two times on each gpu page fault:

1.
Call Hmm_range_fault first time, pfn of the fault address is set with HMM_PFN_REQ_FAULT
Other pfns in the PREFAULT_SIZE range will be set as 0
Hmm_range fault returns:
	Pfn with 0 flag or HMM_PFN_VALID flag, means a valid pfn
	Pfn with HMM_PFN_ERROR flag, means invalid pfn

2.	
Then call hmm_range_fault a second time
Setting the hmm_range start/end only to cover valid pfns
With all valid pfns, set the REQ_FAULT flag

Basically use hmm_range_fault to figure out the valid address range in the first round; then really fault (e.g., trigger cpu fault to allocate system pages) in the second call the hmm range fault.

Do I understand it correctly?

This is strange to me. We should already know the valid address range before we call hmm range fault, because the migration codes need to look up cpu vma anyway. what is the point of the first hmm_range fault?

Oak

> -----Original Message-----
> From: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>
> Sent: Thursday, May 2, 2024 11:02 AM
> To: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Cc: Daniel Vetter <daniel@xxxxxxxx>; Zeng, Oak <oak.zeng@xxxxxxxxx>; dri-
> devel@xxxxxxxxxxxxxxxxxxxxx; intel-xe@xxxxxxxxxxxxxxxxxxxxx; Brost,
> Matthew <matthew.brost@xxxxxxxxx>; Welty, Brian
> <brian.welty@xxxxxxxxx>; Ghimiray, Himal Prasad
> <himal.prasad.ghimiray@xxxxxxxxx>; Bommu, Krishnaiah
> <krishnaiah.bommu@xxxxxxxxx>; Vishwanathapura, Niranjana
> <niranjana.vishwanathapura@xxxxxxxxx>; Leon Romanovsky
> <leon@xxxxxxxxxx>
> Subject: Re: [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table
> from hmm range
> 
> On Thu, 2024-05-02 at 09:46 -0300, Jason Gunthorpe wrote:
> > On Thu, May 02, 2024 at 11:11:04AM +0200, Thomas Hellström wrote:
> >
> > > It's true the cpu vma lookup is a remnant from amdkfd. The idea
> > > here is
> > > to replace that with fixed prefaulting ranges of tunable size. So
> > > far,
> > > as you mention, the prefaulting range has been determined by the
> > > CPU
> > > vma size. Given previous feedback, this is going to change.
> >
> > Perhaps limiting prefault to a VMA barrier is a reasonable thing to
> > do, but the implementation should be pushed into hmm_range_fault and
> > not open coded in the driver.
> >
> > > Still the prefaulting range needs to be restricted to avoid -EFAULT
> > > failures in hmm_range_fault(). That can ofc be done by calling it
> > > without HMM_PFN_REQ_FAULT for the range and interpret the returned
> > > pnfs.
> >
> > Yes, this is exactly what that feature is for, you mark your prefetch
> > differently from the fault critical page(s).
> >
> > > There is a performance concern of this approach as compared to
> > > peeking at the CPU vmas directly, since hmm_range_fault() would
> > > need to
> > > be called twice. Any guidelines ideas here?
> >
> > If there is something wrong with hmm_range_fault() then please fix
> > it. I'm not sure why you'd call it twice, the HMM_PFN_REQ_FAULT is
> > per
> > PFN?
> 
> Ah, yes you're right. I somehow thought it was per range. Makes sense
> now.
> 
> Thanks,
> Thomas
> 
> 
> 
> >
> > Jason