[Fixed Alex's email address, sorry for getting it wrong first]

On 2019-05-13 3:49 p.m., Jerome Glisse wrote:
> [CAUTION: External Email]
>
> Andrew, can we get these 2 fixes lined up for 5.2?
>
> On Mon, May 13, 2019 at 07:36:44PM +0000, Kuehling, Felix wrote:
>> Hi Jerome,
>>
>> Do you want me to push the patches to your branch? Or are you going to
>> apply them yourself?
>>
>> Is your hmm-5.2-v3 branch going to make it into Linux 5.2? If so, do you
>> know when? I'd like to coordinate with Dave Airlie so that we can also
>> get that update into a drm-next branch soon.
>>
>> I see that Linus merged Dave's pull request for Linux 5.2, which
>> includes the first changes in amdgpu using HMM. They're currently broken
>> without these two patches.
> HMM patches do not go through any git branch; they go through the mmotm
> collection. So it is not something you can easily coordinate with a drm
> branch.
>
> By broken I expect you mean that if NUMA balancing happens it breaks?
> Or it might sleep when you are not expecting it to?

Without the NUMA fix we'd end up using an outdated physical address in
the GPU page table. The problem was caught by a test that got incorrect
computation results using OpenCL on a NUMA system.

Without the FAULT_FLAG_ALLOW_RETRY patch, there can be kernel oopses due
to incorrect locking/unlocking of mmap_sem. It breaks the promise that
hmm_range_fault should not unlock the mmap_sem if block==true. It takes
some memory pressure to trigger this.

Regards,
  Felix

>
> Cheers,
> Jérôme
>
>> Thanks,
>>   Felix
>>
>> On 2019-05-10 4:14 p.m., Jerome Glisse wrote:
>>> [CAUTION: External Email]
>>>
>>> On Fri, May 10, 2019 at 07:53:24PM +0000, Kuehling, Felix wrote:
>>>> Don't set this flag by default in hmm_vma_do_fault. It is set
>>>> conditionally just a few lines below. Setting it unconditionally
>>>> can lead to handle_mm_fault doing a non-blocking fault, returning
>>>> -EBUSY and unlocking mmap_sem unexpectedly.
>>>>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>
>>> Reviewed-by: Jérôme Glisse <jglisse@xxxxxxxxxx>
>>>
>>>> ---
>>>>   mm/hmm.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/hmm.c b/mm/hmm.c
>>>> index b65c27d5c119..3c4f1d62202f 100644
>>>> --- a/mm/hmm.c
>>>> +++ b/mm/hmm.c
>>>> @@ -339,7 +339,7 @@ struct hmm_vma_walk {
>>>>   static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
>>>>                               bool write_fault, uint64_t *pfn)
>>>>   {
>>>> -       unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_REMOTE;
>>>> +       unsigned int flags = FAULT_FLAG_REMOTE;
>>>>         struct hmm_vma_walk *hmm_vma_walk = walk->private;
>>>>         struct hmm_range *range = hmm_vma_walk->range;
>>>>         struct vm_area_struct *vma = walk->vma;
>>>> --
>>>> 2.17.1
>>>>
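
For readers following the locking argument in this thread, here is a
minimal userspace sketch in C of the contract being discussed. It is not
kernel code. The MODEL_* flags, struct model_mm and every model_*
function are hypothetical stand-ins invented for illustration, and the
rule "the fault handler drops the lock whenever retry is allowed and it
would have to wait" is a deliberate simplification of what
handle_mm_fault does. The sketch only shows why a default that already
contains the retry flag defeats the conditional set that is meant to be
skipped when block is true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model only; not the kernel's. */
#define MODEL_FLAG_ALLOW_RETRY 0x1u
#define MODEL_FLAG_REMOTE      0x2u

/* Stand-in for the address space; tracks only whether the lock is held. */
struct model_mm {
	bool mmap_sem_held;
};

/*
 * Mock fault handler (assumption: simplified behaviour). If the fault
 * would have to wait and the caller allowed a retry, it drops the lock
 * and reports that a retry is needed.
 */
static bool model_handle_fault(struct model_mm *mm, unsigned int flags,
			       bool must_wait)
{
	if (must_wait && (flags & MODEL_FLAG_ALLOW_RETRY)) {
		mm->mmap_sem_held = false;
		return true;	/* retry requested, lock was dropped */
	}
	return false;		/* fault handled with the lock still held */
}

/*
 * Build the flags the way the thread describes: "patched" selects the
 * new default (no retry flag), and the retry flag is then added only
 * for non-blocking callers.
 */
static unsigned int model_build_flags(bool block, bool patched)
{
	unsigned int flags = patched
		? MODEL_FLAG_REMOTE
		: (MODEL_FLAG_ALLOW_RETRY | MODEL_FLAG_REMOTE);

	if (!block)
		flags |= MODEL_FLAG_ALLOW_RETRY;
	return flags;
}

/* Run one contended fault and report whether the lock is still held. */
bool model_lock_held_after_fault(bool block, bool patched)
{
	struct model_mm mm = { .mmap_sem_held = true };

	model_handle_fault(&mm, model_build_flags(block, patched), true);
	return mm.mmap_sem_held;
}

/* Checks the property the patch is meant to restore in this model. */
void model_self_check(void)
{
	/* Old default: a blocking caller loses the lock unexpectedly. */
	assert(!model_lock_held_after_fault(true, false));
	/* New default: a blocking caller keeps the lock. */
	assert(model_lock_held_after_fault(true, true));
}
```

In this model a caller with block == true loses the lock under the old
default and keeps it under the new one, while a caller with
block == false takes the retry path in both cases, which is what such a
caller asked for.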