On 03/21/2018 11:03 AM, Jerome Glisse wrote:
> On Tue, Mar 20, 2018 at 09:14:34PM -0700, John Hubbard wrote:
>> On 03/19/2018 07:00 PM, jglisse@xxxxxxxxxx wrote:
>>> From: Ralph Campbell <rcampbell@xxxxxxxxxx>

<snip>

>> Hi Jerome,
>>
>> This presents a deadlock problem (details below). As for solution ideas,
>> Mark Hairgrove points out that the MMU notifiers had to solve the
>> same sort of problem, and part of the solution involves "avoid
>> holding locks when issuing these callbacks". That's not an entire
>> solution description, of course, but it seems like a good start.
>>
>> Anyway, for the deadlock problem:
>>
>> Each of these ->release callbacks potentially has to wait for the
>> hmm_invalidate_range() callbacks to finish. That is not shown in any
>> code directly, but it's because when a device driver is processing
>> the above ->release callback, it has to allow any in-progress operations
>> to finish up (as specified clearly in your comment documentation above).
>>
>> Some of those operations will invariably need to do things that result
>> in page invalidations, thus triggering the hmm_invalidate_range() callback.
>> Then the hmm_invalidate_range() callback tries to acquire the same
>> hmm->mirrors_sem lock, thus leading to deadlock:
>>
>> hmm_invalidate_range():
>> // ...
>>         down_read(&hmm->mirrors_sem);
>>         list_for_each_entry(mirror, &hmm->mirrors, list)
>>                 mirror->ops->sync_cpu_device_pagetables(mirror, action,
>>                                                         start, end);
>>         up_read(&hmm->mirrors_sem);
>
> That is just illegal: the release callback is not allowed to trigger
> invalidation. All it does is kill all the device's threads and stop
> device page faults from happening. So there are no deadlock issues. I
> can reinforce the comment some more (see [1] for an example of what it
> should be).

That rule is fine, and it is true that the .release callback will not
directly trigger any invalidations. However, the problem is in letting
any *existing* outstanding operations finish up. We have to let
existing operations "drain", in order to meet the requirement that
everything is done when .release returns.

For example, if a device driver thread is in the middle of working
through its fault buffer, it will call migrate_vma(), which will in
turn unmap pages. That will cause an hmm_invalidate_range() callback,
which tries to take hmm->mirrors_sem, and we deadlock. (A small
userspace model of this lock cycle is appended at the end of this
mail.)

There's no way to "kill" such a thread while it's in the middle of
migrate_vma(); you have to let it finish up.

>
> Also it is illegal for the sync callback to trigger any mmu_notifier
> callback. I thought this was obvious. The sync callback should only
> update the device page table and do _nothing else_. There is no way
> to make this re-entrant.

That is obvious, yes. I am not trying to say there is any problem with
that rule. It's the "drain outstanding operations during .release",
above, that is the real problem. (A sketch of the "avoid holding locks
when issuing these callbacks" idea is also appended below.)

thanks,
--
John Hubbard
NVIDIA

>
> For anonymous private memory migrated to device memory, it is freed
> shortly after the release callback (see exit_mmap()). For shared
> memory you might want to migrate back to regular memory, but that
> will be fine as you will not get mmu_notifier callbacks any more.
>
> So I don't see any deadlock here.
>
> Cheers,
> Jérôme
>
> [1] https://cgit.freedesktop.org/~glisse/linux/commit/?h=nouveau-hmm&id=93adb3e6b4f39d5d146b6a8afb4175d37bdd4890
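
PS: To make the deadlock cycle above concrete, here is a minimal
userspace model of it (a sketch only: pthreads stand in for the kernel
rwsem, and the function names, messages, and one-second sleep are all
illustrative, not kernel code). The main thread plays the ->release
path: it takes mirrors_sem for write and then waits for the driver
thread to drain. The other thread plays a fault-buffer worker in the
middle of migrate_vma(): finishing requires taking mirrors_sem for
read, so both threads block forever:

    /* Build with: gcc -pthread deadlock_model.c */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_rwlock_t mirrors_sem = PTHREAD_RWLOCK_INITIALIZER;

    /* Stands in for hmm_invalidate_range(): every unmap ends up here. */
    static void invalidate_range(void)
    {
            pthread_rwlock_rdlock(&mirrors_sem);  /* blocks behind writer */
            /* ... mirror->ops->sync_cpu_device_pagetables() ... */
            pthread_rwlock_unlock(&mirrors_sem);
    }

    /* Stands in for a driver fault-buffer thread mid-migrate_vma(). */
    static void *fault_worker(void *unused)
    {
            sleep(1);               /* "in the middle of" the migration */
            printf("worker: unmapping pages -> invalidate callback\n");
            invalidate_range();     /* never returns */
            return NULL;
    }

    int main(void)
    {
            pthread_t worker;

            pthread_create(&worker, NULL, fault_worker, NULL);

            /* The ->release path: exclude mirrors, then wait for all
             * outstanding driver work to drain before returning. */
            pthread_rwlock_wrlock(&mirrors_sem);
            printf("release: waiting for the worker to drain...\n");
            pthread_join(worker, NULL);  /* deadlock: worker needs rdlock */
            pthread_rwlock_unlock(&mirrors_sem);
            return 0;
    }

Run it and both threads hang: the same cycle described above, just
with a pthread rwlock in place of hmm->mirrors_sem.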
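
And here is one possible shape of the "avoid holding locks when
issuing these callbacks" idea, applied to hmm_release() (an untested
sketch against the structure of this patch, not a proposed fix): pop
each mirror off the list under the write lock, but issue the ->release
callback with mirrors_sem dropped, so that invalidations from work
that is still draining can take the lock for read:

    static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
    {
            struct hmm *hmm = mm->hmm;
            struct hmm_mirror *mirror;

            for (;;) {
                    down_write(&hmm->mirrors_sem);
                    mirror = list_first_entry_or_null(&hmm->mirrors,
                                    struct hmm_mirror, list);
                    if (mirror)
                            list_del_init(&mirror->list);
                    up_write(&hmm->mirrors_sem);
                    if (!mirror)
                            break;
                    /*
                     * mirrors_sem is not held here, so an
                     * hmm_invalidate_range() issued by still-draining
                     * work can take it for read without deadlocking
                     * against us.
                     */
                    mirror->ops->release(mirror);
            }
    }

That is in the spirit of what Mark described for the mmu_notifier
case: a driver's ->release can then block on draining work without
re-creating the lock cycle above.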