On 03/21/2018 03:46 PM, Jerome Glisse wrote: > On Wed, Mar 21, 2018 at 03:16:04PM -0700, John Hubbard wrote: >> On 03/21/2018 11:03 AM, Jerome Glisse wrote: >>> On Tue, Mar 20, 2018 at 09:14:34PM -0700, John Hubbard wrote: >>>> On 03/19/2018 07:00 PM, jglisse@xxxxxxxxxx wrote: >>>>> From: Ralph Campbell <rcampbell@xxxxxxxxxx> >> >> <snip> >> >>>> Hi Jerome, >>>> >>>> This presents a deadlock problem (details below). As for solution ideas, >>>> Mark Hairgrove points out that the MMU notifiers had to solve the >>>> same sort of problem, and part of the solution involves "avoid >>>> holding locks when issuing these callbacks". That's not an entire >>>> solution description, of course, but it seems like a good start. >>>> >>>> Anyway, for the deadlock problem: >>>> >>>> Each of these ->release callbacks potentially has to wait for the >>>> hmm_invalidate_range() callbacks to finish. That is not shown in any >>>> code directly, but it's because: when a device driver is processing >>>> the above ->release callback, it has to allow any in-progress operations >>>> to finish up (as specified clearly in your comment documentation above). >>>> >>>> Some of those operations will invariably need to do things that result >>>> in page invalidations, thus triggering the hmm_invalidate_range() callback. >>>> Then, the hmm_invalidate_range() callback tries to acquire the same >>>> hmm->mirrors_sem lock, thus leading to deadlock: >>>> >>>> hmm_invalidate_range(): >>>> // ... >>>> down_read(&hmm->mirrors_sem); >>>> list_for_each_entry(mirror, &hmm->mirrors, list) >>>> mirror->ops->sync_cpu_device_pagetables(mirror, action, >>>> start, end); >>>> up_read(&hmm->mirrors_sem); >>> >>> That is just illegal, the release callback is not allowed to trigger >>> invalidation all it does is kill all device's threads and stop device >>> page fault from happening. So there is no deadlock issues. I can re- >>> inforce the comment some more (see [1] for example on what it should >>> be). >> >> That rule is fine, and it is true that the .release callback will not >> directly trigger any invalidations. However, the problem is in letting >> any *existing* outstanding operations finish up. We have to let >> existing operations "drain", in order to meet the requirement that >> everything is done when .release returns. >> >> For example, if a device driver thread is in the middle of working through >> its fault buffer, it will call migrate_vma(), which will in turn unmap >> pages. That will cause an hmm_invalidate_range() callback, which tries >> to take hmm->mirrors_sems, and we deadlock. >> >> There's no way to "kill" such a thread while it's in the middle of >> migrate_vma(), you have to let it finish up. >> >>> Also it is illegal for the sync callback to trigger any mmu_notifier >>> callback. I thought this was obvious. The sync callback should only >>> update device page table and do _nothing else_. No way to make this >>> re-entrant. >> >> That is obvious, yes. I am not trying to say there is any problem with >> that rule. It's the "drain outstanding operations during .release", >> above, that is the real problem. > > Maybe just relax the release callback wording, it should stop any > more processing of fault buffer but not wait for it to finish. In > nouveau code i kill thing but i do not wait hence i don't deadlock. But you may crash, because that approach allows .release to finish up, thus removing the mm entirely, out from under (for example) a migrate_vma call--or any other call that refers to the mm. It doesn't seem too hard to avoid the problem, though: maybe we can just drop the lock while doing the mirror->ops->release callback. There are a few ways to do this, but one example is: -- take the lock, -- copy the list to a local list, deleting entries as you go, -- drop the lock, -- iterate through the local list copy and -- issue the mirror->ops->release callbacks. At this point, more items could have been added to the list, so repeat the above until the original list is empty. This is subject to a limited starvation case if mirror keep getting registered, but I think we can ignore that, because it only lasts as long as mirrors keep getting added, and then it finishes up. > > What matter is to stop any further processing. Yes some fault might > be in flight but they will serialize on various lock. Those faults in flight could already be at a point where they have taken whatever locks they need, so we don't dare let the mm get destroyed while such fault handling is in progress. So just do not > wait in the release callback, kill thing. I might have a bug where i > still fill in GPU page table in nouveau, i will check nouveau code > for that. Again, we can't "kill" a thread of execution (this would often be an interrupt bottom half context, btw) while it is, for example, in the middle of migrate_vma. I really don't believe there is a safe way to do this without draining the existing operations before .release returns, and for that, we'll need to issue the .release callbacks while not holding locks. thanks, -- John Hubbard NVIDIA > > Kill thing should also kill the channel (i don't do that in nouveau > because i am waiting on some channel patchset) but i am not sure if > hardware like it if we kill channel before stoping fault notification. > > Cheers, > Jérôme >