On Wed, Mar 21, 2018 at 05:11:10PM -0700, John Hubbard wrote:
> On 03/21/2018 04:37 PM, Jerome Glisse wrote:
> > On Wed, Mar 21, 2018 at 04:10:32PM -0700, John Hubbard wrote:
> >> On 03/21/2018 03:46 PM, Jerome Glisse wrote:
> >>> On Wed, Mar 21, 2018 at 03:16:04PM -0700, John Hubbard wrote:
> >>>> On 03/21/2018 11:03 AM, Jerome Glisse wrote:
> >>>>> On Tue, Mar 20, 2018 at 09:14:34PM -0700, John Hubbard wrote:
> >>>>>> On 03/19/2018 07:00 PM, jglisse@xxxxxxxxxx wrote:
> >>>>>>> From: Ralph Campbell <rcampbell@xxxxxxxxxx>
> >
> > [...]
> >
> >>>>> That is just illegal, the release callback is not allowed to trigger
> >>>>> invalidation; all it does is kill all the device's threads and stop
> >>>>> device page faults from happening. So there is no deadlock issue. I
> >>>>> can reinforce the comment some more (see [1] for an example of what
> >>>>> it should be).
> >>>>
> >>>> That rule is fine, and it is true that the .release callback will not
> >>>> directly trigger any invalidations. However, the problem is in letting
> >>>> any *existing* outstanding operations finish up. We have to let
> >>>> existing operations "drain", in order to meet the requirement that
> >>>> everything is done when .release returns.
> >>>>
> >>>> For example, if a device driver thread is in the middle of working
> >>>> through its fault buffer, it will call migrate_vma(), which will in
> >>>> turn unmap pages. That will cause an hmm_invalidate_range() callback,
> >>>> which tries to take hmm->mirrors_sem, and we deadlock.
> >>>>
> >>>> There's no way to "kill" such a thread while it's in the middle of
> >>>> migrate_vma(); you have to let it finish up.
> >>>>
> >>>>> Also it is illegal for the sync callback to trigger any mmu_notifier
> >>>>> callback. I thought this was obvious. The sync callback should only
> >>>>> update the device page table and do _nothing else_. There is no way
> >>>>> to make this re-entrant.
> >>>>
> >>>> That is obvious, yes. I am not trying to say there is any problem with
> >>>> that rule. It's the "drain outstanding operations during .release",
> >>>> above, that is the real problem.
> >>>
> >>> Maybe just relax the release callback wording: it should stop any
> >>> more processing of the fault buffer, but not wait for it to finish. In
> >>> the nouveau code i kill things but i do not wait, hence i don't
> >>> deadlock.
> >>
> >> But you may crash, because that approach allows .release to finish
> >> up, thus removing the mm entirely, out from under (for example)
> >> a migrate_vma call--or any other call that refers to the mm.
> >
> > No, you can not crash on the mm, as it will not vanish before you are
> > done with it: the mm will not be freed before you call hmm_unregister(),
> > and you should not call that from release, nor should you call it before
> > everything is flushed. However the vma struct might vanish ... i might
> > have assumed wrongly that the down_write() always happens in exit_mmap().
> > This might be a solution to force serialization.
>
> OK. My details on mm destruction were inaccurate, but we do agree now
> that the whole virtual address space is being torn down at the same
> time as we're trying to use it, so I think we're on the same page now.
>
> >>
> >> It doesn't seem too hard to avoid the problem, though: maybe we
> >> can just drop the lock while doing the mirror->ops->release callback.
> >> There are a few ways to do this, but one example is:
> >>
> >> -- take the lock,
> >> -- copy the list to a local list, deleting entries as you go,
> >> -- drop the lock,
> >> -- iterate through the local list copy and
> >> -- issue the mirror->ops->release callbacks.
> >>
> >> At this point, more items could have been added to the list, so repeat
> >> the above until the original list is empty.
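> >>
> >> A rough sketch of that loop, just to make the idea concrete (this
> >> assumes the hmm->mirrors list and hmm->mirrors_sem from this series,
> >> and a release op that takes only the mirror; an illustration, not a
> >> tested patch):
> >>
> >>	for (;;) {
> >>		struct hmm_mirror *mirror;
> >>		LIST_HEAD(local);
> >>
> >>		down_write(&hmm->mirrors_sem);
> >>		if (list_empty(&hmm->mirrors)) {
> >>			up_write(&hmm->mirrors_sem);
> >>			break;
> >>		}
> >>		/* Move every current entry onto the private list. */
> >>		list_splice_init(&hmm->mirrors, &local);
> >>		up_write(&hmm->mirrors_sem);
> >>
> >>		/* Callbacks run unlocked, so they are free to block. */
> >>		while (!list_empty(&local)) {
> >>			mirror = list_first_entry(&local,
> >>						  struct hmm_mirror, list);
> >>			list_del_init(&mirror->list);
> >>			mirror->ops->release(mirror);
> >>		}
> >>	}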
> >>
> >> This is subject to a limited starvation case if mirrors keep getting
> >> registered, but I think we can ignore that, because it only lasts as
> >> long as mirrors keep getting added, and then it finishes up.
> >
> > The down_write is a better solution, and easier: just 2 lines of code.
>
> OK. I'll have a better idea when I see it.
>
> >
> >>
> >>>
> >>> What matters is to stop any further processing. Yes, some faults might
> >>> be in flight, but they will serialize on various locks.
> >>
> >> Those faults in flight could already be at a point where they have taken
> >> whatever locks they need, so we don't dare let the mm get destroyed while
> >> such fault handling is in progress.
> >
> > The mm can not vanish until hmm_unregister() is called; the vma will
> > vanish before it.
>
> OK, yes. And we agree that the vma vanishing is a problem.
>
> >
> >>> So just do not wait in the release callback, kill things. I might have
> >>> a bug where i still fill in the GPU page table in nouveau, i will check
> >>> the nouveau code for that.
> >>
> >> Again, we can't "kill" a thread of execution (this would often be an
> >> interrupt bottom half context, btw) while it is, for example,
> >> in the middle of migrate_vma.
> >
> > You should not call migrate from a bottom half! Only call this from a
> > work queue, like nouveau does.
>
> By "bottom half", I mean the kthread that we have running to handle work
> that was handed off from the top half ISR. So we are in process context.
> And we will need to do migrate_vma() from there.
>
> >>
> >> I really don't believe there is a safe way to do this without draining
> >> the existing operations before .release returns, and for that, we'll
> >> need to issue the .release callbacks while not holding locks.
> >
> > down_write on mmap_sem would force serialization. I am not sure we want
> > to do this change now. It can wait, as it is definitely not an issue for
> > nouveau yet. Taking mmap_sem in write (see oom in exit_mmap()) in release
> > makes me nervous.
>
> I'm not going to lose any sleep over when the various fixes are made, as
> long as we agree on the problems and solution approaches, and fix them at
> some point. I will note that our downstream driver will not be... well,
> completely usable, until we fix this, though.

So i posted updated patches for 3 and 4 that should address your concerns.
Testing was done with them and nouveau seems to work ok. I am hoping this
addresses all your concerns.

Cheers,
Jérôme
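
For reference, a minimal sketch of the "2 lines of code" mmap_sem
serialization discussed above, assuming the device fault paths (for
example hmm_vma_fault() or a driver's migrate_vma() call) hold mmap_sem
for read; the helper name is made up for illustration, and this ignores
the exit_mmap()/oom caveat Jerome raises:

	/* Kernel context: struct mm_struct from <linux/mm_types.h>. */
	static void hmm_drain_device_faults(struct mm_struct *mm)
	{
		/*
		 * Taking mmap_sem for write blocks until every in-flight
		 * fault path holding it for read has drained; since the
		 * mm is exiting, no new device faults can begin after.
		 */
		down_write(&mm->mmap_sem);
		up_write(&mm->mmap_sem);
	}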