On Fri, Jun 07, 2019 at 02:37:07PM -0700, Ralph Campbell wrote:
> 
> On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> > 
> > hmm_release() is called exactly once per hmm. ops->release() cannot
> > accidentally trigger any action that would recurse back onto
> > hmm->mirrors_sem.
> > 
> > This fixes a use after-free race of the form:
> > 
> >        CPU0                                   CPU1
> >                                            hmm_release()
> >                                              up_write(&hmm->mirrors_sem);
> >  hmm_mirror_unregister(mirror)
> >   down_write(&hmm->mirrors_sem);
> >   up_write(&hmm->mirrors_sem);
> >   kfree(mirror)
> >                                              mirror->ops->release(mirror)
> > 
> > The only user we have today for ops->release is an empty function, so this
> > is unambiguously safe.
> > 
> > As a consequence of plugging this race drivers are not allowed to
> > register/unregister mirrors from within a release op.
> > 
> > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> 
> I agree with the analysis above but I'm not sure that release() will
> always be an empty function. It might be more efficient to write back
> all data migrated to a device "in one pass" instead of relying
> on unmap_vmas() calling hmm_start_range_invalidate() per VMA.

I think we have to focus on the *current* kernel - and we have two users
of release: nouveau_svm.c is empty and amdgpu_mn.c does schedule_work() -
so I believe we should go ahead with this simple solution to the actual
race today that both of those will suffer from.

If we find a need for a more complex version then it can be debated and
justified with proper context...

Ok?

Jason
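
For readers following the thread, a rough sketch may help show why the
changelog's race goes away. This is an illustration only, not the actual
mm/hmm.c patch: the struct layouts and the loop shape below are simplified
assumptions built from the fields named in the thread (mirrors_sem, the
mirrors list, ops->release).

```c
#include <linux/list.h>
#include <linux/rwsem.h>

/* Sketch only: reduced to the pieces discussed in the thread. */
struct hmm_mirror;

struct hmm_mirror_ops {
	/*
	 * Must not call hmm_mirror_register()/hmm_mirror_unregister():
	 * mirrors_sem is already write-held when this runs, so re-taking
	 * it would deadlock (the "no register/unregister from release"
	 * rule in the changelog).
	 */
	void (*release)(struct hmm_mirror *mirror);
};

struct hmm_mirror {
	struct list_head list;
	const struct hmm_mirror_ops *ops;
};

struct hmm {
	struct rw_semaphore mirrors_sem;
	struct list_head mirrors;
};

static void hmm_release_sketch(struct hmm *hmm)
{
	struct hmm_mirror *mirror;

	/*
	 * Calling ops->release() while mirrors_sem is still write-held
	 * closes the race in the changelog: a concurrent
	 * hmm_mirror_unregister() sleeps in down_write() and cannot
	 * kfree() the mirror until release() has returned.
	 */
	down_write(&hmm->mirrors_sem);
	while (!list_empty(&hmm->mirrors)) {
		mirror = list_first_entry(&hmm->mirrors, struct hmm_mirror,
					  list);
		list_del_init(&mirror->list);
		if (mirror->ops->release)
			mirror->ops->release(mirror);
	}
	up_write(&hmm->mirrors_sem);
}
```

With that ordering, the kfree(mirror) in the changelog's CPU0 column can no
longer run while CPU1 is still inside mirror->ops->release(mirror).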