On 3/28/19 3:08 PM, Jerome Glisse wrote: > On Thu, Mar 28, 2019 at 02:41:02PM -0700, John Hubbard wrote: >> On 3/28/19 2:30 PM, Jerome Glisse wrote: >>> On Thu, Mar 28, 2019 at 01:54:01PM -0700, John Hubbard wrote: >>>> On 3/25/19 7:40 AM, jglisse@xxxxxxxxxx wrote: >>>>> From: Jérôme Glisse <jglisse@xxxxxxxxxx> [...] >> >>>> >>>> If you insist on having this wrapper, I think it should have approximately >>>> this form: >>>> >>>> void hmm_mirror_mm_down_read(...) >>>> { >>>> WARN_ON(...) >>>> down_read(...) >>>> } >>> >>> I do insist as it is useful and use by both RDMA and nouveau and the >>> above would kill the intent. The intent is do not try to take the lock >>> if the process is dying. >> >> Could you provide me a link to those examples so I can take a peek? I >> am still convinced that this whole thing is a race condition at best. > > The race is fine and ok see: > > https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-odp-v2&id=eebd4f3095290a16ebc03182e2d3ab5dfa7b05ec > > which has been posted and i think i provided a link in the cover > letter to that post. The same patch exist for nouveau i need to > cleanup that tree and push it. Thanks for that link, and I apologize for not keeping up with that other review thread. Looking it over, hmm_mirror_mm_down_read() is only used in one place. So, what you really want there is not a down_read() wrapper, but rather, something like hmm_sanity_check() , that ib_umem_odp_map_dma_pages() calls. > >>> >>> >>>> >>>>> +{ >>>>> + struct mm_struct *mm; >>>>> + >>>>> + /* Sanity check ... */ >>>>> + if (!mirror || !mirror->hmm) >>>>> + return -EINVAL; >>>>> + /* >>>>> + * Before trying to take the mmap_sem make sure the mm is still >>>>> + * alive as device driver context might outlive the mm lifetime. >>>> >>>> Let's find another way, and a better place, to solve this problem. >>>> Ref counting? >>> >>> This has nothing to do with refcount or use after free or anthing >>> like that. It is just about checking wether we are about to do >>> something pointless. If the process is dying then it is pointless >>> to try to take the lock and it is pointless for the device driver >>> to trigger handle_mm_fault(). >> >> Well, what happens if you let such pointless code run anyway? >> Does everything still work? If yes, then we don't need this change. >> If no, then we need a race-free version of this change. > > Yes everything work, nothing bad can happen from a race, it will just > do useless work which never hurt anyone. > OK, so let's either drop this patch, or if merge windows won't allow that, then *eventually* drop this patch. And instead, put in a hmm_sanity_check() that does the same checks. thanks, -- John Hubbard NVIDIA