On 3/28/19 2:30 PM, Jerome Glisse wrote: > On Thu, Mar 28, 2019 at 01:54:01PM -0700, John Hubbard wrote: >> On 3/25/19 7:40 AM, jglisse@xxxxxxxxxx wrote: >>> From: Jérôme Glisse <jglisse@xxxxxxxxxx> >>> >>> The device driver context which holds reference to mirror and thus to >>> core hmm struct might outlive the mm against which it was created. To >>> avoid every driver to check for that case provide an helper that check >>> if mm is still alive and take the mmap_sem in read mode if so. If the >>> mm have been destroy (mmu_notifier release call back did happen) then >>> we return -EINVAL so that calling code knows that it is trying to do >>> something against a mm that is no longer valid. >>> >>> Changes since v1: >>> - removed bunch of useless check (if API is use with bogus argument >>> better to fail loudly so user fix their code) >>> >>> Signed-off-by: Jérôme Glisse <jglisse@xxxxxxxxxx> >>> Reviewed-by: Ralph Campbell <rcampbell@xxxxxxxxxx> >>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> >>> Cc: John Hubbard <jhubbard@xxxxxxxxxx> >>> Cc: Dan Williams <dan.j.williams@xxxxxxxxx> >>> --- >>> include/linux/hmm.h | 50 ++++++++++++++++++++++++++++++++++++++++++--- >>> 1 file changed, 47 insertions(+), 3 deletions(-) >>> >>> diff --git a/include/linux/hmm.h b/include/linux/hmm.h >>> index f3b919b04eda..5f9deaeb9d77 100644 >>> --- a/include/linux/hmm.h >>> +++ b/include/linux/hmm.h >>> @@ -438,6 +438,50 @@ struct hmm_mirror { >>> int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm); >>> void hmm_mirror_unregister(struct hmm_mirror *mirror); >>> >>> +/* >>> + * hmm_mirror_mm_down_read() - lock the mmap_sem in read mode >>> + * @mirror: the HMM mm mirror for which we want to lock the mmap_sem >>> + * Returns: -EINVAL if the mm is dead, 0 otherwise (lock taken). >>> + * >>> + * The device driver context which holds reference to mirror and thus to core >>> + * hmm struct might outlive the mm against which it was created. To avoid every >>> + * driver to check for that case provide an helper that check if mm is still >>> + * alive and take the mmap_sem in read mode if so. If the mm have been destroy >>> + * (mmu_notifier release call back did happen) then we return -EINVAL so that >>> + * calling code knows that it is trying to do something against a mm that is >>> + * no longer valid. >>> + */ >>> +static inline int hmm_mirror_mm_down_read(struct hmm_mirror *mirror) >> >> Hi Jerome, >> >> Let's please not do this. There are at least two problems here: >> >> 1. The hmm_mirror_mm_down_read() wrapper around down_read() requires a >> return value. This is counter to how locking is normally done: callers do >> not normally have to check the return value of most locks (other than >> trylocks). And sure enough, your own code below doesn't check the return value. >> That is a pretty good illustration of why not to do this. > > Please read the function description this is not about checking lock > return value it is about checking wether we are racing with process > destruction and avoid trying to take lock in such cases so that driver > do abort as quickly as possible when a process is being kill. > >> >> 2. This is a weird place to randomly check for semi-unrelated state, such >> as "is HMM still alive". By that I mean, if you have to detect a problem >> at down_read() time, then the problem could have existed both before and >> after the call to this wrapper. So it is providing a false sense of security, >> and it is therefore actually undesirable to add the code. > > It is not, this function is use in device page fault handler which will > happens asynchronously from CPU event or process lifetime when a process > is killed or is dying we do want to avoid useless page fault work and > we do want to avoid blocking the page fault queue of the device. This > function reports to the caller that the process is dying and that it > should just abort the page fault and do whatever other device specific > thing that needs to happen. > But it's inherently racy, to check for a condition outside of any lock, so again, it's a false sense of security. >> >> If you insist on having this wrapper, I think it should have approximately >> this form: >> >> void hmm_mirror_mm_down_read(...) >> { >> WARN_ON(...) >> down_read(...) >> } > > I do insist as it is useful and use by both RDMA and nouveau and the > above would kill the intent. The intent is do not try to take the lock > if the process is dying. Could you provide me a link to those examples so I can take a peek? I am still convinced that this whole thing is a race condition at best. > > >> >>> +{ >>> + struct mm_struct *mm; >>> + >>> + /* Sanity check ... */ >>> + if (!mirror || !mirror->hmm) >>> + return -EINVAL; >>> + /* >>> + * Before trying to take the mmap_sem make sure the mm is still >>> + * alive as device driver context might outlive the mm lifetime. >> >> Let's find another way, and a better place, to solve this problem. >> Ref counting? > > This has nothing to do with refcount or use after free or anthing > like that. It is just about checking wether we are about to do > something pointless. If the process is dying then it is pointless > to try to take the lock and it is pointless for the device driver > to trigger handle_mm_fault(). Well, what happens if you let such pointless code run anyway? Does everything still work? If yes, then we don't need this change. If no, then we need a race-free version of this change. thanks, -- John Hubbard NVIDIA