On Thu, Nov 12, 2020 at 05:08:22PM -0500, Dennis Dalessandro wrote: > On 11/12/2020 5:06 PM, Dennis Dalessandro wrote: > > On 11/12/2020 12:14 PM, Ira Weiny wrote: > > > On Wed, Nov 11, 2020 at 09:58:37PM -0500, Dennis Dalessandro wrote: > > > > Two earlier bug fixes have created a security problem in the hfi1 > > > > driver. One fix aimed to solve an issue where current->mm was not valid > > > > when closing the hfi1 cdev. It attempted to do this by saving a cached > > > > value of the current->mm pointer at file open time. This is a problem if > > > > another process with access to the FD calls in via write() or ioctl() to > > > > pin pages via the hfi driver. The other fix tried to solve a use after > > > > free by taking a reference on the mm. This was just wrong because its > > > > possible for a race condition between one process with an mm that opened > > > > the cdev if it was accessing via an IOCTL, and another process > > > > attempting to close the cdev with a different current->mm. > > > > > > Again I'm still not seeing the race here. It is entirely possible > > > that the fix > > > I was trying to do way back was mistaken too... ;-) I would just > > > delete the > > > last 2 sentences... and/or reference the commit of those fixes and help > > > explain this more. > > > > I was attempting to refer to [1], the email that started all of this. > > That link should be: > [1] https://marc.info/?l=linux-rdma&m=159891753806720&w=2 Ah... ok That does not have anything to do with a close. He is worried about the mm structure going away because the other process exited. That can't happen, even with the old code, because the release will not be called until the child process calls close. But even if the mm is still around the get_user_pages_fast() in the child is _going_ to use current->mm if it falls back to the locked version. Thus it is going to go off in the weeds when trying to pin user addresses in the child. Basically there is no 'race', the code is just broken and going to do the wrong thing regardless of timing! :-( The new code is keeping the mm_grab() reference in the mmu_notifier rather than in the fd structure, an improvement for sure, but for many applications likely to have almost the same lifetime as before, in the parent process. Also Jann is 100% correct that the driver should not be operating on the wrong mm and you are using his methodology #3.[1] So I think the final point is the key to fixing the bug. Keeping any current->mm which is not the one we opened the file with... (or more specifically the one which first registered memory). In some ways this may be worse than before because technically the parent could open the fd and hand it to the child and have the child register with it's mm. But that is ok really... May just be odd behavior for some users depending on what operations they do and in what order. Ira [1] Also, you probably should credit Jann for the idea with a suggested by tag. > > -Denny >