On 8/31/2020 8:21 PM, Jason Gunthorpe wrote:
On Tue, Sep 01, 2020 at 01:45:06AM +0200, Jann Horn wrote:
struct hfi1_filedata has a member ->mm that holds a ->mm_count reference:
static int hfi1_file_open(struct inode *inode, struct file *fp)
{
struct hfi1_filedata *fd;
[...]
fd->mm = current->mm;
mmgrab(fd->mm); // increments ->mm_count
[...]
}
Yikes, gross.
However, e.g. the call chain hfi1_file_ioctl() -> user_exp_rcv_setup()
-> hfi1_user_exp_rcv_setup() -> pin_rcv_pages() ->
hfi1_acquire_user_pages() -> pin_user_pages_fast() can end up
traversing VMAs without holding any ->mm_users reference, as far as I
can tell. That will probably result in kernel memory corruption if
that races the wrong way with an exiting task (with the ioctl() call
coming from a task whose ->mm is different from fd->mm).
It looks like this path should be using current and storing the grab'd
mm in the tidbuf for later use by hfi1_release_user_pages()
The only other use of file->mm is to setup a notifier, but this is
also under hfi1_user_exp_rcv_setup() so it should just use tidbuf->mm
== current anyhow.
The pq->mm looks similar, looks like the pq should use current->mm,
and it sets up an old-style notifier, but I didn't check carefully if
all the call paths are linked back to an ioctl..
It doesn't make sense that a RDMA driver would do any page pinning
outside an ioctl, so it should always use current.
I sort of recall a bug where we were trusting current and it wasn't
correct. I'll have to see if I can dig up the details and figure out
what's going on here.
Disclaimer: I haven't actually tested this - I just stumbled over it
while working on some other stuff, and I don't have any infiniband
hardware to test with. So it might well be that I just missed an
mmget_not_zero() somewhere, or something like that.
It looks wrong to me too.
Dennis?
I'll look at it closer and either send a patch or an explanation. Thanks
for bringing it to our attention Jann!
-Denny