On Wed, Nov 11, 2020 at 09:58:37PM -0500, Dennis Dalessandro wrote:
> Two earlier bug fixes have created a security problem in the hfi1
> driver. One fix aimed to solve an issue where current->mm was not valid
> when closing the hfi1 cdev. It attempted to do this by saving a cached
> value of the current->mm pointer at file open time. This is a problem if
> another process with access to the FD calls in via write() or ioctl() to
> pin pages via the hfi driver. The other fix tried to solve a use after
> free by taking a reference on the mm. This was just wrong because it's
> possible for a race condition between one process with an mm that opened
> the cdev if it was accessing via an IOCTL, and another process
> attempting to close the cdev with a different current->mm.

Again, I'm still not seeing the race here. It is entirely possible that
the fix I was trying to do way back was mistaken too... ;-)

I would just delete the last 2 sentences, and/or reference the commits of
those fixes to help explain this more.

>
> To fix this correctly we move the cached value of the mm into the mmu
> handler struct for the driver.

Looking at this closer, I don't think you need the mm member of
mmu_rb_handler any longer. See below.

> Now we can check in the insert, evict,
> etc. routines that current->mm matches what the handler was registered
> for. If not, then don't allow access. The register of the mmu notifier
> will save the mm pointer.
>
> Note the check in the unregister is not needed in the event that
> current->mm is empty. This means the tear down is happening due to a
> SigKill or OOM Killer, something along those lines. If current->mm has a
> value then it must be checked and only the task that did the register
> can do the unregister.
>
> Since in do_exit() the exit_mm() is called before exit_files(), which
> would call our close routine, a reference is needed on the mm. We rely on
> the mmgrab done by the registration of the notifier, whereas before it
> was explicit.
>
> Also of note is we do not do any explicit work to protect the interval
> tree notifier. It doesn't seem that this is going to be needed since we
> aren't actually doing anything with current->mm. The interval tree
> notifier stuff still has a FIXME noted from a previous commit that will
> be addressed in a follow on patch.

This is a bit confusing... Is this the FIXME you are referring to?

hfi1/user_exp_rcv.c:

...
764         /*
765          * FIXME: This is in the wrong order, the notifier should be
766          * established before the pages are pinned by pin_rcv_pages.
767          */
...

>
> Fixes: e0cf75deab81 ("IB/hfi1: Fix mm_struct use after free")
> Fixes: 3faa3d9a308e ("IB/hfi1: Make use of mm consistent")
> Reported-by: Jann Horn <jannh@xxxxxxxxxx>
> Reported-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Cc: Ira Weiny <ira.weiny@xxxxxxxxx>
> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@xxxxxxxxxxxxxxxxxxxx>
>
> ---
>
> Changes since v0:
> ----------------
> Removed the checking of the pid and limitation that
> whatever task opens the dev is the only one that can do write() or
> ioctl(). While this limitation is OK, it doesn't appear to be strictly
> necessary.
>
> Rebased on top of 5.10-rc1. Testing has been done on 5.9 due to a bug in
> 5.10 that is being worked (separate issue).
>
> Changes since v1:
> ----------------
> Remove explicit mmget/put to rely on the notifier register's mmgrab
> instead.
>
> Fixed missing check in rb_unregister to only check current->mm if it's
> actually valid.
>
> Moved mm_from_tid_node to exp_rcv header and use it
> ---
> drivers/infiniband/hw/hfi1/file_ops.c | 4 +--
> drivers/infiniband/hw/hfi1/hfi.h | 2 +
> drivers/infiniband/hw/hfi1/mmu_rb.c | 41 +++++++++++++++++------------
> drivers/infiniband/hw/hfi1/mmu_rb.h | 17 +++++++++++-
> drivers/infiniband/hw/hfi1/user_exp_rcv.c | 12 ++++++--
> drivers/infiniband/hw/hfi1/user_exp_rcv.h | 6 ++++
> drivers/infiniband/hw/hfi1/user_sdma.c | 13 +++++----
> drivers/infiniband/hw/hfi1/user_sdma.h | 7 ++++-
> 8 files changed, 69 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
> index 8ca51e4..329ee4f 100644
> --- a/drivers/infiniband/hw/hfi1/file_ops.c
> +++ b/drivers/infiniband/hw/hfi1/file_ops.c
> @@ -1,4 +1,5 @@
> /*
> + * Copyright(c) 2020 Cornelis Networks, Inc.
> * Copyright(c) 2015-2020 Intel Corporation.
> *
> * This file is provided under a dual BSD/GPLv2 license. When using or
> @@ -206,8 +207,6 @@ static int hfi1_file_open(struct inode *inode, struct file *fp)
> spin_lock_init(&fd->tid_lock);
> spin_lock_init(&fd->invalid_lock);
> fd->rec_cpu_num = -1; /* no cpu affinity by default */
> - fd->mm = current->mm;
> - mmgrab(fd->mm);
> fd->dd = dd;
> fp->private_data = fd;
> return 0;
> @@ -711,7 +710,6 @@ static int hfi1_file_close(struct inode *inode, struct file *fp)
>
> deallocate_ctxt(uctxt);
> done:
> - mmdrop(fdata->mm);
>
> if (atomic_dec_and_test(&dd->user_refcount))
> complete(&dd->user_comp);
> diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
> index b4c6bff..e09e824 100644
> --- a/drivers/infiniband/hw/hfi1/hfi.h
> +++ b/drivers/infiniband/hw/hfi1/hfi.h
> @@ -1,6 +1,7 @@
> #ifndef _HFI1_KERNEL_H
> #define _HFI1_KERNEL_H
> /*
> + * Copyright(c) 2020 Cornelis Networks, Inc.
> * Copyright(c) 2015-2020 Intel Corporation.
> *
> * This file is provided under a dual BSD/GPLv2 license. When using or
> @@ -1451,7 +1452,6 @@ struct hfi1_filedata {
> u32 invalid_tid_idx;
> /* protect invalid_tids array and invalid_tid_idx */
> spinlock_t invalid_lock;
> - struct mm_struct *mm;
> };
>
> extern struct xarray hfi1_dev_table;
> diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.c b/drivers/infiniband/hw/hfi1/mmu_rb.c
> index 24ca17b..6be4e79 100644
> --- a/drivers/infiniband/hw/hfi1/mmu_rb.c
> +++ b/drivers/infiniband/hw/hfi1/mmu_rb.c
> @@ -1,4 +1,5 @@
> /*
> + * Copyright(c) 2020 Cornelis Networks, Inc.
> * Copyright(c) 2016 - 2017 Intel Corporation.
> *
> * This file is provided under a dual BSD/GPLv2 license. When using or
> @@ -48,23 +49,11 @@
> #include <linux/rculist.h>
> #include <linux/mmu_notifier.h>
> #include <linux/interval_tree_generic.h>
> +#include <linux/sched/mm.h>
>
> #include "mmu_rb.h"
> #include "trace.h"
>
> -struct mmu_rb_handler {
> - struct mmu_notifier mn;
> - struct rb_root_cached root;
> - void *ops_arg;
> - spinlock_t lock; /* protect the RB tree */
> - struct mmu_rb_ops *ops;
> - struct mm_struct *mm;
> - struct list_head lru_list;
> - struct work_struct del_work;
> - struct list_head del_list;
> - struct workqueue_struct *wq;
> -};
> -
> static unsigned long mmu_node_start(struct mmu_rb_node *);
> static unsigned long mmu_node_last(struct mmu_rb_node *);
> static int mmu_notifier_range_start(struct mmu_notifier *,
> @@ -92,7 +81,7 @@ static unsigned long mmu_node_last(struct mmu_rb_node *node)
> return PAGE_ALIGN(node->addr + node->len) - 1;
> }
>
> -int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
> +int hfi1_mmu_rb_register(void *ops_arg,
> struct mmu_rb_ops *ops,
> struct workqueue_struct *wq,
> struct mmu_rb_handler **handler)
> @@ -110,18 +99,19 @@ int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
> INIT_HLIST_NODE(&handlr->mn.hlist);
> spin_lock_init(&handlr->lock);
> handlr->mn.ops = &mn_opts;
> - handlr->mm = mm;

NIT: I really think you should follow up with a spelling fix patch...
Sorry, just got frustrated grepping for 'handler' and not finding this! ;-)

> INIT_WORK(&handlr->del_work, handle_remove);
> INIT_LIST_HEAD(&handlr->del_list);
> INIT_LIST_HEAD(&handlr->lru_list);
> handlr->wq = wq;
>
> - ret = mmu_notifier_register(&handlr->mn, handlr->mm);
> + ret = mmu_notifier_register(&handlr->mn, current->mm);
> if (ret) {
> kfree(handlr);
> return ret;
> }
>
> + handlr->mm = current->mm;

Sorry I did not catch this before, but do you need to store this pointer?
Is it not enough to check the ->mn.mm?

I think that would also make it clear you are relying on the mmgrab()
done within mmu_notifier_register(), because that is the reference you
are actually using, rather than keeping another reference here which
could potentially be used wrongly in the future. (There is a rough
sketch of what I mean at the end of this mail.)

> +
> *handler = handlr;
> return 0;
> }
> @@ -133,8 +123,11 @@ void hfi1_mmu_rb_unregister(struct mmu_rb_handler *handler)
> unsigned long flags;
> struct list_head del_list;
>
> + if (current->mm && (handler->mm != current->mm))
                        ^^^^^^^^^^^
                        handler->mn.mm?

... Like this?

> + return;
> +
> /* Unregister first so we don't get any more notifications. */
> - mmu_notifier_unregister(&handler->mn, handler->mm);
> + mmu_notifier_unregister(&handler->mn, handler->mn.mm);

Here you use the mn.mm. It is the same, right?

>
> /*
> * Make sure the wq delete handler is finished running. It will not
> @@ -166,6 +159,10 @@ int hfi1_mmu_rb_insert(struct mmu_rb_handler *handler,
> int ret = 0;
>
> trace_hfi1_mmu_rb_insert(mnode->addr, mnode->len);
> +
> + if (current->mm != handler->mm)

Ditto.

> + return -EPERM;
> +
> spin_lock_irqsave(&handler->lock, flags);
> node = __mmu_rb_search(handler, mnode->addr, mnode->len);
> if (node) {
> @@ -180,6 +177,7 @@ int hfi1_mmu_rb_insert(struct mmu_rb_handler *handler,
> __mmu_int_rb_remove(mnode, &handler->root);
> list_del(&mnode->list); /* remove from LRU list */
> }
> + mnode->handler = handler;
> unlock:
> spin_unlock_irqrestore(&handler->lock, flags);
> return ret;
> @@ -217,6 +215,9 @@ bool hfi1_mmu_rb_remove_unless_exact(struct mmu_rb_handler *handler,
> unsigned long flags;
> bool ret = false;
>
> + if (current->mm != handler->mm)

Ditto.
> + return ret;
> +
> spin_lock_irqsave(&handler->lock, flags);
> node = __mmu_rb_search(handler, addr, len);
> if (node) {
> @@ -239,6 +240,9 @@ void hfi1_mmu_rb_evict(struct mmu_rb_handler *handler, void *evict_arg)
> unsigned long flags;
> bool stop = false;
>
> + if (current->mm != handler->mm)

Ditto.

> + return;
> +
> INIT_LIST_HEAD(&del_list);
>
> spin_lock_irqsave(&handler->lock, flags);
> @@ -272,6 +276,9 @@ void hfi1_mmu_rb_remove(struct mmu_rb_handler *handler,
> {
> unsigned long flags;
>
> + if (current->mm != handler->mm)

Ditto.

> + return;
> +
> /* Validity of handler and node pointers has been checked by caller. */
> trace_hfi1_mmu_rb_remove(node->addr, node->len);
> spin_lock_irqsave(&handler->lock, flags);
> diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.h b/drivers/infiniband/hw/hfi1/mmu_rb.h
> index f04cec1..e208618 100644
> --- a/drivers/infiniband/hw/hfi1/mmu_rb.h
> +++ b/drivers/infiniband/hw/hfi1/mmu_rb.h
> @@ -1,4 +1,5 @@
> /*
> + * Copyright(c) 2020 Cornelis Networks, Inc.
> * Copyright(c) 2016 Intel Corporation.
> *
> * This file is provided under a dual BSD/GPLv2 license. When using or
> @@ -54,6 +55,7 @@ struct mmu_rb_node {
> unsigned long len;
> unsigned long __last;
> struct rb_node node;
> + struct mmu_rb_handler *handler;
> struct list_head list;
> };
>
> @@ -71,7 +73,20 @@ struct mmu_rb_ops {
> void *evict_arg, bool *stop);
> };
>
> -int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
> +struct mmu_rb_handler {
> + struct mmu_notifier mn;
> + struct rb_root_cached root;
> + void *ops_arg;
> + spinlock_t lock; /* protect the RB tree */
> + struct mmu_rb_ops *ops;
> + struct list_head lru_list;
> + struct work_struct del_work;
> + struct list_head del_list;
> + struct workqueue_struct *wq;
> + struct mm_struct *mm;

And remove this?

Ira

> +};
> +
> +int hfi1_mmu_rb_register(void *ops_arg,
> struct mmu_rb_ops *ops,
> struct workqueue_struct *wq,
> struct mmu_rb_handler **handler);
> diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.c b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
> index f81ca20..b94fc7f 100644
> --- a/drivers/infiniband/hw/hfi1/user_exp_rcv.c
> +++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
> @@ -1,4 +1,5 @@
> /*
> + * Copyright(c) 2020 Cornelis Networks, Inc.
> * Copyright(c) 2015-2018 Intel Corporation.
> *
> * This file is provided under a dual BSD/GPLv2 license. When using or
> @@ -173,15 +174,18 @@ static void unpin_rcv_pages(struct hfi1_filedata *fd,
> {
> struct page **pages;
> struct hfi1_devdata *dd = fd->uctxt->dd;
> + struct mm_struct *mm;
>
> if (mapped) {
> pci_unmap_single(dd->pcidev, node->dma_addr,
> node->npages * PAGE_SIZE, PCI_DMA_FROMDEVICE);
> pages = &node->pages[idx];
> + mm = mm_from_tid_node(node);
> } else {
> pages = &tidbuf->pages[idx];
> + mm = current->mm;
> }
> - hfi1_release_user_pages(fd->mm, pages, npages, mapped);
> + hfi1_release_user_pages(mm, pages, npages, mapped);
> fd->tid_n_pinned -= npages;
> }
>
> @@ -216,12 +220,12 @@ static int pin_rcv_pages(struct hfi1_filedata *fd, struct tid_user_buf *tidbuf)
> * pages, accept the amount pinned so far and program only that.
> * User space knows how to deal with partially programmed buffers.
> */
> - if (!hfi1_can_pin_pages(dd, fd->mm, fd->tid_n_pinned, npages)) {
> + if (!hfi1_can_pin_pages(dd, current->mm, fd->tid_n_pinned, npages)) {
> kfree(pages);
> return -ENOMEM;
> }
>
> - pinned = hfi1_acquire_user_pages(fd->mm, vaddr, npages, true, pages);
> + pinned = hfi1_acquire_user_pages(current->mm, vaddr, npages, true, pages);
> if (pinned <= 0) {
> kfree(pages);
> return pinned;
> @@ -756,7 +760,7 @@ static int set_rcvarray_entry(struct hfi1_filedata *fd,
>
> if (fd->use_mn) {
> ret = mmu_interval_notifier_insert(
> - &node->notifier, fd->mm,
> + &node->notifier, current->mm,
> tbuf->vaddr + (pageidx * PAGE_SIZE), npages * PAGE_SIZE,
> &tid_mn_ops);
> if (ret)
> diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.h b/drivers/infiniband/hw/hfi1/user_exp_rcv.h
> index 332abb4..d45c7b6 100644
> --- a/drivers/infiniband/hw/hfi1/user_exp_rcv.h
> +++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.h
> @@ -1,6 +1,7 @@
> #ifndef _HFI1_USER_EXP_RCV_H
> #define _HFI1_USER_EXP_RCV_H
> /*
> + * Copyright(c) 2020 - Cornelis Networks, Inc.
> * Copyright(c) 2015 - 2017 Intel Corporation.
> *
> * This file is provided under a dual BSD/GPLv2 license. When using or
> @@ -95,4 +96,9 @@ int hfi1_user_exp_rcv_clear(struct hfi1_filedata *fd,
> int hfi1_user_exp_rcv_invalid(struct hfi1_filedata *fd,
> struct hfi1_tid_info *tinfo);
>
> +static inline struct mm_struct *mm_from_tid_node(struct tid_rb_node *node)
> +{
> + return node->notifier.mm;
> +}
> +
> #endif /* _HFI1_USER_EXP_RCV_H */
> diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c
> index a92346e..4a4956f9 100644
> --- a/drivers/infiniband/hw/hfi1/user_sdma.c
> +++ b/drivers/infiniband/hw/hfi1/user_sdma.c
> @@ -1,4 +1,5 @@
> /*
> + * Copyright(c) 2020 - Cornelis Networks, Inc.
> * Copyright(c) 2015 - 2018 Intel Corporation.
> *
> * This file is provided under a dual BSD/GPLv2 license. When using or
When using or > @@ -188,7 +189,6 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt, > atomic_set(&pq->n_reqs, 0); > init_waitqueue_head(&pq->wait); > atomic_set(&pq->n_locked, 0); > - pq->mm = fd->mm; > > iowait_init(&pq->busy, 0, NULL, NULL, defer_packet_queue, > activate_packet_queue, NULL, NULL); > @@ -230,7 +230,7 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt, > > cq->nentries = hfi1_sdma_comp_ring_size; > > - ret = hfi1_mmu_rb_register(pq, pq->mm, &sdma_rb_ops, dd->pport->hfi1_wq, > + ret = hfi1_mmu_rb_register(pq, &sdma_rb_ops, dd->pport->hfi1_wq, > &pq->handler); > if (ret) { > dd_dev_err(dd, "Failed to register with MMU %d", ret); > @@ -980,13 +980,13 @@ static int pin_sdma_pages(struct user_sdma_request *req, > > npages -= node->npages; > retry: > - if (!hfi1_can_pin_pages(pq->dd, pq->mm, > + if (!hfi1_can_pin_pages(pq->dd, current->mm, > atomic_read(&pq->n_locked), npages)) { > cleared = sdma_cache_evict(pq, npages); > if (cleared >= npages) > goto retry; > } > - pinned = hfi1_acquire_user_pages(pq->mm, > + pinned = hfi1_acquire_user_pages(current->mm, > ((unsigned long)iovec->iov.iov_base + > (node->npages * PAGE_SIZE)), npages, 0, > pages + node->npages); > @@ -995,7 +995,7 @@ static int pin_sdma_pages(struct user_sdma_request *req, > return pinned; > } > if (pinned != npages) { > - unpin_vector_pages(pq->mm, pages, node->npages, pinned); > + unpin_vector_pages(current->mm, pages, node->npages, pinned); > return -EFAULT; > } > kfree(node->pages); > @@ -1008,7 +1008,8 @@ static int pin_sdma_pages(struct user_sdma_request *req, > static void unpin_sdma_pages(struct sdma_mmu_node *node) > { > if (node->npages) { > - unpin_vector_pages(node->pq->mm, node->pages, 0, node->npages); > + unpin_vector_pages(mm_from_sdma_node(node), node->pages, 0, > + node->npages); > atomic_sub(node->npages, &node->pq->n_locked); > } > } > diff --git a/drivers/infiniband/hw/hfi1/user_sdma.h b/drivers/infiniband/hw/hfi1/user_sdma.h > index 9972e0e..1e8c02f 100644 > --- a/drivers/infiniband/hw/hfi1/user_sdma.h > +++ b/drivers/infiniband/hw/hfi1/user_sdma.h > @@ -1,6 +1,7 @@ > #ifndef _HFI1_USER_SDMA_H > #define _HFI1_USER_SDMA_H > /* > + * Copyright(c) 2020 - Cornelis Networks, Inc. > * Copyright(c) 2015 - 2018 Intel Corporation. > * > * This file is provided under a dual BSD/GPLv2 license. When using or > @@ -133,7 +134,6 @@ struct hfi1_user_sdma_pkt_q { > unsigned long unpinned; > struct mmu_rb_handler *handler; > atomic_t n_locked; > - struct mm_struct *mm; > }; > > struct hfi1_user_sdma_comp_q { > @@ -250,4 +250,9 @@ int hfi1_user_sdma_process_request(struct hfi1_filedata *fd, > struct iovec *iovec, unsigned long dim, > unsigned long *count); > > +static inline struct mm_struct *mm_from_sdma_node(struct sdma_mmu_node *node) > +{ > + return node->rb.handler->mn.mm; > +} > + > #endif /* _HFI1_USER_SDMA_H */ >