On Wed, Oct 30, 2019 at 11:44:09AM +0200, Michal Kalderon wrote:
> This patch series uses the doorbell overflow recovery mechanism
> introduced in
> commit 36907cd5cd72 ("qed: Add doorbell overflow recovery mechanism")
> for rdma (RoCE and iWARP)
>
> The first six patches modify the core code to contain helper
> functions for managing mmap_xa inserting, getting and freeing
> entries. The code was based on the code from efa driver.
> There is still an open discussion on whether we should take
> this even further and make the entire mmap generic. Until a
> decision is made, I only created the database API and modified
> the efa, qedr, siw drivers to use it. The functions are integrated
> with the umap mechanism.
>
> The doorbell recovery code is based on the common code.
>
> rdma-core pull request #493 was closed for now; once the kernel series is
> accepted it will be reopened.
>
> This series applies over the wip/jgg-for-next branch and not the
> for-next since it contains the series:
> RDMA/qedr: Fix memory leaks and synchronization
> https://www.spinics.net/lists/linux-rdma/msg85242.html
>
> SIW driver was reviewed, tested and signed-off by Bernard Metzler.

Since we are on v12 now, let us get this done. I added this diff to the
series. It is mostly just renaming, indenting, the notes I gave already
(to all drivers), and probably a few other things I forgot.

Here is the series with everything reflowed, hopefully properly:

https://github.com/jgunthorpe/linux/commits/rdma_mmap

Please let me know if I messed it up; otherwise I'll apply this in a few
days.

Thanks,
Jason

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 355c59d2eaa35b..6be180eb8cb329 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -398,7 +398,4 @@ void rdma_umap_priv_init(struct rdma_umap_priv *priv,
                         struct vm_area_struct *vma,
                         struct rdma_user_mmap_entry *entry);
-void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
-                             struct rdma_user_mmap_entry *entry);
-
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/ib_core_uverbs.c b/drivers/infiniband/core/ib_core_uverbs.c
index 88d9d47fb8adaa..6238842fd06402 100644
--- a/drivers/infiniband/core/ib_core_uverbs.c
+++ b/drivers/infiniband/core/ib_core_uverbs.c
@@ -11,14 +11,14 @@
 /**
  * rdma_umap_priv_init() - Initialize the private data of a vma
  *
+ * @priv: The already allocated private data
  * @vma: The vm area struct that needs private data
  * @entry: entry into the mmap_xa that needs to be linked with
  *         this vma
  *
- * Each time we map IO memory into user space this keeps track
- * of the mapping. When the device is hot-unplugged we 'zap' the
- * mmaps in user space to point to the zero page and allow the
- * hot unplug to proceed.
+ * Each time we map IO memory into user space this keeps track of the
+ * mapping. When the device is hot-unplugged we 'zap' the mmaps in user space
+ * to point to the zero page and allow the hot unplug to proceed.
  *
  * This is necessary for cases like PCI physical hot unplug as the actual BAR
  * memory may vanish after this and access to it from userspace could MCE.
@@ -48,20 +48,21 @@ void rdma_umap_priv_init(struct rdma_umap_priv *priv,
 EXPORT_SYMBOL(rdma_umap_priv_init);

 /**
- * rdma_user_mmap_io() - Map IO memory into a process.
+ * rdma_user_mmap_io() - Map IO memory into a process
  *
  * @ucontext: associated user context
- * @vma: the vma related to the current mmap call.
+ * @pfn: pfn to map
  * @size: size to map
  * @prot: pgprot to use in remap call
+ * @entry: mmap_entry retrieved from rdma_user_mmap_entry_get(), or NULL
+ *         if mmap_entry is not used by the driver
  *
- * This is to be called by drivers as part of their mmap()
- * functions if they wish to send something like PCI-E BAR
- * memory to userspace.
+ * This is to be called by drivers as part of their mmap() functions if they
+ * wish to send something like PCI-E BAR memory to userspace.
  *
- * Return -EINVAL on wrong flags or size, -EAGAIN on failure to
- * map. 0 on success.
+ * Return -EINVAL on wrong flags or size, -EAGAIN on failure to map. 0 on
+ * success.
  */
 int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
                       unsigned long pfn, unsigned long size, pgprot_t prot,
@@ -98,45 +99,46 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
 EXPORT_SYMBOL(rdma_user_mmap_io);

 /**
- * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
+ * rdma_user_mmap_entry_get_pgoff() - Get an entry from the mmap_xa
  *
- * @ucontext: associated user context.
- * @key: the key received from rdma_user_mmap_entry_insert which
- *       is provided by user as the address to map.
- * @vma: the vma related to the current mmap call.
+ * @ucontext: associated user context
+ * @pgoff: The mmap offset >> PAGE_SHIFT
  *
- * This function is called when a user tries to mmap a key it
- * initially received from the driver. The key was created by
- * the function rdma_user_mmap_entry_insert.
- * This function increases the refcnt of the entry so that it won't
- * be deleted from the xa in the meantime.
+ * This function is called when a user tries to mmap with an offset (returned
+ * by rdma_user_mmap_get_offset()) it initially received from the driver. The
+ * rdma_user_mmap_entry was created by the function
+ * rdma_user_mmap_entry_insert(). This function increases the refcnt of the
+ * entry so that it won't be deleted from the xarray in the meantime.
  *
- * Return an entry if exists or NULL if there is no match.
+ * Return a reference to an entry if it exists or NULL if there is no
+ * match. rdma_user_mmap_entry_put() must be called to put the reference.
  */
 struct rdma_user_mmap_entry *
-rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key,
-                         struct vm_area_struct *vma)
+rdma_user_mmap_entry_get_pgoff(struct ib_ucontext *ucontext,
+                               unsigned long pgoff)
 {
        struct rdma_user_mmap_entry *entry;
-       u64 mmap_page;
-       mmap_page = key >> PAGE_SHIFT;
-       if (mmap_page > U32_MAX)
+       if (pgoff > U32_MAX)
                return NULL;

        xa_lock(&ucontext->mmap_xa);
-       entry = xa_load(&ucontext->mmap_xa, mmap_page);
+       entry = xa_load(&ucontext->mmap_xa, pgoff);

-       /* if refcount is zero, entry is already being deleted */
-       if (!entry || entry->invalid || !kref_get_unless_zero(&entry->ref))
+       /*
+        * If refcount is zero, entry is already being deleted, driver_removed
+        * indicates that no further mmaps are possible and we are waiting for
+        * the active VMAs to be closed.
+        */
+       if (!entry || entry->start_pgoff != pgoff || entry->driver_removed ||
+           !kref_get_unless_zero(&entry->ref))
                goto err;

        xa_unlock(&ucontext->mmap_xa);
-       ibdev_dbg(ucontext->device,
-                 "mmap: key[%#llx] npages[%#x] returned\n",
-                 key, entry->npages);
+       ibdev_dbg(ucontext->device, "mmap: pgoff[%#lx] npages[%#zx] returned\n",
+                 pgoff, entry->npages);

        return entry;

@@ -144,25 +146,54 @@ rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key,
        xa_unlock(&ucontext->mmap_xa);
        return NULL;
 }
+EXPORT_SYMBOL(rdma_user_mmap_entry_get_pgoff);
+
+/**
+ * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa
+ *
+ * @ucontext: associated user context
+ * @vma: the vma being mmap'd into
+ *
+ * This function is like rdma_user_mmap_entry_get_pgoff() except that it also
+ * checks that the VMA is correct.
+ */
+struct rdma_user_mmap_entry *
+rdma_user_mmap_entry_get(struct ib_ucontext *ucontext,
+                         struct vm_area_struct *vma)
+{
+       struct rdma_user_mmap_entry *entry;
+
+       if (!(vma->vm_flags & VM_SHARED))
+               return NULL;
+       entry = rdma_user_mmap_entry_get_pgoff(ucontext, vma->vm_pgoff);
+       if (!entry)
+               return NULL;
+       if (entry->npages * PAGE_SIZE != vma->vm_end - vma->vm_start) {
+               rdma_user_mmap_entry_put(entry);
+               return NULL;
+       }
+       return entry;
+}
 EXPORT_SYMBOL(rdma_user_mmap_entry_get);

-void rdma_user_mmap_entry_free(struct kref *kref)
+static void rdma_user_mmap_entry_free(struct kref *kref)
 {
        struct rdma_user_mmap_entry *entry =
                container_of(kref, struct rdma_user_mmap_entry, ref);
        struct ib_ucontext *ucontext = entry->ucontext;
        unsigned long i;

-       /* need to erase all entries occupied by this single entry */
+       /*
+        * Erase all entries occupied by this single entry, this is deferred
+        * until all VMAs are closed so that the mmap offsets remain unique.
+        */
        xa_lock(&ucontext->mmap_xa);
        for (i = 0; i < entry->npages; i++)
-               __xa_erase(&ucontext->mmap_xa, entry->mmap_page + i);
+               __xa_erase(&ucontext->mmap_xa, entry->start_pgoff + i);
        xa_unlock(&ucontext->mmap_xa);

-       ibdev_dbg(ucontext->device,
-                 "mmap: key[%#llx] npages[%#x] removed\n",
-                 rdma_user_mmap_get_key(entry),
-                 entry->npages);
+       ibdev_dbg(ucontext->device, "mmap: pgoff[%#lx] npages[%#zx] removed\n",
+                 entry->start_pgoff, entry->npages);

        if (ucontext->device->ops.mmap_free)
                ucontext->device->ops.mmap_free(entry);
@@ -171,8 +202,7 @@ void rdma_user_mmap_entry_free(struct kref *kref)
 /**
  * rdma_user_mmap_entry_put() - Drop reference to the mmap entry
  *
- * @ucontext: associated user context.
- * @entry: an entry in the mmap_xa.
+ * @entry: an entry in the mmap_xa
  *
  * This function is called when the mapping is closed if it was
  * an io mapping or when the driver is done with the entry for
@@ -181,8 +211,7 @@ void rdma_user_mmap_entry_free(struct kref *kref)
  * and entry is no longer needed. This function will erase the
  * entry and free it if its refcnt reaches zero.
  */
-void rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
-                             struct rdma_user_mmap_entry *entry)
+void rdma_user_mmap_entry_put(struct rdma_user_mmap_entry *entry)
 {
        kref_put(&entry->ref, rdma_user_mmap_entry_free);
 }
@@ -190,36 +219,38 @@ EXPORT_SYMBOL(rdma_user_mmap_entry_put);

 /**
  * rdma_user_mmap_entry_remove() - Drop reference to entry and
- *                                 mark it as invalid.
+ *                                 mark it as unmmapable
  *
- * @ucontext: associated user context.
  * @entry: the entry to insert into the mmap_xa
+
+ * Drivers can call this to prevent userspace from creating more mappings for
+ * entry, however existing mmaps continue to exist and ops->mmap_free() will
+ * not be called until all user mmaps are destroyed.
  */
-void rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext,
-                                struct rdma_user_mmap_entry *entry)
+void rdma_user_mmap_entry_remove(struct rdma_user_mmap_entry *entry)
 {
        if (!entry)
                return;

-       entry->invalid = true;
+       entry->driver_removed = true;
        kref_put(&entry->ref, rdma_user_mmap_entry_free);
 }
 EXPORT_SYMBOL(rdma_user_mmap_entry_remove);

 /**
- * rdma_user_mmap_entry_insert() - Insert an entry to the mmap_xa.
+ * rdma_user_mmap_entry_insert() - Insert an entry to the mmap_xa
  *
  * @ucontext: associated user context.
  * @entry: the entry to insert into the mmap_xa
  * @length: length of the address that will be mmapped
  *
  * This function should be called by drivers that use the rdma_user_mmap
- * interface for handling user mmapped addresses. The database is handled in
- * the core and helper functions are provided to insert entries into the
- * database and extract entries when the user calls mmap with the given key.
- * The function allocates a unique key that should be provided to user, the user
- * will use the key to retrieve information such as address to
- * be mapped and how.
+ * interface for implementing their mmap syscall. A database of mmap offsets is
+ * handled in the core and helper functions are provided to insert entries
+ * into the database and extract entries when the user calls mmap with the
+ * given offset. The function allocates a unique page offset that should be
+ * provided to user, the user will use the offset to retrieve information such
+ * as address to be mapped and how.
  *
  * Return: 0 on success and -ENOMEM on failure
  */
@@ -230,7 +261,8 @@ int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
        struct ib_uverbs_file *ufile = ucontext->ufile;
        XA_STATE(xas, &ucontext->mmap_xa, 0);
        u32 xa_first, xa_last, npages;
-       int err, i;
+       int err;
+       u32 i;

        if (!entry)
                return -EINVAL;
@@ -238,10 +270,11 @@ int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
        kref_init(&entry->ref);
        entry->ucontext = ucontext;

-       /* We want the whole allocation to be done without interruption
-        * from a different thread. The allocation requires finding a
-        * free range and storing. During the xa_insert the lock could be
-        * released, we don't want another thread taking the gap.
+       /*
+        * We want the whole allocation to be done without interruption from a
+        * different thread. The allocation requires finding a free range and
+        * storing. During the xa_insert the lock could be released, possibly
+        * allowing another thread to choose the same range.
         */
        mutex_lock(&ufile->umap_lock);
@@ -262,13 +295,13 @@ int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext,
                if (check_add_overflow(xa_first, npages, &xa_last))
                        goto err_unlock;

-               /* Now look for the next present entry. If such doesn't
-                * exist, we found an empty range and can proceed
+               /*
+                * Now look for the next present entry. If an entry doesn't
+                * exist, we found an empty range and can proceed.
*/ xas_next_entry(&xas, xa_last - 1); if (xas.xa_node == XAS_BOUNDS || xas.xa_index >= xa_last) break; - /* o/w look for the next free entry */ } for (i = xa_first; i < xa_last; i++) { @@ -277,13 +310,16 @@ int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, goto err_undo; } - entry->mmap_page = xa_first; + /* + * Internally the kernel uses a page offset, in libc this is a byte + * offset. Drivers should not return pgoff to userspace. + */ + entry->start_pgoff = xa_first; xa_unlock(&ucontext->mmap_xa); - mutex_unlock(&ufile->umap_lock); - ibdev_dbg(ucontext->device, - "mmap: key[%#llx] npages[%#x] inserted\n", - rdma_user_mmap_get_key(entry), npages); + + ibdev_dbg(ucontext->device, "mmap: pgoff[%#lx] npages[%#x] inserted\n", + entry->start_pgoff, npages); return 0; diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index dbe9bd3d389a28..9aa7ffc1d12a93 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -844,17 +844,15 @@ static void rdma_umap_close(struct vm_area_struct *vma) if (!priv) return; - if (priv->entry) { - rdma_user_mmap_entry_put(ufile->ucontext, priv->entry); - priv->entry = NULL; - } - /* * The vma holds a reference on the struct file that created it, which * in turn means that the ib_uverbs_file is guaranteed to exist at * this point. */ mutex_lock(&ufile->umap_lock); + if (priv->entry) + rdma_user_mmap_entry_put(priv->entry); + list_del(&priv->list); mutex_unlock(&ufile->umap_lock); kfree(priv); @@ -951,17 +949,15 @@ void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile) if (vma->vm_mm != mm) continue; - - if (priv->entry) { - rdma_user_mmap_entry_put(ufile->ucontext, - priv->entry); - priv->entry = NULL; - } - list_del_init(&priv->list); zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start); + + if (priv->entry) { + rdma_user_mmap_entry_put(priv->entry); + priv->entry = NULL; + } } mutex_unlock(&ufile->umap_lock); skip_mm: diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h index 482d8acaad2c1d..2bda07078b9743 100644 --- a/drivers/infiniband/hw/efa/efa.h +++ b/drivers/infiniband/hw/efa/efa.h @@ -122,13 +122,6 @@ struct efa_ah { u8 id[EFA_GID_SIZE]; }; -struct efa_user_mmap_entry { - struct rdma_user_mmap_entry rdma_entry; - u64 address; - size_t length; - u8 mmap_flag; -}; - int efa_query_device(struct ib_device *ibdev, struct ib_device_attr *props, struct ib_udata *udata); diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c index f39e24df29a55e..b242ea7a3bc868 100644 --- a/drivers/infiniband/hw/efa/efa_verbs.c +++ b/drivers/infiniband/hw/efa/efa_verbs.c @@ -23,6 +23,12 @@ enum { (BIT(EFA_ADMIN_FATAL_ERROR) | BIT(EFA_ADMIN_WARNING) | \ BIT(EFA_ADMIN_NOTIFICATION) | BIT(EFA_ADMIN_KEEP_ALIVE)) +struct efa_user_mmap_entry { + struct rdma_user_mmap_entry rdma_entry; + u64 address; + u8 mmap_flag; +}; + #define EFA_DEFINE_STATS(op) \ op(EFA_TX_BYTES, "tx_bytes") \ op(EFA_TX_PKTS, "tx_pkts") \ @@ -376,10 +382,10 @@ static int efa_destroy_qp_handle(struct efa_dev *dev, u32 qp_handle) static void efa_qp_user_mmap_entries_remove(struct efa_ucontext *uctx, struct efa_qp *qp) { - rdma_user_mmap_entry_remove(&uctx->ibucontext, qp->rq_mmap_entry); - rdma_user_mmap_entry_remove(&uctx->ibucontext, qp->rq_db_mmap_entry); - rdma_user_mmap_entry_remove(&uctx->ibucontext, qp->llq_desc_mmap_entry); - rdma_user_mmap_entry_remove(&uctx->ibucontext, qp->sq_db_mmap_entry); + 
rdma_user_mmap_entry_remove(qp->rq_mmap_entry); + rdma_user_mmap_entry_remove(qp->rq_db_mmap_entry); + rdma_user_mmap_entry_remove(qp->llq_desc_mmap_entry); + rdma_user_mmap_entry_remove(qp->sq_db_mmap_entry); } int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata) @@ -412,7 +418,7 @@ int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata) static struct rdma_user_mmap_entry* efa_user_mmap_entry_insert(struct ib_ucontext *ucontext, u64 address, size_t length, - u8 mmap_flag, u64 *key) + u8 mmap_flag, u64 *offset) { struct efa_user_mmap_entry *entry = kzalloc(sizeof(*entry), GFP_KERNEL); int err; @@ -421,7 +427,6 @@ efa_user_mmap_entry_insert(struct ib_ucontext *ucontext, return NULL; entry->address = address; - entry->length = length; entry->mmap_flag = mmap_flag; err = rdma_user_mmap_entry_insert(ucontext, &entry->rdma_entry, @@ -430,7 +435,7 @@ efa_user_mmap_entry_insert(struct ib_ucontext *ucontext, kfree(entry); return NULL; } - *key = rdma_user_mmap_get_key(&entry->rdma_entry); + *offset = rdma_user_mmap_get_offset(&entry->rdma_entry); return &entry->rdma_entry; } @@ -828,9 +833,6 @@ static int efa_destroy_cq_idx(struct efa_dev *dev, int cq_idx) void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata) { - struct efa_ucontext *ucontext = rdma_udata_to_drv_context(udata, - struct efa_ucontext, ibucontext); - struct efa_dev *dev = to_edev(ibcq->device); struct efa_cq *cq = to_ecq(ibcq); @@ -841,8 +843,7 @@ void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata) efa_destroy_cq_idx(dev, cq->cq_idx); dma_unmap_single(&dev->pdev->dev, cq->dma_addr, cq->size, DMA_FROM_DEVICE); - rdma_user_mmap_entry_remove(&ucontext->ibucontext, - cq->mmap_entry); + rdma_user_mmap_entry_remove(cq->mmap_entry); } static int cq_mmap_entries_setup(struct efa_dev *dev, struct efa_cq *cq, @@ -975,7 +976,7 @@ int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, return 0; err_remove_mmap: - rdma_user_mmap_entry_remove(&ucontext->ibucontext, cq->mmap_entry); + rdma_user_mmap_entry_remove(cq->mmap_entry); err_destroy_cq: efa_destroy_cq_idx(dev, cq->cq_idx); err_free_mapped: @@ -1541,12 +1542,12 @@ void efa_mmap_free(struct rdma_user_mmap_entry *rdma_entry) /* DMA mapping is already gone, now free the pages */ if (entry->mmap_flag == EFA_MMAP_DMA_PAGE) free_pages_exact(phys_to_virt(entry->address), - entry->length); + entry->rdma_entry.npages * PAGE_SIZE); kfree(entry); } static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext, - struct vm_area_struct *vma, u64 key, size_t length) + struct vm_area_struct *vma) { struct rdma_user_mmap_entry *rdma_entry; struct efa_user_mmap_entry *entry; @@ -1554,35 +1555,31 @@ static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext, int err = 0; u64 pfn; - rdma_entry = rdma_user_mmap_entry_get(&ucontext->ibucontext, key, - vma); + rdma_entry = rdma_user_mmap_entry_get(&ucontext->ibucontext, vma); if (!rdma_entry) { - ibdev_dbg(&dev->ibdev, "key[%#llx] does not have valid entry\n", - key); + ibdev_dbg(&dev->ibdev, + "pgoff[%#lx] does not have valid entry\n", + vma->vm_pgoff); return -EINVAL; } entry = to_emmap(rdma_entry); - if (entry->length != length) { - ibdev_dbg(&dev->ibdev, - "key[%#llx] does not have valid length[%#zx] expected[%#zx]\n", - key, length, entry->length); - err = -EINVAL; - goto out; - } ibdev_dbg(&dev->ibdev, "Mapping address[%#llx], length[%#zx], mmap_flag[%d]\n", - entry->address, entry->length, entry->mmap_flag); + entry->address, rdma_entry->npages * PAGE_SIZE, + 
entry->mmap_flag); pfn = entry->address >> PAGE_SHIFT; switch (entry->mmap_flag) { case EFA_MMAP_IO_NC: - err = rdma_user_mmap_io(&ucontext->ibucontext, vma, pfn, length, + err = rdma_user_mmap_io(&ucontext->ibucontext, vma, pfn, + entry->rdma_entry.npages * PAGE_SIZE, pgprot_noncached(vma->vm_page_prot), rdma_entry); break; case EFA_MMAP_IO_WC: - err = rdma_user_mmap_io(&ucontext->ibucontext, vma, pfn, length, + err = rdma_user_mmap_io(&ucontext->ibucontext, vma, pfn, + entry->rdma_entry.npages * PAGE_SIZE, pgprot_writecombine(vma->vm_page_prot), rdma_entry); break; @@ -1599,14 +1596,14 @@ static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext, } if (err) { - ibdev_dbg(&dev->ibdev, - "Couldn't mmap address[%#llx] length[%#zx] mmap_flag[%d] err[%d]\n", - entry->address, length, entry->mmap_flag, err); + ibdev_dbg( + &dev->ibdev, + "Couldn't mmap address[%#llx] length[%#zx] mmap_flag[%d] err[%d]\n", + entry->address, rdma_entry->npages * PAGE_SIZE, + entry->mmap_flag, err); } -out: - rdma_user_mmap_entry_put(&ucontext->ibucontext, rdma_entry); - + rdma_user_mmap_entry_put(rdma_entry); return err; } @@ -1616,25 +1613,12 @@ int efa_mmap(struct ib_ucontext *ibucontext, struct efa_ucontext *ucontext = to_eucontext(ibucontext); struct efa_dev *dev = to_edev(ibucontext->device); size_t length = vma->vm_end - vma->vm_start; - u64 key = vma->vm_pgoff << PAGE_SHIFT; ibdev_dbg(&dev->ibdev, - "start %#lx, end %#lx, length = %#zx, key = %#llx\n", - vma->vm_start, vma->vm_end, length, key); - - if (length % PAGE_SIZE != 0 || !(vma->vm_flags & VM_SHARED)) { - ibdev_dbg(&dev->ibdev, - "length[%#zx] is not page size aligned[%#lx] or VM_SHARED is not set [%#lx]\n", - length, PAGE_SIZE, vma->vm_flags); - return -EINVAL; - } - - if (vma->vm_flags & VM_EXEC) { - ibdev_dbg(&dev->ibdev, "Mapping executable pages is not permitted\n"); - return -EPERM; - } + "start %#lx, end %#lx, length = %#zx, pgoff = %#lx\n", + vma->vm_start, vma->vm_end, length, vma->vm_pgoff); - return __efa_mmap(dev, ucontext, vma, key, length); + return __efa_mmap(dev, ucontext, vma); } static int efa_ah_destroy(struct efa_dev *dev, struct efa_ah *ah) diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c index b67248c9d92c29..9a674e65c4f783 100644 --- a/drivers/infiniband/hw/qedr/verbs.c +++ b/drivers/infiniband/hw/qedr/verbs.c @@ -321,7 +321,7 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata) uresp.dpm_enabled = dev->user_dpm_enabled; uresp.wids_enabled = 1; uresp.wid_count = oparams.wid_count; - uresp.db_pa = rdma_user_mmap_get_key(ctx->db_mmap_entry); + uresp.db_pa = rdma_user_mmap_get_offset(ctx->db_mmap_entry); uresp.db_size = ctx->dpi_size; uresp.max_send_wr = dev->attr.max_sqe; uresp.max_recv_wr = dev->attr.max_rqe; @@ -345,7 +345,7 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata) if (!ctx->db_mmap_entry) dev->ops->rdma_remove_user(dev->rdma_ctx, ctx->dpi); else - rdma_user_mmap_entry_remove(uctx, ctx->db_mmap_entry); + rdma_user_mmap_entry_remove(ctx->db_mmap_entry); return rc; } @@ -357,7 +357,7 @@ void qedr_dealloc_ucontext(struct ib_ucontext *ibctx) DP_DEBUG(uctx->dev, QEDR_MSG_INIT, "Deallocating user context %p\n", uctx); - rdma_user_mmap_entry_remove(ibctx, uctx->db_mmap_entry); + rdma_user_mmap_entry_remove(uctx->db_mmap_entry); } void qedr_mmap_free(struct rdma_user_mmap_entry *rdma_entry) @@ -377,44 +377,22 @@ int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma) { struct ib_device *dev = 
ucontext->device; size_t length = vma->vm_end - vma->vm_start; - u64 key = vma->vm_pgoff << PAGE_SHIFT; struct rdma_user_mmap_entry *rdma_entry; struct qedr_user_mmap_entry *entry; int rc = 0; u64 pfn; ibdev_dbg(dev, - "start %#lx, end %#lx, length = %#zx, key = %#llx\n", - vma->vm_start, vma->vm_end, length, key); + "start %#lx, end %#lx, length = %#zx, pgoff = %#lx\n", + vma->vm_start, vma->vm_end, length, vma->vm_pgoff); - if (length % PAGE_SIZE != 0 || !(vma->vm_flags & VM_SHARED)) { - ibdev_dbg(dev, - "length[%#zx] is not page size aligned[%#lx] or VM_SHARED is not set [%#lx]\n", - length, PAGE_SIZE, vma->vm_flags); - return -EINVAL; - } - - if (vma->vm_flags & VM_EXEC) { - ibdev_dbg(dev, "Mapping executable pages is not permitted\n"); - return -EPERM; - } - vma->vm_flags &= ~VM_MAYEXEC; - - rdma_entry = rdma_user_mmap_entry_get(ucontext, key, vma); + rdma_entry = rdma_user_mmap_entry_get(ucontext, vma); if (!rdma_entry) { - ibdev_dbg(dev, "key[%#llx] does not have valid entry\n", - key); + ibdev_dbg(dev, "pgoff[%#lx] does not have valid entry\n", + vma->vm_pgoff); return -EINVAL; } entry = get_qedr_mmap_entry(rdma_entry); - if (entry->length != length) { - ibdev_dbg(dev, - "key[%#llx] does not have valid length[%#zx] expected[%#zx]\n", - key, length, entry->length); - rc = -EINVAL; - goto out; - } - ibdev_dbg(dev, "Mapping address[%#llx], length[%#zx], mmap_flag[%d]\n", entry->io_address, length, entry->mmap_flag); @@ -439,9 +417,7 @@ int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma) "Couldn't mmap address[%#llx] length[%#zx] mmap_flag[%d] err[%d]\n", entry->io_address, length, entry->mmap_flag, rc); -out: - rdma_user_mmap_entry_put(ucontext, rdma_entry); - + rdma_user_mmap_entry_put(rdma_entry); return rc; } @@ -713,7 +689,7 @@ static int qedr_copy_cq_uresp(struct qedr_dev *dev, uresp.db_offset = db_offset; uresp.icid = cq->icid; - uresp.db_rec_addr = rdma_user_mmap_get_key(cq->q.db_mmap_entry); + uresp.db_rec_addr = rdma_user_mmap_get_offset(cq->q.db_mmap_entry); rc = qedr_ib_copy_to_udata(udata, &uresp, sizeof(uresp)); if (rc) @@ -758,7 +734,7 @@ static int qedr_init_user_db_rec(struct ib_udata *udata, /* Allocate a page for doorbell recovery, add to mmap */ q->db_rec_data = (void *)get_zeroed_page(GFP_USER); if (!q->db_rec_data) { - DP_ERR(dev, "get_free_page failed\n"); + DP_ERR(dev, "get_zeroed_page failed\n"); return -ENOMEM; } @@ -1044,8 +1020,7 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, qedr_free_pbl(dev, &cq->q.pbl_info, cq->q.pbl_tbl); ib_umem_release(cq->q.umem); if (ctx) - rdma_user_mmap_entry_remove(&ctx->ibucontext, - cq->q.db_mmap_entry); + rdma_user_mmap_entry_remove(cq->q.db_mmap_entry); } else { dev->ops->common->chain_free(dev->cdev, &cq->pbl); } @@ -1068,8 +1043,6 @@ int qedr_resize_cq(struct ib_cq *ibcq, int new_cnt, struct ib_udata *udata) void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata) { - struct qedr_ucontext *ctx = rdma_udata_to_drv_context(udata, - struct qedr_ucontext, ibucontext); struct qedr_dev *dev = get_qedr_dev(ibcq->device); struct qed_rdma_destroy_cq_out_params oparams; struct qed_rdma_destroy_cq_in_params iparams; @@ -1097,8 +1070,7 @@ void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata) if (cq->q.db_rec_data) { qedr_db_recovery_del(dev, cq->q.db_addr, &cq->q.db_rec_data->db_data); - rdma_user_mmap_entry_remove(&ctx->ibucontext, - cq->q.db_mmap_entry); + rdma_user_mmap_entry_remove(cq->q.db_mmap_entry); } } else { qedr_db_recovery_del(dev, cq->db_addr, 
&cq->db.data); @@ -1280,7 +1252,7 @@ static void qedr_copy_rq_uresp(struct qedr_dev *dev, } uresp->rq_icid = qp->icid; - uresp->rq_db_rec_addr = rdma_user_mmap_get_key(qp->urq.db_mmap_entry); + uresp->rq_db_rec_addr = rdma_user_mmap_get_offset(qp->urq.db_mmap_entry); } static void qedr_copy_sq_uresp(struct qedr_dev *dev, @@ -1295,7 +1267,8 @@ static void qedr_copy_sq_uresp(struct qedr_dev *dev, else uresp->sq_icid = qp->icid + 1; - uresp->sq_db_rec_addr = rdma_user_mmap_get_key(qp->usq.db_mmap_entry); + uresp->sq_db_rec_addr = + rdma_user_mmap_get_offset(qp->usq.db_mmap_entry); } static int qedr_copy_qp_uresp(struct qedr_dev *dev, @@ -1747,15 +1720,13 @@ static void qedr_cleanup_user(struct qedr_dev *dev, if (qp->usq.db_rec_data) { qedr_db_recovery_del(dev, qp->usq.db_addr, &qp->usq.db_rec_data->db_data); - rdma_user_mmap_entry_remove(&ctx->ibucontext, - qp->usq.db_mmap_entry); + rdma_user_mmap_entry_remove(qp->usq.db_mmap_entry); } if (qp->urq.db_rec_data) { qedr_db_recovery_del(dev, qp->urq.db_addr, &qp->urq.db_rec_data->db_data); - rdma_user_mmap_entry_remove(&ctx->ibucontext, - qp->urq.db_mmap_entry); + rdma_user_mmap_entry_remove(qp->urq.db_mmap_entry); } if (rdma_protocol_iwarp(&dev->ibdev, 1)) diff --git a/drivers/infiniband/sw/siw/siw.h b/drivers/infiniband/sw/siw/siw.h index ecc338e8f0b09d..f851afb5632e65 100644 --- a/drivers/infiniband/sw/siw/siw.h +++ b/drivers/infiniband/sw/siw/siw.h @@ -507,7 +507,6 @@ struct iwarp_msg_info { struct siw_user_mmap_entry { struct rdma_user_mmap_entry rdma_entry; void *address; - size_t length; }; /* Global siw parameters. Currently set in siw_main.c */ diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c index cc4e7aebc9bc09..725985ed8af3be 100644 --- a/drivers/infiniband/sw/siw/siw_verbs.c +++ b/drivers/infiniband/sw/siw/siw_verbs.c @@ -44,7 +44,6 @@ void siw_mmap_free(struct rdma_user_mmap_entry *rdma_entry) int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma) { struct siw_ucontext *uctx = to_siw_ctx(ctx); - unsigned long off = vma->vm_pgoff << PAGE_SHIFT; size_t size = vma->vm_end - vma->vm_start; struct rdma_user_mmap_entry *rdma_entry; struct siw_user_mmap_entry *entry; @@ -57,29 +56,22 @@ int siw_mmap(struct ib_ucontext *ctx, struct vm_area_struct *vma) pr_warn("siw: mmap not page aligned\n"); return -EINVAL; } - rdma_entry = rdma_user_mmap_entry_get(&uctx->base_ucontext, off, - vma); + rdma_entry = rdma_user_mmap_entry_get(&uctx->base_ucontext, vma); if (!rdma_entry) { siw_dbg(&uctx->sdev->base_dev, "mmap lookup failed: %lu, %#zx\n", - off, size); + vma->vm_pgoff, size); return -EINVAL; } entry = to_siw_mmap_entry(rdma_entry); - if (PAGE_ALIGN(entry->length) != size) { - siw_dbg(&uctx->sdev->base_dev, - "key[%#lx] does not have valid length[%#zx] expected[%#zx]\n", - off, size, entry->length); - rv = -EINVAL; - goto out; - } rv = remap_vmalloc_range(vma, entry->address, 0); if (rv) { - pr_warn("remap_vmalloc_range failed: %lu, %zu\n", off, size); + pr_warn("remap_vmalloc_range failed: %lu, %zu\n", vma->vm_pgoff, + size); goto out; } out: - rdma_user_mmap_entry_put(&uctx->base_ucontext, rdma_entry); + rdma_user_mmap_entry_put(rdma_entry); return rv; } @@ -274,17 +266,16 @@ void siw_qp_put_ref(struct ib_qp *base_qp) static struct rdma_user_mmap_entry * siw_mmap_entry_insert(struct siw_ucontext *uctx, void *address, size_t length, - u64 *key) + u64 *offset) { struct siw_user_mmap_entry *entry = kzalloc(sizeof(*entry), GFP_KERNEL); int rv; - *key = SIW_INVAL_UOBJ_KEY; + *offset = 
SIW_INVAL_UOBJ_KEY; if (!entry) return NULL; entry->address = address; - entry->length = length; rv = rdma_user_mmap_entry_insert(&uctx->base_ucontext, &entry->rdma_entry, @@ -294,7 +285,7 @@ siw_mmap_entry_insert(struct siw_ucontext *uctx, return NULL; } - *key = rdma_user_mmap_get_key(&entry->rdma_entry); + *offset = rdma_user_mmap_get_offset(&entry->rdma_entry); return &entry->rdma_entry; } @@ -512,10 +503,8 @@ struct ib_qp *siw_create_qp(struct ib_pd *pd, if (qp) { if (uctx) { - rdma_user_mmap_entry_remove(&uctx->base_ucontext, - qp->sq_entry); - rdma_user_mmap_entry_remove(&uctx->base_ucontext, - qp->rq_entry); + rdma_user_mmap_entry_remove(qp->sq_entry); + rdma_user_mmap_entry_remove(qp->rq_entry); } vfree(qp->sendq); vfree(qp->recvq); @@ -631,8 +620,8 @@ int siw_destroy_qp(struct ib_qp *base_qp, struct ib_udata *udata) qp->rx_stream.rx_suspend = 1; if (uctx) { - rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->sq_entry); - rdma_user_mmap_entry_remove(&uctx->base_ucontext, qp->rq_entry); + rdma_user_mmap_entry_remove(qp->sq_entry); + rdma_user_mmap_entry_remove(qp->rq_entry); } down_write(&qp->state_lock); @@ -1104,7 +1093,7 @@ void siw_destroy_cq(struct ib_cq *base_cq, struct ib_udata *udata) siw_cq_flush(cq); if (ctx) - rdma_user_mmap_entry_remove(&ctx->base_ucontext, cq->cq_entry); + rdma_user_mmap_entry_remove(cq->cq_entry); atomic_dec(&sdev->num_cq); @@ -1198,8 +1187,7 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr, rdma_udata_to_drv_context(udata, struct siw_ucontext, base_ucontext); if (ctx) - rdma_user_mmap_entry_remove(&ctx->base_ucontext, - cq->cq_entry); + rdma_user_mmap_entry_remove(cq->cq_entry); vfree(cq->queue); } atomic_dec(&sdev->num_cq); @@ -1650,8 +1638,7 @@ int siw_create_srq(struct ib_srq *base_srq, err_out: if (srq->recvq) { if (ctx) - rdma_user_mmap_entry_remove(&ctx->base_ucontext, - srq->srq_entry); + rdma_user_mmap_entry_remove(srq->srq_entry); vfree(srq->recvq); } atomic_dec(&sdev->num_srq); @@ -1738,8 +1725,7 @@ void siw_destroy_srq(struct ib_srq *base_srq, struct ib_udata *udata) base_ucontext); if (ctx) - rdma_user_mmap_entry_remove(&ctx->base_ucontext, - srq->srq_entry); + rdma_user_mmap_entry_remove(srq->srq_entry); vfree(srq->recvq); atomic_dec(&sdev->num_srq); } diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 13de7c1b796d7d..416e72ea80d913 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -2262,15 +2262,16 @@ struct iw_cm_conn_param; struct rdma_user_mmap_entry { struct kref ref; struct ib_ucontext *ucontext; - u32 npages; - u32 mmap_page; - bool invalid; + unsigned long start_pgoff; + size_t npages; + bool driver_removed; }; +/* Return the offset (in bytes) the user should pass to libc's mmap() */ static inline u64 -rdma_user_mmap_get_key(const struct rdma_user_mmap_entry *entry) +rdma_user_mmap_get_offset(const struct rdma_user_mmap_entry *entry) { - return (u64)entry->mmap_page << PAGE_SHIFT; + return (u64)entry->start_pgoff << PAGE_SHIFT; } /** @@ -2832,14 +2833,14 @@ int rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, struct rdma_user_mmap_entry *entry, size_t length); struct rdma_user_mmap_entry * -rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, +rdma_user_mmap_entry_get_pgoff(struct ib_ucontext *ucontext, + unsigned long pgoff); +struct rdma_user_mmap_entry * +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, struct vm_area_struct *vma); +void rdma_user_mmap_entry_put(struct rdma_user_mmap_entry *entry); -void 
rdma_user_mmap_entry_put(struct ib_ucontext *ucontext,
-                             struct rdma_user_mmap_entry *entry);
-
-void rdma_user_mmap_entry_remove(struct ib_ucontext *ucontext,
-                                struct rdma_user_mmap_entry *entry);
+void rdma_user_mmap_entry_remove(struct rdma_user_mmap_entry *entry);

 static inline int ib_copy_from_udata(void *dest, struct ib_udata *udata, size_t len)
 {
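
For anyone picking this API up from the thread, the expected driver usage
boils down to four steps: insert an entry and hand rdma_user_mmap_get_offset()
back to userspace, look the entry up in the mmap handler with
rdma_user_mmap_entry_get(), drop the reference with rdma_user_mmap_entry_put()
when the mmap call is done, and call rdma_user_mmap_entry_remove() at destroy
time so ops->mmap_free() runs once the last VMA is gone. A minimal sketch
follows; the my_drv_* names, the embedded entry struct and the BAR details are
made up for illustration, only the rdma_user_mmap_* calls and fields come from
the diff above:

#include <rdma/ib_verbs.h>

/* Driver-private wrapper around the common entry, same pattern as efa/qedr/siw */
struct my_drv_mmap_entry {
        struct rdma_user_mmap_entry rdma_entry;
        u64 io_address;
};

static inline struct my_drv_mmap_entry *
to_my_mmap(struct rdma_user_mmap_entry *rdma_entry)
{
        return container_of(rdma_entry, struct my_drv_mmap_entry, rdma_entry);
}

/* At resource creation: register the region and return the byte offset that
 * userspace must pass to mmap().
 */
static int my_drv_expose_region(struct ib_ucontext *uctx, u64 bar_addr,
                                size_t length, u64 *mmap_offset)
{
        struct my_drv_mmap_entry *entry = kzalloc(sizeof(*entry), GFP_KERNEL);
        int rc;

        if (!entry)
                return -ENOMEM;
        entry->io_address = bar_addr;

        rc = rdma_user_mmap_entry_insert(uctx, &entry->rdma_entry, length);
        if (rc) {
                kfree(entry);
                return rc;
        }
        *mmap_offset = rdma_user_mmap_get_offset(&entry->rdma_entry);
        return 0;
}

/* The driver's mmap handler: look the entry up by vma->vm_pgoff, map it,
 * then drop the reference taken by rdma_user_mmap_entry_get().
 */
static int my_drv_mmap(struct ib_ucontext *uctx, struct vm_area_struct *vma)
{
        struct rdma_user_mmap_entry *rdma_entry;
        struct my_drv_mmap_entry *entry;
        int rc;

        rdma_entry = rdma_user_mmap_entry_get(uctx, vma);
        if (!rdma_entry)
                return -EINVAL;
        entry = to_my_mmap(rdma_entry);

        rc = rdma_user_mmap_io(uctx, vma, entry->io_address >> PAGE_SHIFT,
                               rdma_entry->npages * PAGE_SIZE,
                               pgprot_noncached(vma->vm_page_prot),
                               rdma_entry);

        rdma_user_mmap_entry_put(rdma_entry);
        return rc;
}

/* At resource destroy: forbid new mmaps; the entry (and ops->mmap_free below)
 * is only freed once the refcount drops, i.e. after the last user VMA closes.
 */
static void my_drv_destroy_region(struct my_drv_mmap_entry *entry)
{
        rdma_user_mmap_entry_remove(&entry->rdma_entry);
}

/* Wired into the device ops as .mmap_free, called by the core at final free */
static void my_drv_mmap_free(struct rdma_user_mmap_entry *rdma_entry)
{
        kfree(to_my_mmap(rdma_entry));
}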