[PATCH rdma-next v1 0/3] Introduce new advise MR verb

From: Leon Romanovsky <leonro@xxxxxxxxxxxx>

Changelog v0->v1:
 * Fixed commit message in patch 2
 * Removed redundant brackets
 * Add FIXME comment
 * Flush workqueue to ensure no work is executed during ib_device dereg
 * Change declaration of sg_list to be a flex array
 * Fix rebase error

---------------------
Hi,

In this series from Moni, we are implementing the new advise_mr()
verb, which was proposed as RFC [1].

The verb advise_mr() borrows its definition from the madvise() system
call: it gives the driver advice about an address range that belongs
to a memory region (MR). This is in contrast to madvise(), which
operates on virtual addresses and has different logical semantics that
are not suitable for MRs.

Applications use this verb to tell the kernel about expected memory
usage, so that the memory can be prepared efficiently in advance, prior
to any subsequent use. As with madvise(), the advise_mr() verb does not
interfere with the semantics of the application, but it can improve
application performance.

Because it is only advice, the kernel is free to ignore advise_mr() calls.
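
For readers less familiar with the madvise() side of the analogy, here
is a minimal userspace sketch of the "advice only" contract. It uses the
existing madvise(2) call with MADV_WILLNEED, not the new verb; the
helper names hint_will_need() and touch_with_hint() are made up for this
illustration. The point is that a failed hint is logged and ignored, and
the program behaves the same either way.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for madvise() and MAP_ANONYMOUS under strict -std= */
#endif
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: hint that [buf, buf + len) will be used soon.
 * The result is deliberately not propagated, since the kernel may
 * ignore or reject the advice without affecting correctness. */
static void hint_will_need(void *buf, size_t len)
{
	if (madvise(buf, len, MADV_WILLNEED) != 0)
		perror("madvise (ignored, advice only)");
}

/* Hypothetical helper: map an anonymous buffer, give the hint, then use
 * the buffer. Returns 0 if the buffer is usable, -1 otherwise. */
static int touch_with_hint(size_t len)
{
	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int ok;

	if (buf == MAP_FAILED)
		return -1;

	hint_will_need(buf, len);

	/* The access works whether or not the hint was honored. */
	memset(buf, 0xab, len);
	ok = (buf[0] == 0xab && buf[len - 1] == 0xab);

	munmap(buf, len);
	return ok ? 0 : -1;
}
```

Calling touch_with_hint(1 << 20) is expected to return 0 regardless of
what the kernel did with the hint; advise_mr() is meant to follow the
same contract for ranges inside an MR.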

An important example of such a performance hint is partial pre-fetching
of ODP (On-Demand Paging) MRs.

Pre-fetching an ODP address range ensures that the range is present
before the actual IO is conducted. This provides a way to reduce latency
by overlapping paging-in with either compute time or IO to other ranges.

Thanks

[1] https://www.spinics.net/lists/linux-rdma/msg70592.html

---
This series has a merge conflict with commit 4d5422a309de
("IB/mlx5: Skip non-ODP MR when handling a page fault") in rdma-rc.

The resolution is as follows:
1. It is an error to request "prefetch" for a non-ODP MR, because the
   request comes explicitly from the caller.
2. It is OK to encounter non-ODP MRs in page faults.

+               if (prefetch && !mr->umem->is_odp) {
+                       ret = -EINVAL;
+                       goto srcu_unlock;
+               }
+
 +              if (!mr->umem->is_odp) {
 +                      mlx5_ib_dbg(dev, "skipping non ODP MR (lkey=0x%06x) in page fault handler.\n",
 +                                  key);
 +                      if (bytes_mapped)
 +                              *bytes_mapped += bcnt;
 +                      ret = 0;
 +                      goto srcu_unlock;
 +              }
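
The two rules can also be read as a small standalone model, which may
be easier to reason about than the combined diff. This is only a sketch:
struct model_mr and model_handle_mr() are hypothetical stand-ins and not
the mlx5 structures or functions, and the ODP branch is a placeholder
that does no paging.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-MR state the driver consults. */
struct model_mr {
	bool is_odp;
};

/* Hypothetical model of the resolved check order:
 *  1. an explicit prefetch of a non-ODP MR is rejected with -EINVAL;
 *  2. a page fault on a non-ODP MR is skipped, counted as mapped,
 *     and reported as success. */
static int model_handle_mr(const struct model_mr *mr, bool prefetch,
			   size_t bcnt, size_t *bytes_mapped)
{
	if (prefetch && !mr->is_odp)
		return -EINVAL;

	if (!mr->is_odp) {
		if (bytes_mapped)
			*bytes_mapped += bcnt;
		return 0;
	}

	/* ODP MR: the real handler would page in the range here.
	 * The model does nothing and reports success. */
	return 0;
}
```

The order of the two checks matters: with the prefetch test first, a
non-ODP MR reaches the "skip" branch only on the page-fault path.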

Moni Shoua (3):
  IB/uverbs: Add helper to get array size from ptr attribute
  IB/uverbs: Add support to advise_mr
  IB/mlx5: Add advise_mr() support

 drivers/infiniband/core/uverbs_std_types_mr.c |  56 ++++++++
 drivers/infiniband/hw/mlx5/flow.c             |  12 +-
 drivers/infiniband/hw/mlx5/main.c             |   8 ++
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |  18 +++
 drivers/infiniband/hw/mlx5/mr.c               |  15 +++
 drivers/infiniband/hw/mlx5/odp.c              | 120 ++++++++++++++++--
 include/rdma/ib_verbs.h                       |   6 +
 include/rdma/uverbs_ioctl.h                   |  23 ++++
 include/uapi/rdma/ib_user_ioctl_cmds.h        |   8 ++
 include/uapi/rdma/ib_user_ioctl_verbs.h       |   9 ++
 10 files changed, 259 insertions(+), 16 deletions(-)

--
2.19.1