RE: [PATCH for-next v4 0/8] On-Demand Paging on SoftRoCE

On Thu, April 20, 2023 1:07 AM Pearson, Robert B wrote:
> 
> The work queue patch has been submitted and is waiting for some action. -- Bob

Hi,
Could you tell me which one it is? I am willing to review it.

This seems to be your latest work queue patch:
https://lore.kernel.org/all/TYCPR01MB8455A2D0B3303FD90B3BB6F1E58B9@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
I cannot find anything newer on the mailing list or on Patchwork.

Daisuke

> 
> -----Original Message-----
> From: Daisuke Matsuda <matsuda-daisuke@xxxxxxxxxxx>
> Sent: Wednesday, April 19, 2023 12:52 AM
> To: linux-rdma@xxxxxxxxxxxxxxx; leonro@xxxxxxxxxx; jgg@xxxxxxxxxx; zyjzyj2000@xxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx; rpearsonhpe@xxxxxxxxx; yangx.jy@xxxxxxxxxxx; lizhijian@xxxxxxxxxxx; Daisuke
> Matsuda <matsuda-daisuke@xxxxxxxxxxx>
> Subject: [PATCH for-next v4 0/8] On-Demand Paging on SoftRoCE
> 
> This patch series implements the On-Demand Paging feature for the SoftRoCE (rxe) driver. So far the feature has been
> available only in the mlx5 driver[1].
> 
> The first patch of this series is provided for testing purposes only and should be dropped in the end. It converts the three
> tasklets to use a workqueue so that they can sleep during a page fault. Bob Pearson says he will post the patch to do this,
> and I think we can adopt that. The other patches in this series are, I believe, complete.
> 
> I omitted some contents like the motive behind this series for simplicity.
> Please see the cover letter of v3 for more details[2].
> 
> [Overview]
> When applications register a memory region (MR), RDMA drivers normally pin the pages in the MR so that physical
> addresses never change during RDMA communication. This requires the MR to fit in physical memory and inevitably leads
> to memory pressure. On the other hand, On-Demand Paging (ODP) allows applications to register MRs without pinning
> pages. Pages are paged in when the driver requires them and paged out when the OS reclaims them. As a result, it is
> possible to register a large MR that does not fit in physical memory without taking up so much physical memory.
> 
> [How does ODP work?]
> "struct ib_umem_odp" is used to manage pages. It is created for each ODP-enabled MR on its registration. This struct
> holds a pair of arrays (dma_list/pfn_list) that serve as a driver page table. DMA addresses and PFNs are stored in the
> driver page table. They are updated on page-in and page-out, both of which use the common interfaces in the ib_uverbs
> layer.
> 
> Page-in can occur when the requester, responder, or completer accesses an MR in order to process RDMA operations. If
> it finds that the pages being accessed are not present in physical memory, or that the requisite permissions are not set
> on the pages, it triggers a page fault to make the pages present with proper permissions and, at the same time, updates
> the driver page table. After confirming the presence of the pages, it executes the memory access, such as a read, write,
> or atomic operation.
> 
> Page-out is triggered by page reclaim or filesystem events (e.g. metadata update of a file that is being used as an MR).
> When creating an ODP-enabled MR, the driver registers an MMU notifier callback. When the kernel issues a page
> invalidation notification, the callback is invoked to unmap DMA addresses and update the driver page table. After that,
> the kernel releases the pages.
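
The page-in and page-out flow described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name in it (struct toy_odp_mr, toy_page_in, toy_access, toy_invalidate, the TOY_* constants) is hypothetical and invented for illustration. It is not the rxe code, it does not use struct ib_umem_odp or any kernel interface, and the "DMA addresses" are fake numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NPAGES   8      /* hypothetical MR size in pages */
#define TOY_PRESENT  0x1u   /* page is currently mapped */
#define TOY_WRITABLE 0x2u   /* page is mapped with write permission */

/* Toy stand-in for a driver page table; NOT struct ib_umem_odp. */
struct toy_odp_mr {
	uint64_t dma_list[TOY_NPAGES];  /* fake DMA address, 0 if unmapped */
	unsigned int flags[TOY_NPAGES]; /* TOY_PRESENT / TOY_WRITABLE bits */
	unsigned int faults;            /* number of simulated page faults */
};

/* Simulated page-in: map one page and record its permissions. */
static void toy_page_in(struct toy_odp_mr *mr, size_t idx, bool write)
{
	mr->dma_list[idx] = 0x1000u * (idx + 1); /* made-up address */
	mr->flags[idx] = TOY_PRESENT | (write ? TOY_WRITABLE : 0u);
	mr->faults++;
}

/*
 * Access path: fault the page in if it is absent or lacks the needed
 * permission, then return the (fake) DMA address to operate on.
 */
static uint64_t toy_access(struct toy_odp_mr *mr, size_t idx, bool write)
{
	unsigned int need = TOY_PRESENT | (write ? TOY_WRITABLE : 0u);

	assert(idx < TOY_NPAGES);
	if ((mr->flags[idx] & need) != need)
		toy_page_in(mr, idx, write);
	return mr->dma_list[idx];
}

/* Simulated invalidation callback: unmap pages in [start, end). */
static void toy_invalidate(struct toy_odp_mr *mr, size_t start, size_t end)
{
	assert(start <= end && end <= TOY_NPAGES);
	for (size_t i = start; i < end; i++) {
		mr->dma_list[i] = 0;
		mr->flags[i] = 0;
	}
}
```

In this model, the first toy_access() to a page takes a simulated fault, a repeated read does not, a write to a page mapped read-only faults again to upgrade the permission, and any access after toy_invalidate() faults once more. Locking against a concurrent invalidation, which a real driver needs, is deliberately left out.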
> 
> [Supported operations]
> All traditional operations are supported on RC connections. The new Atomic write[3] and RDMA Flush[4] operations are
> not included in this patchset. I will post them later after this patchset is merged. On UD connections, Send, Recv, and
> SRQ-Recv are supported.
> 
> [How to test ODP?]
> There are only a few resources available for testing. The pyverbs testcases in rdma-core and perftest[5] are the
> recommended ones. Besides them, the ibv_rc_pingpong command can also be used for testing. Note that you may have to
> build perftest from upstream because older versions do not handle ODP capabilities correctly.
> 
> The tree is available on GitHub:
> https://github.com/daimatsuda/linux/tree/odp_v4
> While this series is based on commit f605f26ea196, the tree includes an additional bugfix, which is yet to be merged as of
> today (Apr 19th, 2023).
> https://lore.kernel.org/linux-rdma/20230418090642.1849358-1-matsuda-daisuke@xxxxxxxxxxx/
> 
> [Future work]
> My next work is to enable the new Atomic write[3] and RDMA Flush[4] operations with ODP. After that, I am going to
> implement the prefetch feature. It allows applications to trigger page faults using ibv_advise_mr(3) to optimize
> performance. Some existing software like librpma[6] uses this feature. Additionally, I think we can also add the implicit
> ODP feature in the future.
> 
> [1] [RFC 00/20] On demand paging
> https://www.spinics.net/lists/linux-rdma/msg18906.html
> 
> [2] [PATCH for-next v3 0/7] On-Demand Paging on SoftRoCE
> https://lore.kernel.org/lkml/cover.1671772917.git.matsuda-daisuke@xxxxxxxxxxx/
> 
> [3] [PATCH v7 0/8] RDMA/rxe: Add atomic write operation
> https://lore.kernel.org/linux-rdma/1669905432-14-1-git-send-email-yangx.jy@xxxxxxxxxxx/
> 
> [4] [for-next PATCH 00/10] RDMA/rxe: Add RDMA FLUSH operation
> https://lore.kernel.org/lkml/20221206130201.30986-1-lizhijian@xxxxxxxxxxx/
> 
> [5] linux-rdma/perftest: Infiniband Verbs Performance Tests
> https://github.com/linux-rdma/perftest
> 
> [6] librpma: Remote Persistent Memory Access Library
> https://github.com/pmem/rpma
> 
> v3->v4:
>  1) Re-designed functions that access MRs to use the MR xarray.
>  2) Rebased onto the latest jgg-for-next tree.
> 
> v2->v3:
>  1) Removed a patch that changes the common ib_uverbs layer.
>  2) Re-implemented patches for conversion to workqueue.
>  3) Fixed compile errors (happened when CONFIG_INFINIBAND_ON_DEMAND_PAGING=n).
>  4) Fixed some functions that returned incorrect errors.
>  5) Temporarily disabled ODP for RDMA Flush and Atomic Write.
> 
> v1->v2:
>  1) Fixed a crash issue reported by Haris Iqbal.
>  2) Tried to make lock patterns clearer as pointed out by Romanovsky.
>  3) Minor clean ups and fixes.
> 
> Daisuke Matsuda (8):
>   RDMA/rxe: Tentative workqueue implementation
>   RDMA/rxe: Always schedule works before accessing user MRs
>   RDMA/rxe: Make MR functions accessible from other rxe source code
>   RDMA/rxe: Move resp_states definition to rxe_verbs.h
>   RDMA/rxe: Add page invalidation support
>   RDMA/rxe: Allow registering MRs for On-Demand Paging
>   RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
>   RDMA/rxe: Add support for the traditional Atomic operations with ODP
> 
>  drivers/infiniband/sw/rxe/Makefile    |   2 +
>  drivers/infiniband/sw/rxe/rxe.c       |  27 ++-
>  drivers/infiniband/sw/rxe/rxe.h       |  37 ---
>  drivers/infiniband/sw/rxe/rxe_comp.c  |  12 +-
>  drivers/infiniband/sw/rxe/rxe_loc.h   |  49 +++-
>  drivers/infiniband/sw/rxe/rxe_mr.c    |  27 +--
>  drivers/infiniband/sw/rxe/rxe_odp.c   | 311 ++++++++++++++++++++++++++
>  drivers/infiniband/sw/rxe/rxe_recv.c  |   4 +-
>  drivers/infiniband/sw/rxe/rxe_resp.c  |  32 ++-
>  drivers/infiniband/sw/rxe/rxe_task.c  |  84 ++++---
>  drivers/infiniband/sw/rxe/rxe_task.h  |   6 +-
>  drivers/infiniband/sw/rxe/rxe_verbs.c |   5 +-
>  drivers/infiniband/sw/rxe/rxe_verbs.h |  39 ++++
>  13 files changed, 535 insertions(+), 100 deletions(-)
>  create mode 100644 drivers/infiniband/sw/rxe/rxe_odp.c
> 
> base-commit: f605f26ea196a3b49bea249330cbd18dba61a33e
> 
> --
> 2.39.1




