On Thu, April 20, 2023 1:07 AM Pearson, Robert B wrote:
>
> The work queue patch has been submitted and is waiting for some action. -- Bob

Hi,

Could you tell me which one it is? I am willing to review it.

This seems to be your latest work queue patch:
https://lore.kernel.org/all/TYCPR01MB8455A2D0B3303FD90B3BB6F1E58B9@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
I cannot find anything newer on the mailing list or on Patchwork.

Daisuke

>
> -----Original Message-----
> From: Daisuke Matsuda <matsuda-daisuke@xxxxxxxxxxx>
> Sent: Wednesday, April 19, 2023 12:52 AM
> To: linux-rdma@xxxxxxxxxxxxxxx; leonro@xxxxxxxxxx; jgg@xxxxxxxxxx; zyjzyj2000@xxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx; rpearsonhpe@xxxxxxxxx; yangx.jy@xxxxxxxxxxx; lizhijian@xxxxxxxxxxx; Daisuke Matsuda <matsuda-daisuke@xxxxxxxxxxx>
> Subject: [PATCH for-next v4 0/8] On-Demand Paging on SoftRoCE
>
> This patch series implements the On-Demand Paging feature on the SoftRoCE
> (rxe) driver, which has been available only in the mlx5 driver[1] so far.
>
> The first patch of this series is provided for testing purposes, and it
> should be dropped in the end. It converts the three tasklets to use a
> workqueue so that they can sleep during a page fault. Bob Pearson says he
> will post the patch to do this, and I think we can adopt that. The other
> patches in this series are, I believe, completed works.
>
> I omitted some contents like the motive behind this series for simplicity.
> Please see the cover letter of v3 for more details[2].
>
> [Overview]
> When applications register a memory region (MR), RDMA drivers normally pin
> pages in the MR so that physical addresses are never changed during RDMA
> communication. This requires the MR to fit in physical memory and
> inevitably leads to memory pressure. On the other hand, On-Demand Paging
> (ODP) allows applications to register MRs without pinning pages. They are
> paged-in when the driver requires and paged-out when the OS reclaims.
> As a result, it is possible to register a large MR that does not fit in
> physical memory without taking up so much physical memory.
>
> [How does ODP work?]
> "struct ib_umem_odp" is used to manage pages. It is created for each
> ODP-enabled MR on its registration. This struct holds a pair of arrays
> (dma_list/pfn_list) that serve as a driver page table. DMA addresses and
> PFNs are stored in the driver page table. They are updated on page-in and
> page-out, both of which use the common interfaces in the ib_uverbs layer.
>
> Page-in can occur when the requester, responder or completer accesses an MR
> in order to process RDMA operations. If they find that the pages being
> accessed are not present in physical memory or that the requisite
> permissions are not set on the pages, they trigger a page fault to make the
> pages present with proper permissions and at the same time update the
> driver page table. After confirming the presence of the pages, they execute
> memory accesses such as read, write or atomic operations.
>
> Page-out is triggered by page reclaim or filesystem events (e.g. a metadata
> update of a file that is being used as an MR). When creating an ODP-enabled
> MR, the driver registers an MMU notifier callback. When the kernel issues a
> page invalidation notification, the callback is invoked to unmap DMA
> addresses and update the driver page table. After that, the kernel releases
> the pages.
>
> [Supported operations]
> All traditional operations are supported on RC connections. The new Atomic
> write[3] and RDMA Flush[4] operations are not included in this patchset. I
> will post them later after this patchset is merged. On UD connections, Send,
> Recv, and SRQ-Recv are supported.
>
> [How to test ODP?]
> There are only a few resources available for testing. The pyverbs testcases
> in rdma-core and perftest[5] are the recommended ones. Besides those, the
> ibv_rc_pingpong command can also be used for testing.
> Note that you may have to build perftest from upstream because older
> versions do not handle ODP capabilities correctly.
>
> The tree is available from github:
> https://github.com/daimatsuda/linux/tree/odp_v4
> While this series is based on commit f605f26ea196, the tree includes an
> additional bugfix, which is yet to be merged as of today (Apr 19th, 2023).
> https://lore.kernel.org/linux-rdma/20230418090642.1849358-1-matsuda-daisuke@xxxxxxxxxxx/
>
> [Future work]
> My next work is to enable the new Atomic write[3] and RDMA Flush[4]
> operations with ODP. After that, I am going to implement the prefetch
> feature. It allows applications to trigger page faults using
> ibv_advise_mr(3) to optimize performance. Some existing software like
> librpma[6] uses this feature. Additionally, I think we can also add the
> implicit ODP feature in the future.
>
> [1] [RFC 00/20] On demand paging
> https://www.spinics.net/lists/linux-rdma/msg18906.html
>
> [2] [PATCH for-next v3 0/7] On-Demand Paging on SoftRoCE
> https://lore.kernel.org/lkml/cover.1671772917.git.matsuda-daisuke@xxxxxxxxxxx/
>
> [3] [PATCH v7 0/8] RDMA/rxe: Add atomic write operation
> https://lore.kernel.org/linux-rdma/1669905432-14-1-git-send-email-yangx.jy@xxxxxxxxxxx/
>
> [4] [for-next PATCH 00/10] RDMA/rxe: Add RDMA FLUSH operation
> https://lore.kernel.org/lkml/20221206130201.30986-1-lizhijian@xxxxxxxxxxx/
>
> [5] linux-rdma/perftest: Infiniband Verbs Performance Tests
> https://github.com/linux-rdma/perftest
>
> [6] librpma: Remote Persistent Memory Access Library
> https://github.com/pmem/rpma
>
> v3->v4:
>  1) Re-designed functions that access MRs to use the MR xarray.
>  2) Rebased onto the latest jgg-for-next tree.
>
> v2->v3:
>  1) Removed a patch that changes the common ib_uverbs layer.
>  2) Re-implemented patches for conversion to workqueue.
>  3) Fixed compile errors (happened when CONFIG_INFINIBAND_ON_DEMAND_PAGING=n).
>  4) Fixed some functions that returned incorrect errors.
>  5) Temporarily disabled ODP for RDMA Flush and Atomic Write.
>
> v1->v2:
>  1) Fixed a crash issue reported by Haris Iqbal.
>  2) Tried to make lock patterns clearer as pointed out by Romanovsky.
>  3) Minor clean ups and fixes.
>
> Daisuke Matsuda (8):
>   RDMA/rxe: Tentative workqueue implementation
>   RDMA/rxe: Always schedule works before accessing user MRs
>   RDMA/rxe: Make MR functions accessible from other rxe source code
>   RDMA/rxe: Move resp_states definition to rxe_verbs.h
>   RDMA/rxe: Add page invalidation support
>   RDMA/rxe: Allow registering MRs for On-Demand Paging
>   RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
>   RDMA/rxe: Add support for the traditional Atomic operations with ODP
>
>  drivers/infiniband/sw/rxe/Makefile    |   2 +
>  drivers/infiniband/sw/rxe/rxe.c       |  27 ++-
>  drivers/infiniband/sw/rxe/rxe.h       |  37 ---
>  drivers/infiniband/sw/rxe/rxe_comp.c  |  12 +-
>  drivers/infiniband/sw/rxe/rxe_loc.h   |  49 +++-
>  drivers/infiniband/sw/rxe/rxe_mr.c    |  27 +--
>  drivers/infiniband/sw/rxe/rxe_odp.c   | 311 ++++++++++++++++++++++++++
>  drivers/infiniband/sw/rxe/rxe_recv.c  |   4 +-
>  drivers/infiniband/sw/rxe/rxe_resp.c  |  32 ++-
>  drivers/infiniband/sw/rxe/rxe_task.c  |  84 ++++---
>  drivers/infiniband/sw/rxe/rxe_task.h  |   6 +-
>  drivers/infiniband/sw/rxe/rxe_verbs.c |   5 +-
>  drivers/infiniband/sw/rxe/rxe_verbs.h |  39 ++++
>  13 files changed, 535 insertions(+), 100 deletions(-)
>  create mode 100644 drivers/infiniband/sw/rxe/rxe_odp.c
>
> base-commit: f605f26ea196a3b49bea249330cbd18dba61a33e
>
> --
> 2.39.1
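
For readers following the "[How does ODP work?]" section of the quoted cover
letter, the page-in/page-out flow can be sketched roughly as below. This is
only a toy userspace model in Python, not code from the series: the class
ToyOdpMr, its methods, and the made-up DMA addresses are all hypothetical, and
it leaves out permissions, the pfn_list array, and locking entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOdpMr:
    """Toy model of an ODP-enabled MR; NOT the rxe or ib_umem_odp API."""
    num_pages: int
    # Driver page table: one slot per page, None means "not present".
    dma_list: list = field(default_factory=list)
    faults: int = 0

    def __post_init__(self):
        self.dma_list = [None] * self.num_pages

    def page_in(self, index: int) -> None:
        """Simulate a page fault: make the page present and record a mapping."""
        self.faults += 1
        self.dma_list[index] = 0x1000 * (index + 1)  # made-up DMA address

    def access(self, index: int) -> int:
        """Requester/responder/completer path: fault first if the page is absent."""
        if self.dma_list[index] is None:
            self.page_in(index)
        return self.dma_list[index]

    def invalidate(self, start: int, end: int) -> None:
        """MMU-notifier-style callback: drop mappings for pages in [start, end)."""
        for i in range(start, end):
            self.dma_list[i] = None


mr = ToyOdpMr(num_pages=4)
mr.access(1)         # first touch faults the page in
mr.access(1)         # already present, no new fault
mr.invalidate(0, 4)  # the OS reclaims the whole range
mr.access(1)         # faults again after the invalidation
print(mr.faults)     # prints 2
```

The point of the model is only that nothing is pinned at registration: a
mapping exists between a fault and the next invalidation, and every access
path has to check the driver page table before touching memory.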