Re: [RFC PATCH v2 0/9] RDMA/rxe: Add RDMA FLUSH operation

On Tue, Jan 25, 2022 at 4:45 PM Li Zhijian <lizhijian@xxxxxxxxxxxxxx> wrote:
>
> Hey folks,
>
> I want to thank all of you for the kind feedback on my previous RFC.
> Since then, I have tried my best to update the series per your comments.
> Not all comments have been addressed yet, for various reasons, but I still
> wish to post this new version to start a new discussion.
>
> Outstanding issues:
> - iova_to_addr() without any kmap/kmap_local_page flows might not always
>   work. # existing issue.
> - Should the responder reply with an error to the requesting side when it
>   requests the persistence placement type for DRAM?
> -------
>
> These patches are going to implement a *NEW* RDMA opcode "RDMA FLUSH".
> In IB SPEC 1.5[1][2], two new opcodes, ATOMIC WRITE and RDMA FLUSH, were
> added in the MEMORY PLACEMENT EXTENSIONS section.
>
> FLUSH is used by the requesting node to achieve guarantees on the data
> placement within the memory subsystem of preceding accesses to a
> single memory region, such as those performed by RDMA WRITE, Atomics
> and ATOMIC WRITE requests.
>
> The operation indicates the virtual address space of a destination node
> and where the guarantees should apply. This range must be contiguous
> in the virtual space of the memory key but it is not necessarily a
> contiguous range of physical memory.
>
> FLUSH packets carry a FLUSH extended transport header (FETH, see below)
> to specify the placement type and the selectivity level of the operation,
> and an RDMA extended transport header (RETH, see the base document's RETH
> definition) to specify the R_Key, VA and Length associated with this
> request. These follow the BTH in RC, the RDETH in RD and the XRCETH in XRC.

Thanks. Would you like to add some test cases for this RDMA FLUSH
operation to the latest rdma-core?

Thanks a lot.
Zhu Yanjun

>
> RC FLUSH:
> +----+------+------+
> |BTH | FETH | RETH |
> +----+------+------+
>
> RD FLUSH:
> +----+------+------+------+
> |BTH | RDETH| FETH | RETH |
> +----+------+------+------+
>
> XRC FLUSH:
> +----+-------+------+------+
> |BTH | XRCETH| FETH | RETH |
> +----+-------+------+------+
>
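The three layouts above differ only in the header that sits between the BTH and the FETH. The following is a minimal sketch of the resulting byte offsets, not code from the series. It assumes the usual InfiniBand header sizes (BTH 12 bytes, RDETH 4, XRCETH 4, RETH 16), takes the FETH as 4 bytes from the 32-bit layout in this cover letter, and counts only the headers shown in the diagrams. The `flush_svc` enum and the function names are hypothetical.

```c
#include <stddef.h>

/* Header sizes in bytes. FETH is 4 bytes per the 32-bit layout in the
 * cover letter; the others are the usual InfiniBand sizes (assumed). */
enum {
	BTH_BYTES    = 12,
	RDETH_BYTES  = 4,
	XRCETH_BYTES = 4,
	FETH_BYTES   = 4,
	RETH_BYTES   = 16,
};

/* Hypothetical service-type tag, for illustration only. */
enum flush_svc { SVC_RC, SVC_RD, SVC_XRC };

/* Byte offset of the FETH, measured from the start of the BTH. */
static inline size_t flush_feth_offset(enum flush_svc svc)
{
	switch (svc) {
	case SVC_RD:
		return BTH_BYTES + RDETH_BYTES;
	case SVC_XRC:
		return BTH_BYTES + XRCETH_BYTES;
	case SVC_RC:
	default:
		return BTH_BYTES;
	}
}

/* The RETH always follows the FETH directly. */
static inline size_t flush_reth_offset(enum flush_svc svc)
{
	return flush_feth_offset(svc) + FETH_BYTES;
}

/* Total length of the headers shown in the diagrams (no payload, no ICRC). */
static inline size_t flush_hdr_len(enum flush_svc svc)
{
	return flush_reth_offset(svc) + RETH_BYTES;
}
```

Under these assumptions an RC FLUSH carries 32 bytes of headers, and RD and XRC carry 36 bytes each.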
> Currently, we introduce the RC and RD services only, since XRC has not
> been implemented by rxe yet.
> NOTE: only the RC service has been tested so far, and since other HCAs
> have not added/implemented FLUSH yet, we can only test the FLUSH operation
> with SoftRoCE/rxe devices on both sides.
>
> The corresponding rdma-core and a FLUSH example are available at:
> https://github.com/zhijianli88/rdma-core/tree/rfc
> The kernel source is available at:
> https://github.com/zhijianli88/linux/tree/rdma-flush
>
> - We introduce is_pmem attribute to MR(memory region)
> - We introduce FLUSH placement type attributes to HCA
> - We introduce FLUSH access flags that users are able to register with
>
> The table below shows the valid access flags users can register with:
> +------------------------+---------------------------------+
> | HCA attributes         |     register access flags       |
> |        and             +------------------+--------------+
> | MR attribute(is_pmem)  |global visibility |  persistence |
> +------------------------+------------------+--------------+
> | global visibility(DRAM)|        O         |      X       |
> +------------------------+------------------+--------------+
> | global visibility(PMEM)|        O         |      X       |
> +------------------------+------------------+--------------+
> | persistence(DRAM)      |        X         |      X       |
> +------------------------+------------------+--------------+
> | persistence(PMEM)      |        X         |      O       |
> +------------------------+------------------+--------------+
> O: registering this access flag is allowed
> X: registering this access flag is rejected
>
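One way to read the registration table: a flush access flag can be registered only if the HCA advertises the matching placement type, and the persistence flag additionally requires a pmem-backed MR. The following is a minimal sketch of that rule, not the check from the patches. The bit values and the function name are hypothetical, and `requested` is assumed to hold flush-related bits only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; not the kernel's definitions. */
#define FLUSH_GLOBAL_VISIBILITY (1u << 0)
#define FLUSH_PERSISTENCE       (1u << 1)

/*
 * hca_caps:   placement types the HCA advertises (same bit layout).
 * mr_is_pmem: the MR's is_pmem attribute.
 * requested:  flush access flags the user wants to register.
 */
static inline bool flush_access_registrable(uint32_t hca_caps, bool mr_is_pmem,
					    uint32_t requested)
{
	/* Every requested flag must be backed by an HCA capability. */
	if (requested & ~hca_caps)
		return false;

	/* Persistence can only be promised for a pmem-backed MR. */
	if ((requested & FLUSH_PERSISTENCE) && !mr_is_pmem)
		return false;

	return true;
}
```

With these definitions, each of the four table rows maps to one `hca_caps`/`mr_is_pmem` pair, and the two columns map to the two flag bits.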
> In order to make placement guarantees, we currently reject requests for a
> persistent flush to non-pmem memory.
> The responder validates the remotely requested placement types against
> the registered access flags.
> +------------------------+---------------------------------+
> |                        |     registered flags            |
> | remote requested types +------------------+--------------+
> |                        |global visibility |  persistence |
> +------------------------+------------------+--------------+
> | global visibility      |        O         |      X       |
> +------------------------+------------------+--------------+
> | persistence            |        X         |      O       |
> +------------------------+------------------+--------------+
> O: allow to request such placement type
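The responder-side table reduces to a subset test: each placement type in the request must also be among the flags the MR was registered with. The following is a minimal sketch under that reading. The PLT bit positions come from the FETH description in this cover letter; the `registered_plt` mask reusing the same bit positions and the function name are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PLT_GLOBAL_VISIBILITY 0x1u /* PLT bit 0 */
#define PLT_PERSISTENCE       0x2u /* PLT bit 1 */

/*
 * requested_plt:  placement types taken from the FLUSH request.
 * registered_plt: hypothetical per-MR mask, same bit positions, built
 *                 from the flush access flags given at registration.
 *
 * Returns true when no requested type is missing from the registered
 * set. A request with no placement bits set passes this test; whether
 * that should be an error is left to the caller.
 */
static inline bool flush_plt_permitted(uint32_t requested_plt,
				       uint32_t registered_plt)
{
	return (requested_plt & ~registered_plt) == 0;
}
```

A request that sets both bits passes only when the MR was registered with both flags.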
>
> Below are some details about the FLUSH transport packet:
>
> A FLUSH message consists of a FLUSH request packet, and a successful
> FLUSH is acknowledged with an RDMA READ response of zero size.
>
> oA19-2: FLUSH shall be single packet message and shall have no payload.
> oA19-5: FLUSH BTH shall hold the Opcode = 0x1C
>
> FLUSH Extended Transport Header(FETH)
> +-----+-----------+------------------------+----------------------+
> |Bits |   31-6    |          5-4           |        3-0           |
> +-----+-----------+------------------------+----------------------+
> |     | Reserved  | Selectivity Level(SEL) | Placement Type(PLT)  |
> +-----+-----------+------------------------+----------------------+
>
> Selectivity Level (SEL) – defines the memory region scope the FLUSH
> should apply on. Values are as follows:
> • b'00 - Memory Region Range: FLUSH applies for all preceding memory
>          updates to the RETH range on this QP. All RETH fields shall be
>          valid in this selectivity mode. RETH:DMALen field shall be
>          between zero and (2^31 - 1) bytes (inclusive).
> • b'01 - Memory Region: FLUSH applies for all preceding memory
>          updates to RETH.R_key on this QP. RETH:DMALen and RETH:VA
>          shall be ignored in this mode.
> • b'10 - Reserved.
> • b'11 - Reserved.
>
> Placement Type (PLT) – Defines the memory placement guarantee of
> this FLUSH. Multiple bits may be set in this field. Values are as follows:
> • Bit 0, if set to '1', indicates that the FLUSH should guarantee Global
>   Visibility.
> • Bit 1, if set to '1', indicates that the FLUSH should guarantee
>   Persistence.
> • Bits 3:2 are reserved.
>
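The FETH bit layout described above can be sketched as a pair of pack/unpack helpers. This is an illustration only, not the rxe_hdr.h code from the series: the macro and function names are hypothetical, values are kept in host byte order (conversion to and from the big-endian wire format is omitted), and rejecting packets that use reserved encodings is an assumption rather than something the text above requires.

```c
#include <stdbool.h>
#include <stdint.h>

#define FETH_PLT_MASK  0x0000000fu /* bits 3-0: Placement Type */
#define FETH_SEL_SHIFT 4
#define FETH_SEL_MASK  0x00000030u /* bits 5-4: Selectivity Level */

#define FETH_PLT_GLOBAL_VISIBILITY 0x1u /* PLT bit 0 */
#define FETH_PLT_PERSISTENCE       0x2u /* PLT bit 1 */
#define FETH_PLT_RESERVED          0xcu /* PLT bits 3:2 */

enum feth_sel {
	FETH_SEL_MR_RANGE = 0, /* b'00: RETH VA/DMALen range */
	FETH_SEL_MR       = 1, /* b'01: whole MR named by RETH R_Key */
};

/* Build a FETH word; bits 31-6 stay zero (reserved). */
static inline uint32_t feth_pack(uint32_t sel, uint32_t plt)
{
	return ((sel << FETH_SEL_SHIFT) & FETH_SEL_MASK) | (plt & FETH_PLT_MASK);
}

static inline uint32_t feth_get_sel(uint32_t feth)
{
	return (feth & FETH_SEL_MASK) >> FETH_SEL_SHIFT;
}

static inline uint32_t feth_get_plt(uint32_t feth)
{
	return feth & FETH_PLT_MASK;
}

/* True when no reserved bit or reserved SEL encoding is used. */
static inline bool feth_well_formed(uint32_t feth)
{
	if (feth & ~(FETH_SEL_MASK | FETH_PLT_MASK))
		return false; /* bits 31-6 are reserved */
	if (feth_get_sel(feth) > FETH_SEL_MR)
		return false; /* b'10 and b'11 are reserved */
	if (feth_get_plt(feth) & FETH_PLT_RESERVED)
		return false; /* PLT bits 3:2 are reserved */
	return true;
}
```

For example, a whole-MR persistence flush packs as `feth_pack(FETH_SEL_MR, FETH_PLT_PERSISTENCE)`, which sets bit 4 and bit 1.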
> [1]: https://www.infinibandta.org/ibta-specification/ # login required
> [2]: https://www.infinibandta.org/wp-content/uploads/2021/08/IBTA-Overview-of-IBTA-Volume-1-Release-1.5-and-MPE-2021-08-17-Secure.pptx
>
> CC: yangx.jy@xxxxxxxxxxxxxx
> CC: y-goto@xxxxxxxxxxx
> CC: Jason Gunthorpe <jgg@xxxxxxxx>
> CC: Zhu Yanjun <zyjzyj2000@xxxxxxxxx>
> CC: Leon Romanovsky <leon@xxxxxxxxxx>
> CC: Bob Pearson <rpearsonhpe@xxxxxxxxx>
> CC: Mark Bloch <mbloch@xxxxxxxxxx>
> CC: Wenpeng Liang <liangwenpeng@xxxxxxxxxx>
> CC: Aharon Landau <aharonl@xxxxxxxxxx>
> CC: Tom Talpey <tom@xxxxxxxxxx>
> CC: "Gromadzki, Tomasz" <tomasz.gromadzki@xxxxxxxxx>
> CC: Dan Williams <dan.j.williams@xxxxxxxxx>
> CC: linux-rdma@xxxxxxxxxxxxxxx
> CC: linux-kernel@xxxxxxxxxxxxxxx
>
> V1:
> https://lore.kernel.org/lkml/050c3183-2fc6-03a1-eecd-258744750972@xxxxxxxxxxx/T/
> or https://github.com/zhijianli88/linux/tree/rdma-flush-rfcv1
>
> Change log
> V2:
> https://github.com/zhijianli88/linux/tree/rdma-flush
> RDMA: mr: Introduce is_pmem
>   check 1st byte to avoid crossing page boundary
>   new scheme to check is_pmem # Dan
>
> RDMA: Allow registering MR with flush access flags
>   combine with [03/10] RDMA/rxe: Allow registering FLUSH flags for supported device only to this patch # Jason
>   split RDMA_FLUSH to 2 capabilities
>
> RDMA/rxe: Allow registering persistent flag for pmem MR only
>   update commit message, get rid of confusing ib_check_flush_access_flags() # Tom
>
> RDMA/rxe: Implement RC RDMA FLUSH service in requester side
>   extend flush to include length field. # Tom and Tomasz
>
> RDMA/rxe: Implement flush execution in responder side
>   adjust start for WHOLE MR level # Tom
>   don't support DMA mr for flush # Tom
>   check flush return value
>
> RDMA/rxe: Enable RDMA FLUSH capability for rxe device
>   adjust patch's order. move it here from [04/10]
>
> Li Zhijian (9):
>   RDMA: mr: Introduce is_pmem
>   RDMA: Allow registering MR with flush access flags
>   RDMA/rxe: Allow registering persistent flag for pmem MR only
>   RDMA/rxe: Implement RC RDMA FLUSH service in requester side
>   RDMA/rxe: Set BTH's SE to zero for FLUSH packet
>   RDMA/rxe: Implement flush execution in responder side
>   RDMA/rxe: Implement flush completion
>   RDMA/rxe: Enable RDMA FLUSH capability for rxe device
>   RDMA/rxe: Add RD FLUSH service support
>
>  drivers/infiniband/core/uverbs_cmd.c    |  17 +++
>  drivers/infiniband/sw/rxe/rxe_comp.c    |   4 +-
>  drivers/infiniband/sw/rxe/rxe_hdr.h     |  52 +++++++++
>  drivers/infiniband/sw/rxe/rxe_loc.h     |   2 +
>  drivers/infiniband/sw/rxe/rxe_mr.c      |  37 ++++++-
>  drivers/infiniband/sw/rxe/rxe_opcode.c  |  35 +++++++
>  drivers/infiniband/sw/rxe/rxe_opcode.h  |   3 +
>  drivers/infiniband/sw/rxe/rxe_param.h   |   4 +-
>  drivers/infiniband/sw/rxe/rxe_req.c     |  19 +++-
>  drivers/infiniband/sw/rxe/rxe_resp.c    | 133 +++++++++++++++++++++++-
>  include/rdma/ib_pack.h                  |   3 +
>  include/rdma/ib_verbs.h                 |  30 +++++-
>  include/uapi/rdma/ib_user_ioctl_verbs.h |   2 +
>  include/uapi/rdma/ib_user_verbs.h       |  19 ++++
>  include/uapi/rdma/rdma_user_rxe.h       |   7 ++
>  15 files changed, 355 insertions(+), 12 deletions(-)
>
> --
> 2.31.1
>
>
>



