Re: [PATCH RFC 1/9] RDMA/rv: Public interferce for the RDMA Rendezvous module

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 19, 2021 at 08:56:27AM -0400, kaike.wan@xxxxxxxxx wrote:
> From: Kaike Wan <kaike.wan@xxxxxxxxx>
> 
> The RDMA Rendezvous (rv) module provides an interface for HPC
> middlewares to improve performance by caching memory region
> registration, and improve the scalibity of RDMA transaction
> through connection managements between nodes. This mechanism
> is implemented through the following ioctl requests:
> - ATTACH: to attach to an RDMA device.
> - REG_MEM: to register a user/kernel memory region.
> - DEREG_MEM: to release application use of MR, allowing it to
>              remain in cache.
> - GET_CACHE_STATS: to get cache statistics.
> - CONN_CREATE: to create an RC connection.
> - CONN_CONNECT: to start the connection.
> - CONN_GET_CONN_COUNT: to use as part of error recovery from lost
>                        messages in the application.
> - CONN_GET_STATS: to get connection statistics.
> - GET_EVENT_STATS: to get the RDMA event statistics.
> - POST_RDMA_WR_IMMED: to post an RDMA WRITE WITH IMMED request.
> 
> Signed-off-by: Todd Rimmer <todd.rimmer@xxxxxxxxx>
> Signed-off-by: Kaike Wan <kaike.wan@xxxxxxxxx>
>  include/uapi/rdma/rv_user_ioctls.h | 558 +++++++++++++++++++++++++++++
>  1 file changed, 558 insertions(+)
>  create mode 100644 include/uapi/rdma/rv_user_ioctls.h
> 
> diff --git a/include/uapi/rdma/rv_user_ioctls.h b/include/uapi/rdma/rv_user_ioctls.h
> new file mode 100644
> index 000000000000..97e35b722443
> +++ b/include/uapi/rdma/rv_user_ioctls.h
> @@ -0,0 +1,558 @@
> +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
> +/*
> + * Copyright(c) 2020 - 2021 Intel Corporation.
> + */
> +#ifndef __RV_USER_IOCTL_H__
> +#define __RV_USER_IOCTL_H__
> +#include <rdma/rdma_user_ioctl.h>
> +#include <rdma/ib_user_sa.h>
> +#include <rdma/ib_user_verbs.h>
> +
> +/* Checking /Documentation/userspace-api/ioctl/ioctl-number.rst */
> +#define RV_MAGIC RDMA_IOCTL_MAGIC

No.

> +#define RV_FILE_NAME "/dev/rv"

No.

> +
> +/*
> + * Handles are opaque to application; they are meaningful only to the
> + * RV driver
> + */
> +
> +/* this version of ABI */
> +#define RV_ABI_VER_MAJOR 1
> +#define RV_ABI_VER_MINOR 0

No, see my remarks to your other intel colleagues doing the ioasid
stuff.

> +struct rv_query_params_out {
> +		/* ABI version */
> +	__u16 major_rev;
> +	__u16 minor_rev;
> +	__u32 resv1;
> +	__aligned_u64 capability;
> +	__aligned_u64 resv2[6];

No pre-adding reserved stuff

> +};
> +
> +#define RV_IOCTL_QUERY _IOR(RV_MAGIC, 0xFC, struct rv_query_params_out)
> +
> +/* Mode for use of rv module by PSM */
> +#define RV_RDMA_MODE_USER 0	/* user MR caching only */
> +#define RV_RDMA_MODE_KERNEL 1	/* + kernel RC QPs with kernel MR caching */

Huh? Mode sounds bad.

> +/*
> + * mr_cache_size is in MBs and if 0 will use module param as default
> + * num_conn - number of QPs between each pair of nodes
> + * loc_addr - used to select client/listen vs rem_addr
> + * index_bits - num high bits of immed data with rv index
> + * loc_gid_index - SGID for client connections
> + * loc_gid[16] - to double check gid_index unchanged
> + * job_key[RV_MAX_JOB_KEY_LEN] = unique uuid per job
> + * job_key_len - len, if 0 matches jobs with len==0 only
> + * q_depth - size of QP and per QP CQs
> + * reconnect_timeout - in seconds from loss to restoration
> + * hb_interval - in milliseconds between heartbeats
> + */
> +struct rv_attach_params_in {
> +	char dev_name[RV_MAX_DEV_NAME_LEN];

Guessing this is a no too.

> +	__u32 mr_cache_size;
> +	__u8 rdma_mode;
> +
> +	/* additional information for RV_RDMA_MODE_KERNEL */
> +	__u8 port_num;
> +	__u8 num_conn;

Lots of alignment holes, don't do that either.

> +struct rv_attach_params {
> +	union {
> +		struct rv_attach_params_in in;
> +		struct rv_attach_params_out out;
> +	};
> +};

Yikes, no

> +/* The buffer is used to register a kernel mr */
> +#define IBV_ACCESS_KERNEL 0x80000000

Huh? WTF on on many levels

> +/*
> + * ibv_pd_handle - user space appl allocated pd
> + * ulen - driver_udata inlen
> + * *udata - driver_updata inbuf
> + */
> +struct rv_mem_params_in {
> +	__u32 ibv_pd_handle;
> +	__u32 cmd_fd_int;

Really? You want to reach into the command FD from a ULP and extract
objects? Double yikes. Did you do this properly, taking care of every
lifetime rule and handling disassociation?

> +	__aligned_u64 addr;
> +	__aligned_u64 length;
> +	__u32 access;
> +	size_t ulen;
> +	void *udata;

'void *' is wrong for ioctls.

> +struct rv_conn_get_stats_params_out {

Too many stats, don't you think? Most of the header seems to be stats
of one thing or another.

> +/*
> + * events placed on ring buffer for delivery to user space.
> + * Carefully sized to be a multiple of 64 bytes for cache alignment.
> + * Must pack to get good field alignment and desired 64B overall size
> + * Unlike verbs, all rv_event fields are defined even when
> + * rv_event.wc.status != IB_WC_SUCCESS. Only sent writes can report bad status.
> + * event_type - enum rv_event_type
> + * wc - send or recv work completions
> + *	status - ib_wc_status
> + *	resv1 - alignment
> + *	imm_data - for RV_WC_RECV_RDMA_WITH_IMM only
> + *	wr_id - PSM wr_id for RV_WC_RDMA_WRITE only
> + *	conn_handle - conn handle. For efficiency in completion processing, this
> + *		handle is the rv_conn handle, not the rv_user_conn.
> + *		Main use is sanity checks.  On Recv PSM must use imm_data to
> + *		efficiently identify source.
> + *	byte_len - unlike verbs API, this is always valid
> + *	resv2 - alignment
> + * cache_align -  not used, but forces overall struct to 64B size
> + */
> +struct rv_event {
> +	__u8		event_type;
> +	union {
> +		struct {
> +			__u8		status;
> +			__u16	resv1;
> +			__u32	imm_data;
> +			__aligned_u64	wr_id;
> +			__aligned_u64	conn_handle;
> +			__u32	byte_len;
> +			__u32	resv2;
> +		} __attribute__((__packed__)) wc;
> +		struct {
> +			__u8 pad[7];
> +			uint64_t pad2[7];
> +		} __attribute__((__packed__)) cache_align;
> +	};
> +} __attribute__((__packed__));

Uhh, mixing packed and aligned_u64 is pretty silly. I don't think this
needs to be packed or written in this tortured way.

> +
> +/*
> + * head - consumer removes here
> + * tail - producer adds here
> + * overflow_cnt - number of times producer overflowed ring and discarded
> + * pad - 64B cache alignment for entries
> + */
> +struct rv_ring_header {
> +	volatile __u32 head;
> +	volatile __u32 tail;
> +	volatile __u64 overflow_cnt;

No on volatile, and missed a __aligned here

This uapi needs to be much better. It looks like the mess from the PSM
char dev just re-imported here.

At the very least split the caching from the other operations and
follow normal ioctl design

And you need to rethink the uverbs stuff.

Jason



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux