RE: [RFC] IB/sa: Route SA pathrecord query through netlink

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Thu, 2015-05-21 at 13:52 +0000, Wan, Kaike wrote:
> > In our previous posting to the mailing list, we proposed to send a MAD
> > request from kernel (more specifically, from ib_sa module) to a user space
> application (ibacm in this case) through netlink.
> > The user space application will send back the response. This simple
> > scheme can achieve the goal of a local SA cache in user space.
> >
> > The format of the request and response is diagrammed below:
> >
> >   ------------------
> >   | netlink header |
> >   ------------------
> >   |     MAD        |
> >   ------------------
> >
> > The kernel requests for a pathrecord, and the user application finds
> > it in its local cache and sends it to the kernel. If the netlink
> > request fails, the kernel will send the request to SA through the normal IB
> path (ib_mad -> hca driver -> wire).
> >
> > Jason pointed out that this message format was limited to lower stack
> > format (MAD) and its use could not be readily extended to upper layer
> > modules like rdma_cm. After lengthy discussions, we come up with a new
> and modified scheme, as described below.
> >
> > The general format of the request and response will be the same:
> >
> >   ------------------
> >   | netlink header |
> >   ------------------
> >   |  Data header   |
> >   ------------------
> >   |      Data      |
> >   ------------------
> >
> > The data header contains information about the type of
> > request/response, the status (for response), the type (format) of the
> > data, the total length of the data header + data, and a flags field about the
> request/response or data.
> >
> > Based on the type of the data, the data section may be in different
> > format: a string about the host name to resolve, an IP4/IP6 address, a
> > pathrecord, a user pathrecord (struct ib_user_path_rec), or simply a
> > MAD (like our posted patches), etc. Essentially it can be of any
> > format based on the data type. The key is to document the format so that
> the kernel and user space can communicate correctly.
> >
> > The details are described below:
> >
> > #define IB_NL_VERSION		0x01
> >
> > #define IB_NL_OP_MASK		0x0F
> > #define IB_NL_OP_RESOLVE	0x01
> > #define IB_NL_OP_QUERY_PATH	0x02
> > #define IB_NL_OP_SET_TIMEOUT	0x03
> > #define IB_NL_OP_ACK		0x80
> 
> If OP_ACK is one bit, why isn't the OP_MASK 0x7f?

You are right. The mask should be 0x7f

> 
> > #define IB_NL_STATUS_SUCCESS	0x0000
> > #define IB_NL_STATUS_ENODATA	0x0001
> 
> Do we need 16 bits for a bool?  In fact, couldn't this actually be switched so
> that the return of the message uses OP_SUCCESS instead of OP_ACK?

Potentially, you may want to return different statii for diagnostic purpose. ( OP_ACK | original OP) indicates that this is a response to OP, just like what is done in MAD response.

> 
> In other words, instead of two items here, couldn't the ACK bit be dropped
> entirely and replaced with SUCCESS so that when the user app returns the
> netlink packet, if the op on return == to the op on send, it failed, if it's op |
> SUCCESS, it succeeded?
> 
> > #define IB_NL_DATA_TYPE_INVALID			0x0000
> > #define IB_NL_DATA_TYPE_NAME			0x0001
> > #define IB_NL_DATA_TYPE_ADDRESS_IP		0x0002
> > #define IB_NL_DATA_TYPE_ADDRESS_IP6		0x0003
> > #define IB_NL_DATA_TYPE_PATH_RECORD		0x0004
> > #define IB_NL_DATA_TYPE_USER_PATH_REC		0x0005
> > #define IB_NL_DATA_TYPE_MAD			0x0006
> >
> > #define IB_NL_FLAGS_PATH_GMP			1
> > #define IB_NL_FLAGS_PATH_PRIMARY		(1<<1)
> > #define IB_NL_FLAGS_PATH_ALTERNATE		(1<<2)
> > #define IB_NL_FLAGS_PATH_OUTBOUND		(1<<3)
> > #define IB_NL_FLAGS_PATH_INBOUND		(1<<4)
> > #define IB_NL_FLAGS_PATH_INBOUND_REVERSE 	(1<<5)
> > #define IB_NL_FLAGS_PATH_BIDIRECTIONAL
> 	(IB_PATH_OUTBOUND | IB_PATH_INBOUND_REVERSE)
> > #define IB_NL_FLAGS_QUERY_SA			(1<<31)
> > #define IB_NL_FLAGS_NODELAY			(1<<30)
> 
> Please keep these in numerical order, don't put <<31 and below it <<30

Indeed, the flags should be defined with care. I simply copied from include/uapi/rdma/ib_user_sa.h and acm.h from ibacm for demonstration. I will remove them from my patches later.

> 
> > struct ib_nl_data_hdr {
> > 	__u8	version;
> > 	__u8	opcode;
> > 	__u16	status;
> Drop status because we fold it into opcode

Only if we don't need status. But we may need more status.


> > 	__u16	type;
> > 	__u16	reserved;
> Drop reserved because we don't need alignment any more

Depends on above.

> > 	__u32	flags;
> Flags is the only thing using bits fast, and we would want to make this header
> an even 128bits in length, so add a __u32 reserved; here.  That's more likely
> to be useful than the current layout since we are likely to run out of flags
> before anything else.

That is a very reasonable assumption. I will keep an eye on it.

> > 	__u32	length;
> > };
> >
> > struct ib_nl_data {
> > 	struct ib_nl_data_hdr		hdr;
> > 	__u8				data[0];
> > };
> >
> >
> > These defines and structures can be added to file
> > include/upai/rdma/rdma_netlink.h (replace with RDMA_NL prefix) or
> contained in a seperate file (include/upai/rdma/ib_netlink.h ???).
> >
> > Please share your thoughts.
> 
> I think an extensible netlink framework here is the right way to go, certainly
> better than the one shot method you had first.

Thank you.

Kaike
> 
> > Kaike
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> > in the body of a message to majordomo@xxxxxxxxxxxxxxx More
> majordomo
> > info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> --
> Doug Ledford <dledford@xxxxxxxxxx>
>               GPG KeyID: 0E572FDD

��.n��������+%������w��{.n�����{���fk��ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f





[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux