> On Thu, 2015-05-21 at 13:52 +0000, Wan, Kaike wrote: > > In our previous posting to the mailing list, we proposed to send a MAD > > request from kernel (more specifically, from ib_sa module) to a user space > application (ibacm in this case) through netlink. > > The user space application will send back the response. This simple > > scheme can achieve the goal of a local SA cache in user space. > > > > The format of the request and response is diagrammed below: > > > > ------------------ > > | netlink header | > > ------------------ > > | MAD | > > ------------------ > > > > The kernel requests for a pathrecord, and the user application finds > > it in its local cache and sends it to the kernel. If the netlink > > request fails, the kernel will send the request to SA through the normal IB > path (ib_mad -> hca driver -> wire). > > > > Jason pointed out that this message format was limited to lower stack > > format (MAD) and its use could not be readily extended to upper layer > > modules like rdma_cm. After lengthy discussions, we come up with a new > and modified scheme, as described below. > > > > The general format of the request and response will be the same: > > > > ------------------ > > | netlink header | > > ------------------ > > | Data header | > > ------------------ > > | Data | > > ------------------ > > > > The data header contains information about the type of > > request/response, the status (for response), the type (format) of the > > data, the total length of the data header + data, and a flags field about the > request/response or data. > > > > Based on the type of the data, the data section may be in different > > format: a string about the host name to resolve, an IP4/IP6 address, a > > pathrecord, a user pathrecord (struct ib_user_path_rec), or simply a > > MAD (like our posted patches), etc. Essentially it can be of any > > format based on the data type. The key is to document the format so that > the kernel and user space can communicate correctly. > > > > The details are described below: > > > > #define IB_NL_VERSION 0x01 > > > > #define IB_NL_OP_MASK 0x0F > > #define IB_NL_OP_RESOLVE 0x01 > > #define IB_NL_OP_QUERY_PATH 0x02 > > #define IB_NL_OP_SET_TIMEOUT 0x03 > > #define IB_NL_OP_ACK 0x80 > > If OP_ACK is one bit, why isn't the OP_MASK 0x7f? You are right. The mask should be 0x7f > > > #define IB_NL_STATUS_SUCCESS 0x0000 > > #define IB_NL_STATUS_ENODATA 0x0001 > > Do we need 16 bits for a bool? In fact, couldn't this actually be switched so > that the return of the message uses OP_SUCCESS instead of OP_ACK? Potentially, you may want to return different statii for diagnostic purpose. ( OP_ACK | original OP) indicates that this is a response to OP, just like what is done in MAD response. > > In other words, instead of two items here, couldn't the ACK bit be dropped > entirely and replaced with SUCCESS so that when the user app returns the > netlink packet, if the op on return == to the op on send, it failed, if it's op | > SUCCESS, it succeeded? > > > #define IB_NL_DATA_TYPE_INVALID 0x0000 > > #define IB_NL_DATA_TYPE_NAME 0x0001 > > #define IB_NL_DATA_TYPE_ADDRESS_IP 0x0002 > > #define IB_NL_DATA_TYPE_ADDRESS_IP6 0x0003 > > #define IB_NL_DATA_TYPE_PATH_RECORD 0x0004 > > #define IB_NL_DATA_TYPE_USER_PATH_REC 0x0005 > > #define IB_NL_DATA_TYPE_MAD 0x0006 > > > > #define IB_NL_FLAGS_PATH_GMP 1 > > #define IB_NL_FLAGS_PATH_PRIMARY (1<<1) > > #define IB_NL_FLAGS_PATH_ALTERNATE (1<<2) > > #define IB_NL_FLAGS_PATH_OUTBOUND (1<<3) > > #define IB_NL_FLAGS_PATH_INBOUND (1<<4) > > #define IB_NL_FLAGS_PATH_INBOUND_REVERSE (1<<5) > > #define IB_NL_FLAGS_PATH_BIDIRECTIONAL > (IB_PATH_OUTBOUND | IB_PATH_INBOUND_REVERSE) > > #define IB_NL_FLAGS_QUERY_SA (1<<31) > > #define IB_NL_FLAGS_NODELAY (1<<30) > > Please keep these in numerical order, don't put <<31 and below it <<30 Indeed, the flags should be defined with care. I simply copied from include/uapi/rdma/ib_user_sa.h and acm.h from ibacm for demonstration. I will remove them from my patches later. > > > struct ib_nl_data_hdr { > > __u8 version; > > __u8 opcode; > > __u16 status; > Drop status because we fold it into opcode Only if we don't need status. But we may need more status. > > __u16 type; > > __u16 reserved; > Drop reserved because we don't need alignment any more Depends on above. > > __u32 flags; > Flags is the only thing using bits fast, and we would want to make this header > an even 128bits in length, so add a __u32 reserved; here. That's more likely > to be useful than the current layout since we are likely to run out of flags > before anything else. That is a very reasonable assumption. I will keep an eye on it. > > __u32 length; > > }; > > > > struct ib_nl_data { > > struct ib_nl_data_hdr hdr; > > __u8 data[0]; > > }; > > > > > > These defines and structures can be added to file > > include/upai/rdma/rdma_netlink.h (replace with RDMA_NL prefix) or > contained in a seperate file (include/upai/rdma/ib_netlink.h ???). > > > > Please share your thoughts. > > I think an extensible netlink framework here is the right way to go, certainly > better than the one shot method you had first. Thank you. Kaike > > > Kaike > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" > > in the body of a message to majordomo@xxxxxxxxxxxxxxx More > majordomo > > info at http://vger.kernel.org/majordomo-info.html > > > -- > Doug Ledford <dledford@xxxxxxxxxx> > GPG KeyID: 0E572FDD ��.n��������+%������w��{.n�����{���fk��ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f