On Thu, 2015-05-21 at 13:21 -0400, Doug Ledford wrote: > On Thu, 2015-05-21 at 13:52 +0000, Wan, Kaike wrote: > > In our previous posting to the mailing list, we proposed to send a MAD request from kernel (more > > specifically, from ib_sa module) to a user space application (ibacm in this case) through netlink. > > The user space application will send back the response. This simple scheme can achieve the goal > > of a local SA cache in user space. > > > > The format of the request and response is diagrammed below: > > > > ------------------ > > | netlink header | > > ------------------ > > | MAD | > > ------------------ > > > > The kernel requests for a pathrecord, and the user application finds it in its local cache and sends > > it to the kernel. If the netlink request fails, the kernel will send the request to SA through the > > normal IB path (ib_mad -> hca driver -> wire). > > > > Jason pointed out that this message format was limited to lower stack format (MAD) and its use > > could not be readily extended to upper layer modules like rdma_cm. After lengthy discussions, we > > come up with a new and modified scheme, as described below. > > > > The general format of the request and response will be the same: > > > > ------------------ > > | netlink header | > > ------------------ > > | Data header | > > ------------------ > > | Data | > > ------------------ > > > > The data header contains information about the type of request/response, the status (for response), > > the type (format) of the data, the total length of the data header + data, and a flags field about > > the request/response or data. > > > > Based on the type of the data, the data section may be in different format: a string about the host > > name to resolve, an IP4/IP6 address, a pathrecord, a user pathrecord (struct ib_user_path_rec), > > or simply a MAD (like our posted patches), etc. Essentially it can be of any format based on the > > data type. The key is to document the format so that the kernel and user space can communicate > > correctly. > > > > The details are described below: > > > > #define IB_NL_VERSION 0x01 > > > > #define IB_NL_OP_MASK 0x0F > > #define IB_NL_OP_RESOLVE 0x01 > > #define IB_NL_OP_QUERY_PATH 0x02 > > #define IB_NL_OP_SET_TIMEOUT 0x03 > > #define IB_NL_OP_ACK 0x80 > > If OP_ACK is one bit, why isn't the OP_MASK 0x7f? > > > #define IB_NL_STATUS_SUCCESS 0x0000 > > #define IB_NL_STATUS_ENODATA 0x0001 > > Do we need 16 bits for a bool? In fact, couldn't this actually be > switched so that the return of the message uses OP_SUCCESS instead of > OP_ACK? > > In other words, instead of two items here, couldn't the ACK bit be > dropped entirely and replaced with SUCCESS so that when the user app > returns the netlink packet, if the op on return == to the op on send, it > failed, if it's op | SUCCESS, it succeeded? > > > #define IB_NL_DATA_TYPE_INVALID 0x0000 > > #define IB_NL_DATA_TYPE_NAME 0x0001 > > #define IB_NL_DATA_TYPE_ADDRESS_IP 0x0002 > > #define IB_NL_DATA_TYPE_ADDRESS_IP6 0x0003 > > #define IB_NL_DATA_TYPE_PATH_RECORD 0x0004 > > #define IB_NL_DATA_TYPE_USER_PATH_REC 0x0005 > > #define IB_NL_DATA_TYPE_MAD 0x0006 > > > > #define IB_NL_FLAGS_PATH_GMP 1 > > #define IB_NL_FLAGS_PATH_PRIMARY (1<<1) > > #define IB_NL_FLAGS_PATH_ALTERNATE (1<<2) > > #define IB_NL_FLAGS_PATH_OUTBOUND (1<<3) > > #define IB_NL_FLAGS_PATH_INBOUND (1<<4) > > #define IB_NL_FLAGS_PATH_INBOUND_REVERSE (1<<5) > > #define IB_NL_FLAGS_PATH_BIDIRECTIONAL (IB_PATH_OUTBOUND | IB_PATH_INBOUND_REVERSE) > > #define IB_NL_FLAGS_QUERY_SA (1<<31) > > #define IB_NL_FLAGS_NODELAY (1<<30) > > Please keep these in numerical order, don't put <<31 and below it <<30 > > > struct ib_nl_data_hdr { > > __u8 version; > > __u8 opcode; > > __u16 status; > Drop status because we fold it into opcode > > __u16 type; > > __u16 reserved; > Drop reserved because we don't need alignment any more > > __u32 flags; > Flags is the only thing using bits fast, and we would want to make this > header an even 128bits in length, so add a __u32 reserved; here. That's > more likely to be useful than the current layout since we are likely to > run out of flags before anything else. > > __u32 length; > > }; > > > > struct ib_nl_data { > > struct ib_nl_data_hdr hdr; > > __u8 data[0]; > > }; > > > > > > These defines and structures can be added to file include/upai/rdma/rdma_netlink.h (replace with > > RDMA_NL prefix) or contained in a seperate file (include/upai/rdma/ib_netlink.h ???). > > > > Please share your thoughts. > > I think an extensible netlink framework here is the right way to go, > certainly better than the one shot method you had first. The one thing I left out of the above that might be worth changing is the fact that you bury your sequence number down in your mad header. If there is a generic mechanism that multiple modules can use to send customized data via nl, then it might be worthwhile to have the sequence moved to the generic level. -- Doug Ledford <dledford@xxxxxxxxxx> GPG KeyID: 0E572FDD
Attachment:
signature.asc
Description: This is a digitally signed message part