On Wed, Mar 19, 2025 at 02:21:23PM -0400, Jamal Hadi Salim wrote:
> Curious how you guarantee that a "destroy" will not fail under OOM. Do
> you have pre-allocated memory?

It just never allocates memory. Why would a simple system call like a
destruction allocate any memory?

> > Overall system calls here should either succeed or fail and be the
> > same as a NOP. No failure that actually did something and then creates
> > some resource leak or something because userspace didn't know about
> > it.
>
> Yes, this is how netlink works as well. If a failure to delete an
> object occurs then every transient state gets restored. This is always
> the case for simple requests (a delete/create/update). For requests
> that batch multiple objects there are cases where there is no
> unwinding.

I'm not sure that is completely true. For example, if userspace messes
up the netlink read() side of the API and copy_to_user() fails then you
can get these inconsistencies. In the RDMA model even those edge cases
are properly unwound, just like a normal system call would be.

> Makes sense. So ioctls with TLVs ;->
> I am suspecting you don't have concepts of TLVs inside TLVs for
> hierarchies within objects.

No, it has not been needed yet, or at least the cases that have come up
have been happy to use arrays of structs for the nesting. The method
calls themselves don't tend to have that kind of challenging structure
for their arguments.

> > RDMA also has special infrastructure to split up the TLV space between
> > core code and HW driver code which is a key feature and necessary part
> > of how you'd build a user/kernel split driver.
>
> The T namespace is split between core code and driver code?
> I can see that as being useful for debugging maybe? What else?

RDMA is all about having a user/kernel driver co-design. This means a
driver has code in a userspace library and code in the kernel that work
together to implement the functionality. The userspace library should
be thought of as an extension of the kernel driver into userspace.

So, there is a lot of traffic between the two driver components that is
just private and unique to the driver. This is what the driver
namespace is used for.

For instance, there is a common method call to create a queue. The
queue has a number of core parameters, like depth and address, then it
calls the driver and there are a bunch of device-specific parameters
too, like say the queue entry format. Every driver gets to define its
own parameters best suited to its own device and its own user/kernel
split (there is a sketch of how this looks further down).

Building a split user/kernel driver is complicated and uAPI is one of
the biggest challenges :\

> > > - And as Nik mentioned: The new (yaml)model-to-generatedcode approach
> > >   that is now common in generic netlink highly reduces developer effort.
> > >   Although in my opinion we really need this stuff integrated into tools
> > >   like iproute2..
> >
> > RDMA also has a DSL-like scheme for defining schema, and centralized
> > parsing and validation. IMHO its capability falls someplace between
> > the old netlink policy stuff and the new YAML stuff.
>
> I meant the ability to start with a data model and generate code as
> being useful.
> Where can i find the RDMA DSL?

It is done with the C preprocessor instead of an external YAML file.
Look at drivers/infiniband/core/uverbs_std_types_mr.c at the end. It
describes a data model, but it is elaborated at runtime into an
efficient parse tree, not by using a code generator.
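To make the driver namespace split mentioned above concrete:
driver-private attribute IDs live above UVERBS_ID_NS_SHIFT so they can
never collide with the core IDs, and the driver hangs its extra
attributes off a core method using the same preprocessor scheme. It
looks roughly like this (a sketch from memory of the mlx5 code, the
exact names may differ from what is in the tree today):

/* Driver attribute IDs start in their own number space */
enum mlx5_ib_create_cq_attrs {
        MLX5_IB_ATTR_CREATE_CQ_UAR_INDEX = (1U << UVERBS_ID_NS_SHIFT),
};

/* Attach a driver specific input attribute to the core CQ create
 * method. The core parses and validates it like any other attribute. */
ADD_UVERBS_ATTRIBUTES_SIMPLE(
        mlx5_ib_create_cq,
        UVERBS_OBJECT_CQ,
        UVERBS_METHOD_CQ_CREATE,
        UVERBS_ATTR_PTR_IN(MLX5_IB_ATTR_CREATE_CQ_UAR_INDEX,
                           UVERBS_ATTR_TYPE(u32),
                           UA_OPTIONAL));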
The schema is a more classical object-oriented RPC-type scheme where
you define objects, methods and then method parameters. The objects
have an entire kernel-side infrastructure to manage their lifecycle,
and the attributes have validation and parsing done prior to reaching
the C function implementing the method.

I always thought it was netlink inspired, but more suited to building a
uAPI out of. Like you get actual system call names (eg
UVERBS_METHOD_REG_DMABUF_MR) that have actual C functions implementing
them. There is special help to implement object allocation and
destruction functions, and freedom to have as many methods per object
as make sense.

> I dont know enough about RDMA infra to comment but iiuc, you are
> saying that it is the control infrastructure (that sits in
> userspace?), that does all those things you mention, that is more
> important.

There is an entire object model in the kernel and it is linked into the
schema. For instance, in the above example we have a schema for an
object method like this:

DECLARE_UVERBS_NAMED_METHOD(
        UVERBS_METHOD_REG_DMABUF_MR,
        UVERBS_ATTR_IDR(UVERBS_ATTR_REG_DMABUF_MR_HANDLE,
                        UVERBS_OBJECT_MR,
                        UVERBS_ACCESS_NEW,
                        UA_MANDATORY),
        UVERBS_ATTR_IDR(UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE,
                        UVERBS_OBJECT_PD,
                        UVERBS_ACCESS_READ,
                        UA_MANDATORY),

That says it accepts two object handles, MR and PD, as input to the
method call. The core code keeps track of all these object handles,
validates that the ID number given by userspace is referring to the
correct object, of the correct type, in the correct state, locks things
against concurrent destruction, and then gives a trivial way for the C
method implementation to pick up the object pointer:

        struct ib_pd *pd = uverbs_attr_get_obj(
                attrs, UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE);

Which can't fail because everything was already checked before we get
here. This is all designed to greatly simplify and make robust the
method implementations that are often in driver code.
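For completeness, the skeleton of the C function behind that method
looks roughly like this (trimmed from uverbs_std_types_mr.c, with the
actual registration work elided):

static int UVERBS_HANDLER(UVERBS_METHOD_REG_DMABUF_MR)(
        struct uverbs_attr_bundle *attrs)
{
        /* Both handles were already validated, type checked and locked
         * by the core code, so neither lookup can fail here. */
        struct ib_uobject *uobj = uverbs_attr_get_uobject(
                attrs, UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
        struct ib_pd *pd = uverbs_attr_get_obj(
                attrs, UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE);

        /* ... the driver call that actually creates the MR ... */
        return 0;
}

Jason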