Re: Netlink vs ioctl WAS(Re: [RFC PATCH 00/13] Ultra Ethernet driver introduction

Jason Gunthorpe <jgg@xxxxxxxxxx> · Tue, 18 Mar 2025 19:49:12 -0300

On Sat, Mar 15, 2025 at 04:49:20PM -0400, Jamal Hadi Salim wrote:

> On "unreliable": This is typically a result of some request response
> (or a subscribed to event) whose execution has failed to allocate
> memory in the kernel or overrun some buffers towards user space;
> however, any such failures are signalled to user space and can be
> recovered from.

No, they can't be recovered from in all cases. Randomly failing system
calls because of memory pressure is a horrible foundation on which to
build what something like RDMA needs. It is not acceptable that
something like a destroy system call would just randomly fail because
the kernel is OOMing. There is no recovery from this beyond leaking
memory - the opposite of what you want in an OOM situation.

> ioctl is synchronous which gives it the "reliability" and "speed".
> iirc, if memory failure was to happen on ioctl it will block until it
> is successful? 

It would fail back to userspace and unwind whatever it did.

The unwinding is tricky and RDMA's infrastructure has a lot of support
to make it easier for driver writers to get this right in all the
different error cases.

Overall, system calls here should either succeed, or fail and be the
same as a NOP. There should be no failure that actually did something
and then leaves a resource leak behind because userspace didn't know
about it.
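
As a rough illustration of the succeed-or-NOP rule, here is a
user-space sketch. It is not RDMA code: every demo_* name is made up
for this example, and the allocation failure is injected by a test
hook. Create either returns a fully built object or leaves nothing
behind, and destroy allocates nothing so it cannot fail under memory
pressure.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical object built from several allocations; not a real RDMA type. */
struct demo_obj {
	void *ctx;
	void *queue;
};

/* Test hook: fail the Nth allocation (1-based); 0 disables injection. */
static int demo_fail_at;
static int demo_alloc_count;
/* Number of allocations currently outstanding, used to detect leaks. */
static int demo_live_allocs;

static void *demo_alloc(size_t size)
{
	void *p;

	if (demo_fail_at && ++demo_alloc_count == demo_fail_at)
		return NULL;
	p = malloc(size);
	if (p)
		demo_live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		demo_live_allocs--;
		free(p);
	}
}

/*
 * Returns 0 with *out pointing at a complete object, or a negative
 * errno with *out untouched and every partial step undone.
 */
static int demo_create(struct demo_obj **out)
{
	struct demo_obj *obj;

	obj = demo_alloc(sizeof(*obj));
	if (!obj)
		return -ENOMEM;
	obj->ctx = demo_alloc(64);
	if (!obj->ctx)
		goto err_obj;
	obj->queue = demo_alloc(256);
	if (!obj->queue)
		goto err_ctx;
	*out = obj;
	return 0;

err_ctx:
	demo_free(obj->ctx);
err_obj:
	demo_free(obj);
	return -ENOMEM;
}

/* Destroy only releases memory, so it has no failure path at all. */
static void demo_destroy(struct demo_obj *obj)
{
	demo_free(obj->queue);
	demo_free(obj->ctx);
	demo_free(obj);
}
```

The unwind labels run in reverse order of construction, so a failure
at any step releases exactly what was set up before it.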

> Extensibility: ioctl take binary structs which make it much harder to
> extend but adds to that "speed". Once you pick your struct, you are
> stuck with it - as opposed to netlink which uses very extensible
> formally defined TLVs that makes it highly extensible. 

RDMA uses TLVs now too. It has one of the largest uAPI surfaces in the
kernel, and TLVs were introduced for the same reason netlink uses them.

RDMA also has special infrastructure to split up the TLV space between
core code and HW driver code which is a key feature and necessary part
of how you'd build a user/kernel split driver.
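
To make the core/driver split concrete, a minimal sketch follows. The
wire layout, the demo_* names and the choice of which id bit marks the
driver namespace are all assumptions made for illustration and are not
the actual RDMA uAPI. The parser walks a buffer of TLVs, bounds-checks
each one, and sorts attributes by namespace so core code and the HW
driver each see only their own ids.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: 4-byte header followed by len payload bytes. */
struct demo_attr_hdr {
	uint16_t id;
	uint16_t len;
};

/* Assumption for this sketch: one bit of the id selects the driver namespace. */
#define DEMO_ID_DRIVER_NS 0x1000u

struct demo_parsed {
	unsigned int core_attrs;   /* attributes owned by core code */
	unsigned int driver_attrs; /* attributes owned by the HW driver */
};

/*
 * Walk the buffer and count attributes per namespace. On any malformed
 * TLV return -EINVAL and leave *out untouched.
 */
static int demo_parse_attrs(const uint8_t *buf, size_t size,
			    struct demo_parsed *out)
{
	struct demo_parsed tmp = { 0, 0 };
	size_t off = 0;

	while (off < size) {
		struct demo_attr_hdr hdr;

		/* The header itself must fit in what is left. */
		if (size - off < sizeof(hdr))
			return -EINVAL;
		memcpy(&hdr, buf + off, sizeof(hdr));
		off += sizeof(hdr);

		/* The payload must fit too. */
		if (size - off < hdr.len)
			return -EINVAL;

		if (hdr.id & DEMO_ID_DRIVER_NS)
			tmp.driver_attrs++;
		else
			tmp.core_attrs++;
		off += hdr.len;
	}
	*out = tmp;
	return 0;
}
```

Reserving part of the id space for the driver means a driver can add
attributes to a core method without colliding with ids the core may
assign later.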

> - And as Nik mentioned: The new (yaml)model-to-generatedcode approach
> that is now common in generic netlink highly reduces developer effort.
> Although in my opinion we really need this stuff integrated into tools
> like iproute2..

RDMA also has a DSL-like scheme for defining schema, and centralized
parsing and validation. IMHO its capability falls someplace between
the old netlink policy stuff and the new YAML stuff.
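
For readers unfamiliar with the idea, centralized validation against a
declared schema can be sketched roughly as below. This is a generic
table-driven example with hypothetical demo_* names, not the RDMA DSL
or netlink policy code. Each method declares its attributes once, and
one shared routine rejects unknown ids, out-of-range lengths and
missing mandatory attributes before any handler runs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical schema entry: what one attribute id is allowed to look like. */
struct demo_attr_spec {
	uint16_t id;
	uint16_t min_len;
	uint16_t max_len;
	bool mandatory;
};

/* Hypothetical already-split attribute, as a TLV parser would hand it over. */
struct demo_attr {
	uint16_t id;
	uint16_t len;
};

/* Returns 0 if attrs satisfy the schema, otherwise a negative errno. */
static int demo_validate(const struct demo_attr_spec *spec, size_t nspec,
			 const struct demo_attr *attrs, size_t nattrs)
{
	size_t i, j;

	/* Every supplied attribute must be known and within its length bounds. */
	for (i = 0; i < nattrs; i++) {
		for (j = 0; j < nspec; j++)
			if (spec[j].id == attrs[i].id)
				break;
		if (j == nspec)
			return -EOPNOTSUPP;
		if (attrs[i].len < spec[j].min_len ||
		    attrs[i].len > spec[j].max_len)
			return -EINVAL;
	}

	/* Every mandatory attribute must be present. */
	for (j = 0; j < nspec; j++) {
		if (!spec[j].mandatory)
			continue;
		for (i = 0; i < nattrs; i++)
			if (attrs[i].id == spec[j].id)
				break;
		if (i == nattrs)
			return -EINVAL;
	}
	return 0;
}
```

Keeping the checks in one routine means individual handlers never see
a malformed request, which is the property the schema approach buys.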

But just focusing on schema and TLVs really undersells all the
specialized infrastructure that exists for managing objects, security,
HW pass through and other infrastructure things unique to RDMA.

Jason