On Sat, Mar 15, 2025 at 04:49:20PM -0400, Jamal Hadi Salim wrote:
> On Wed, Mar 12, 2025 at 11:11 AM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> >
> > On Wed, Mar 12, 2025 at 04:20:08PM +0200, Nikolay Aleksandrov wrote:
> > > On 3/12/25 1:29 PM, Leon Romanovsky wrote:
> > > > On Wed, Mar 12, 2025 at 11:40:05AM +0200, Nikolay Aleksandrov wrote:
> > > >> On 3/8/25 8:46 PM, Leon Romanovsky wrote:
> > > >>> On Fri, Mar 07, 2025 at 01:01:50AM +0200, Nikolay Aleksandrov wrote:
> > > [snip]
> > > >> Also we have the ephemeral PDC connections that come and go as
> > > >> needed. There are more such objects coming, with more state,
> > > >> configuration and lifecycle management. That is why we added a
> > > >> separate netlink family to cleanly manage them without trying to fit
> > > >> a square peg in a round hole, so to speak.
> > > >
> > > > Yeah, I saw that you are planning to use netlink to manage objects,
> > > > which is very questionable. It is slow, unreliable, requires sockets,
> > > > needs more parsing logic, etc.
>
> To chime in on the above re: netlink vs ioctl
> [this is going to be a long message - over-caffeinated and stuck on a trip...]
>
> On "slow": netlink is mostly deemed "slow" for the following reasons:
> 1) locks - which over the last year have been greatly reduced;
> 2) crossing user/kernel - which I believe is fixable with some mmap
>    scheme (although past attempts at doing this have been unsuccessful);
> 3) async, vs ioctl's sync (more below).
>
> On "unreliable": this is typically the result of a request/response
> (or a subscribed-to event) whose execution failed to allocate memory
> in the kernel or overran some buffer towards user space; however, any
> such failure is signalled to user space and can be recovered from.
>
> ioctl is synchronous, which gives it its "reliability" and "speed".
> IIRC, if a memory failure were to happen on an ioctl, it would block
> until it succeeds? netlink, on the other hand, is async, and user
> space gets signalled if data is lost or can't be fully delivered. For
> example, if a user issued a dump of a very large amount of data from
> the kernel and that data wasn't fully delivered, perhaps because of
> memory pressure, user space will be notified via a socket error and
> can use that info to recover (see the first sketch below).
>
> Extensibility: ioctls take binary structs, which makes them much
> harder to extend but adds to that "speed". Once you pick your struct,
> you are stuck with it - as opposed to netlink, which uses formally
> defined TLVs that make it highly extensible. Yes, extensibility
> requires more parsing, as you stated above. Note: if you have
> one-offs, you could just hardcode an ioctl-like data structure into a
> TLV and use blocking netlink sockets; that should get you pretty close
> to ioctl "speed".
>
> To build more on reliability: if you really cared, there are
> mechanisms which can be used to build a fully reliable communication
> channel with the kernel, since netlink is in fact a wire protocol
> (which, alas, has been broken for a while because you can't really use
> it as a wire protocol across machines); see for example:
> https://datatracker.ietf.org/doc/html/rfc3549#section-2.3.2.1
> And if you don't really care about reliability, you can just shoot
> messages into the kernel with the ACK flag turned off (and then issue
> requests whenever you feel you need to check on the configuration).
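For illustration, a minimal sketch of the dump-and-recover pattern Jamal
describes, written against the raw NETLINK_ROUTE API. RTM_GETLINK is used
purely as a stand-in example of a large dump, the dump_links() helper is
invented for this example, error handling is trimmed, and a real user would
also parse the returned TLVs:

  /*
   * Hedged illustration only, not the code under discussion: dump all
   * links over NETLINK_ROUTE and recover from ENOBUFS by re-issuing
   * the request.
   */
  #include <errno.h>
  #include <unistd.h>
  #include <sys/socket.h>
  #include <linux/netlink.h>
  #include <linux/rtnetlink.h>

  static int dump_links(int fd)
  {
      struct {
          struct nlmsghdr nlh;
          struct ifinfomsg ifm;
      } req = {
          .nlh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
          .nlh.nlmsg_type  = RTM_GETLINK,
          .nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
          .ifm.ifi_family  = AF_UNSPEC,
      };
      char buf[16384];

      if (send(fd, &req, req.nlh.nlmsg_len, 0) < 0)
          return -errno;

      for (;;) {
          ssize_t len = recv(fd, buf, sizeof(buf), 0);

          if (len < 0)
              return -errno; /* -ENOBUFS: replies were dropped */

          for (struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
               NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
              if (nlh->nlmsg_type == NLMSG_DONE)
                  return 0;
              if (nlh->nlmsg_type == NLMSG_ERROR)
                  return -EIO; /* simplified */
              /* ... parse the RTM_NEWLINK TLVs here ... */
          }
      }
  }

  int main(void)
  {
      int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

      if (fd < 0)
          return 1;
      /* Socket error == notification of loss: just dump again. */
      while (dump_links(fd) == -ENOBUFS)
          ;
      close(fd);
      return 0;
  }

The key point is the ENOBUFS case: replies dropped under memory pressure
surface as a socket error rather than silent loss, so the caller can simply
re-issue the dump.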
> Debuggability: extended ACKs (heavily used by networking) provide
> excellent operational information to user space, with fine-grained
> details on errors (the famous EINVAL can tell you exactly what the
> EINVAL means, for example).
>
> netlink has a multicast publish-subscribe mechanism. Multicast being
> one-to-many means a multi-user interface (an important detail for both
> scaling and independent debugging), meaning you can have multiple
> processes subscribing to events that the kernel publishes (see the
> second sketch at the end of this message). You don't have to resort to
> polling the kernel for details of dynamic changes (e.g. "a new entry
> has been added to table foo", etc.).
> As a matter of fact, the original design used to allow user space to
> advertise to both the kernel and other user space apps (and unicast
> worked kernel<->user and user<->user). I haven't looked at that
> recently, so it could be broken.
> Note: while these events are also subject to message loss, the netlink
> robustness described earlier is usable here as well (via socket
> errors). For example, if the kernel attempted to send an event which
> had the misfortune of not making it, the user will be notified and can
> recover by requesting a related table dump, etc., to see what changed.
>
> - And as Nik mentioned: the new (YAML) model-to-generated-code
> approach that is now common in generic netlink greatly reduces
> developer effort. Although in my opinion we really need this stuff
> integrated into tools like iproute2...
>
> I am pretty sure I left out some important details (maybe I can write
> a small doc when I am in better shape).

Thanks for such a detailed answer. I'm not against netlink, I'm against
using netlink to configure complex HW objects.

Thanks

>
> cheers,
> jamal
>
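And the second sketch: subscribing to the kernel's published events instead
of polling. This is again a hedged illustration, with RTNLGRP_LINK standing
in for "events the kernel publishes" and error handling trimmed:

  /*
   * Hedged illustration only: join the RTNLGRP_LINK multicast group
   * and have link add/remove events pushed to us, resyncing via a
   * dump when ENOBUFS signals that events were dropped.
   */
  #include <errno.h>
  #include <stdio.h>
  #include <sys/socket.h>
  #include <linux/netlink.h>
  #include <linux/rtnetlink.h>

  #ifndef SOL_NETLINK
  #define SOL_NETLINK 270 /* from linux/socket.h */
  #endif

  int main(void)
  {
      int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
      int grp = RTNLGRP_LINK;
      char buf[8192];

      if (fd < 0)
          return 1;
      /* Join the multicast group; other processes may join it too. */
      setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
                 &grp, sizeof(grp));

      for (;;) {
          ssize_t len = recv(fd, buf, sizeof(buf), 0);

          if (len < 0) {
              if (errno == ENOBUFS) {
                  /* Events were dropped: resync with a fresh
                   * RTM_GETLINK dump (see the earlier sketch). */
                  continue;
              }
              return 1;
          }
          for (struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
               NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len))
              if (nlh->nlmsg_type == RTM_NEWLINK ||
                  nlh->nlmsg_type == RTM_DELLINK)
                  printf("link event: type %u\n", nlh->nlmsg_type);
      }
  }

Any number of processes can subscribe to the same group independently,
which is the one-to-many property discussed above.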