On Wed, Mar 12, 2025 at 11:11 AM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> On Wed, Mar 12, 2025 at 04:20:08PM +0200, Nikolay Aleksandrov wrote:
> > On 3/12/25 1:29 PM, Leon Romanovsky wrote:
> > > On Wed, Mar 12, 2025 at 11:40:05AM +0200, Nikolay Aleksandrov wrote:
> > >> On 3/8/25 8:46 PM, Leon Romanovsky wrote:
> > >>> On Fri, Mar 07, 2025 at 01:01:50AM +0200, Nikolay Aleksandrov wrote:
> >
[snip]
> > >> Also we have the ephemeral PDC connections that come and go as
> > >> needed. There are more such objects coming with more
> > >> state, configuration and lifecycle management. That is why we added a
> > >> separate netlink family to cleanly manage them without trying to fit
> > >> a square peg in a round hole so to speak.
> > >
> > > Yeah, I saw that you are planning to use netlink to manage objects,
> > > which is very questionable. It is slow, unreliable, requires sockets,
> > > needs more parsing logic e.t.c

To chime in on the above re: netlink vs ioctl [this is going to be a long
message - over-caffeinated and stuck on a trip....]

On "slow": netlink is mostly deemed "slow" for the following reasons:
1) locks - which over the last year have been greatly reduced
2) crossing user/kernel - which I believe is fixable with some mmap scheme
   (although past attempts at doing this have been unsuccessful)
3) async netlink vs sync ioctl (more below)

On "unreliable": this is typically the result of a request/response (or a
subscribed-to event) whose execution failed to allocate memory in the kernel
or overran some buffer towards user space; however, any such failure is
signalled to user space and can be recovered from.

ioctl is synchronous, which is what gives it the "reliability" and "speed".
IIRC, if a memory failure were to happen on an ioctl it would block until it
succeeds? Netlink, being async, instead signals user space when data is lost
or cannot be fully delivered. For example, if a user issued a dump of a very
large amount of data from the kernel and that data wasn't fully delivered,
perhaps because of memory pressure, user space is notified via a socket error
and can use that to recover.

Extensibility: ioctls take binary structs, which makes them much harder to
extend but adds to that "speed". Once you pick your struct, you are stuck
with it - as opposed to netlink, which uses formally defined TLVs and is
therefore highly extensible. Yes, extensibility requires more parsing, as
you stated above.
Note: if you have one-offs you could just hardcode an ioctl-like data
structure into a single TLV and use blocking netlink sockets; that should
get you pretty close to ioctl "speed".

To build more on reliability: if you really cared, there are mechanisms which
can be used to build a fully reliable channel to the kernel, since netlink is
in fact a wire protocol (which alas has been broken for a while because you
can't really use it as a wire protocol across machines); see for example:
https://datatracker.ietf.org/doc/html/rfc3549#section-2.3.2.1
And if you don't really care about reliability you can just shoot messages
into the kernel with the ACK flag turned off (and then issue requests
whenever you feel you need to check on the configuration).

Debuggability: extended ACKs (heavily used by networking) give user space
excellent, fine-grained operational information on errors (the famous EINVAL
can tell you exactly what the EINVAL means, for example).
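To make some of the above concrete, here is a rough (untested,
compile-it-yourself) sketch of what the recovery and extended-ACK pieces look
like from user space over a raw netlink socket. I am using rtnetlink's
RTM_GETLINK dump purely as a stand-in; the same pattern applies to any
family:

#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

static int do_dump(int fd)
{
	struct {
		struct nlmsghdr nlh;
		struct ifinfomsg ifm;
	} req = {
		.nlh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
		.nlh.nlmsg_type  = RTM_GETLINK,
		/* for config requests you would add NLM_F_ACK here to get a
		 * per-request acknowledgement, or leave it off and fire and
		 * forget as described above */
		.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
		.nlh.nlmsg_seq   = 1,
		.ifm.ifi_family  = AF_UNSPEC,
	};
	char buf[32768];

	if (send(fd, &req, req.nlh.nlmsg_len, 0) < 0)
		return -errno;

	for (;;) {
		int len = recv(fd, buf, sizeof(buf), 0);

		if (len < 0)
			/* "data was lost" is signalled, not silent: on
			 * ENOBUFS the caller can simply restart the dump */
			return -errno;

		for (struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
		     NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
			if (nlh->nlmsg_type == NLMSG_DONE)
				return 0;
			if (nlh->nlmsg_type == NLMSG_ERROR) {
				const struct nlmsgerr *e = NLMSG_DATA(nlh);
				/* with NETLINK_EXT_ACK enabled, a
				 * human-readable NLMSGERR_ATTR_MSG TLV may
				 * follow *e (NLM_F_ACK_TLVS is set when it
				 * does) - that is the "what exactly does
				 * this EINVAL mean" part */
				return e->error;
			}
			/* RTM_NEWLINK payload would be parsed here */
		}
	}
}

int main(void)
{
	int one = 1;
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0)
		return 1;
	/* opt in to extended ACKs for fine-grained error reporting */
	setsockopt(fd, SOL_NETLINK, NETLINK_EXT_ACK, &one, sizeof(one));

	while (do_dump(fd) == -ENOBUFS)
		;	/* part of the dump was lost; just re-issue it */

	close(fd);
	return 0;
}

The point being: the failure modes are visible to the caller - a dump that
could not be fully delivered shows up as an error and can be re-issued, and a
rejected request comes back with a precise explanation instead of a bare
errno.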
Netlink also has a multicast publish-subscribe mechanism. Multicast being
one-to-many means it is a multi-user interface (an important detail for both
scaling and independent debugging): you can have multiple processes
subscribing to events that the kernel publishes. You don't have to resort to
polling the kernel for details of dynamic changes (e.g. "a new entry has been
added to table foo", etc.).
As a matter of fact, the original design used to allow user space to publish
to both the kernel and other user space apps (and unicast worked kernel/user
and user/user in both directions). I haven't looked at that recently, so it
could be broken.
Note: while these events are also subject to message loss, the netlink
robustness described earlier is usable here as well (via socket errors). For
example, if the kernel attempted to send an event which had the misfortune of
not making it, the user will be notified and can recover by requesting a
related table dump, etc. to see what changed.

- And as Nik mentioned: the new (yaml) model-to-generated-code approach that
is now common in generic netlink greatly reduces developer effort. Although
in my opinion we really need this stuff integrated into tools like iproute2..

I am pretty sure I left out some important details (maybe I can write a small
doc when I am in better shape).

cheers,
jamal
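P.S. Since I mentioned the pub-sub side, an equally rough sketch (again
untested, rtnetlink link events used only as an illustration) of a process
subscribing to kernel notifications and resyncing when it is told that events
were dropped:

#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

int main(void)
{
	struct sockaddr_nl sa = {
		.nl_family = AF_NETLINK,
		/* join the link-events group; any number of processes can
		 * subscribe to the same group independently */
		.nl_groups = RTMGRP_LINK,
	};
	char buf[8192];
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0 || bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		return 1;

	for (;;) {
		int len = recv(fd, buf, sizeof(buf), 0);

		if (len < 0) {
			if (errno == ENOBUFS) {
				/* events were dropped; instead of silently
				 * missing state changes, resync here by
				 * issuing an RTM_GETLINK dump */
				continue;
			}
			break;
		}

		for (struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
		     NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
			if (nlh->nlmsg_type == RTM_NEWLINK)
				printf("link added/changed\n");
			else if (nlh->nlmsg_type == RTM_DELLINK)
				printf("link removed\n");
		}
	}

	close(fd);
	return 0;
}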