On Sat, Mar 15, 2025 at 04:49:20PM -0400, Jamal Hadi Salim wrote:
> On Wed, Mar 12, 2025 at 11:11 AM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> >
> > On Wed, Mar 12, 2025 at 04:20:08PM +0200, Nikolay Aleksandrov wrote:
> > > On 3/12/25 1:29 PM, Leon Romanovsky wrote:
> > > > On Wed, Mar 12, 2025 at 11:40:05AM +0200, Nikolay Aleksandrov wrote:
> > > >> On 3/8/25 8:46 PM, Leon Romanovsky wrote:
> > > >>> On Fri, Mar 07, 2025 at 01:01:50AM +0200, Nikolay Aleksandrov wrote:
> > > [snip]
> > > >> Also we have the ephemeral PDC connections that come and go as
> > > >> needed. There are more such objects coming, with more state,
> > > >> configuration and lifecycle management. That is why we added a
> > > >> separate netlink family to cleanly manage them without trying to fit
> > > >> a square peg in a round hole, so to speak.
> > > >
> > > > Yeah, I saw that you are planning to use netlink to manage objects,
> > > > which is very questionable. It is slow, unreliable, requires sockets,
> > > > needs more parsing logic, etc.
>
> To chime in on the above re: netlink vs ioctl
> [this is going to be a long message - over-caffeinated and stuck on a trip...]
>
> On "slow": netlink is mostly deemed "slow" for the following reasons:
> 1) locks - which over the last year have been greatly reduced;
> 2) crossing user/kernel - which I believe is fixable with some mmap
>    scheme (although past attempts at doing this have been unsuccessful);
> 3) async, vs ioctl's sync (more below).
>
> On "unreliable": this is typically the result of a request/response
> (or a subscribed-to event) whose execution failed to allocate memory
> in the kernel or overran some buffer towards user space; however, any
> such failure is signalled to user space and can be recovered from.
>
> ioctl is synchronous, which gives it its "reliability" and "speed".
> IIRC, if a memory failure were to happen on an ioctl, it would block
> until it succeeds? netlink, on the other hand, is async, and user
> space gets signalled if data is lost or can't be fully delivered. For
> example, if a user issued a dump of a very large amount of data from
> the kernel and that data wasn't fully delivered, perhaps because of
> memory pressure, user space will be notified via a socket error and
> can use that info to recover (see the first sketch below).
>
> Extensibility: ioctls take binary structs, which makes them much
> harder to extend but adds to that "speed". Once you pick your struct,
> you are stuck with it - as opposed to netlink, which uses formally
> defined TLVs that make it highly extensible. Yes, extensibility
> requires more parsing, as you stated above. Note: if you have
> one-offs, you could just hardcode an ioctl-like data structure into a
> TLV and use blocking netlink sockets; that should get you pretty close
> to ioctl "speed".
>
> To build more on reliability: if you really cared, there are
> mechanisms which can be used to build a fully reliable communication
> channel with the kernel, since netlink is in fact a wire protocol
> (which, alas, has been broken for a while because you can't really use
> it as a wire protocol across machines); see for example:
> https://datatracker.ietf.org/doc/html/rfc3549#section-2.3.2.1
> And if you don't really care about reliability, you can just shoot
> messages into the kernel with the ACK flag turned off (and then issue
> requests whenever you feel you need to check on the configuration).
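For illustration, a minimal sketch of the dump-and-recover pattern Jamal
describes, written against the raw NETLINK_ROUTE API. RTM_GETLINK is used
purely as a stand-in example of a large dump, the dump_links() helper is
invented for this example, error handling is trimmed, and a real user would
also parse the returned TLVs:

  /*
   * Hedged illustration only, not the code under discussion: dump all
   * links over NETLINK_ROUTE and recover from ENOBUFS by re-issuing
   * the request.
   */
  #include <errno.h>
  #include <unistd.h>
  #include <sys/socket.h>
  #include <linux/netlink.h>
  #include <linux/rtnetlink.h>

  static int dump_links(int fd)
  {
      struct {
          struct nlmsghdr nlh;
          struct ifinfomsg ifm;
      } req = {
          .nlh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
          .nlh.nlmsg_type  = RTM_GETLINK,
          .nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
          .ifm.ifi_family  = AF_UNSPEC,
      };
      char buf[16384];

      if (send(fd, &req, req.nlh.nlmsg_len, 0) < 0)
          return -errno;

      for (;;) {
          ssize_t len = recv(fd, buf, sizeof(buf), 0);

          if (len < 0)
              return -errno; /* -ENOBUFS: replies were dropped */

          for (struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
               NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
              if (nlh->nlmsg_type == NLMSG_DONE)
                  return 0;
              if (nlh->nlmsg_type == NLMSG_ERROR)
                  return -EIO; /* simplified */
              /* ... parse the RTM_NEWLINK TLVs here ... */
          }
      }
  }

  int main(void)
  {
      int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

      if (fd < 0)
          return 1;
      /* Socket error == notification of loss: just dump again. */
      while (dump_links(fd) == -ENOBUFS)
          ;
      close(fd);
      return 0;
  }

The key point is the ENOBUFS case: replies dropped under memory pressure
surface as a socket error rather than silent loss, so the caller can simply
re-issue the dump.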
> Debuggability: extended ACKs (heavily used by networking) provide
> excellent operational information to user space, with fine-grained
> details on errors (the famous EINVAL can tell you exactly what the
> EINVAL means, for example).
>
> netlink has a multicast publish-subscribe mechanism. Multicast being
> one-to-many means a multi-user interface (an important detail for both
> scaling and independent debugging), meaning you can have multiple
> processes subscribing to events that the kernel publishes (see the
> second sketch at the end of this message). You don't have to resort to
> polling the kernel for details of dynamic changes (e.g. "a new entry
> has been added to table foo", etc.).
> As a matter of fact, the original design used to allow user space to
> advertise to both the kernel and other user space apps (and unicast
> worked kernel<->user and user<->user). I haven't looked at that
> recently, so it could be broken.
> Note: while these events are also subject to message loss, the netlink
> robustness described earlier is usable here as well (via socket
> errors). For example, if the kernel attempted to send an event which
> had the misfortune of not making it, the user will be notified and can
> recover by requesting a related table dump, etc., to see what changed.
>
> - And as Nik mentioned: the new (YAML) model-to-generated-code
> approach that is now common in generic netlink greatly reduces
> developer effort. Although in my opinion we really need this stuff
> integrated into tools like iproute2...
>
> I am pretty sure I left out some important details (maybe I can write
> a small doc when I am in better shape).

Thanks for such a detailed answer. I'm not against netlink, I'm against
using netlink to configure complex HW objects.

Thanks

>
> cheers,
> jamal
>
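And the second sketch: subscribing to the kernel's published events instead
of polling. This is again a hedged illustration, with RTNLGRP_LINK standing
in for "events the kernel publishes" and error handling trimmed:

  /*
   * Hedged illustration only: join the RTNLGRP_LINK multicast group
   * and have link add/remove events pushed to us, resyncing via a
   * dump when ENOBUFS signals that events were dropped.
   */
  #include <errno.h>
  #include <stdio.h>
  #include <sys/socket.h>
  #include <linux/netlink.h>
  #include <linux/rtnetlink.h>

  #ifndef SOL_NETLINK
  #define SOL_NETLINK 270 /* from linux/socket.h */
  #endif

  int main(void)
  {
      int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
      int grp = RTNLGRP_LINK;
      char buf[8192];

      if (fd < 0)
          return 1;
      /* Join the multicast group; other processes may join it too. */
      setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
                 &grp, sizeof(grp));

      for (;;) {
          ssize_t len = recv(fd, buf, sizeof(buf), 0);

          if (len < 0) {
              if (errno == ENOBUFS) {
                  /* Events were dropped: resync with a fresh
                   * RTM_GETLINK dump (see the earlier sketch). */
                  continue;
              }
              return 1;
          }
          for (struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
               NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len))
              if (nlh->nlmsg_type == RTM_NEWLINK ||
                  nlh->nlmsg_type == RTM_DELLINK)
                  printf("link event: type %u\n", nlh->nlmsg_type);
      }
  }

Any number of processes can subscribe to the same group independently,
which is the one-to-many property discussed above.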