On Fri, 2022-08-19 at 09:20 -0700, Jakub Kicinski wrote: > > > > > > +Headers in netlink must be aligned to 4 bytes from the start of the message, > > > > s/Headers/Attribute headers/ perhaps? > > Theoretically I think we also align what I called "fixed metadata > headers", practically all of those are multiple of 4 :S But they're not really aligned, are they? Hmm. Well I guess practically it doesn't matter. I just read this and wasn't really sure what the mention of "[h]eaders" was referring to in this context. > > > +hence the extra ``\0\0`` at the end of the message. > > > > And I think technically for the _last_ attribute it wouldn't be needed? > > True, it's not strictly necessary AFAIU. Should I mention it > or would that be over-complicating things? > > I believe that kernel will accept both forms (without tripping > the trailing data warning), and both the kernel and mnl will pad > out the last attr. Yeah, probably not worth mentioning it. I think what threw me off was the explicit mention of "at the end of the message" - perhaps just say "after the family name attribute"? > > > +``NLMSGERR_ATTR_MSG`` carries a message in English describing > > > +the encountered problem. These messages are far more detailed > > > +than what can be expressed thru standard UNIX error codes. > > > > "through"? > > How much do you care? Maybe Jon has guidelines? Hah, not much really. > > > +Querying family information is useful in rare cases when user space needs > > > > debatable if that's "rare", but yeah, today it's not done much :) > > Some of the text is written with the implicit goal of comforting > the newcomer ;) :-) In this document it just feels like saying it _should_ be rare, but I'm not sure it should? We ignore it a lot in practice, but maybe we should be doing it more? > > I think this could be written a bit better - we call this thing a "port > > ID" internally now, and yes, it might default to a process ID (more > > specifically task group ID) ... but it feels like this could explain > > bind vs. autobind etc. a bit more? And IMHO it should focus less on the > > process ID/PID than saying "port ID" with a (historical) default of > > using the PID/TGID. > > I'll rewrite. The only use I'm aware of is OvS upcalls, are there more? In nl80211 we have quite a few "unicast an event message to a specific portid" uses, e.g. if userspace subscribes to certain action frames, the frame notification for it would be unicast to the subscribed socket, or the TX status response after a frame was transmitted, etc. etc. > Practically speaking for a person trying to make a ethtool, FOU, > devlink etc. call to the kernel this is 100% irrelevant. Fair point, depends on what you're using and what programming model that has. > > > +Strict checking > > > +--------------- > > > + > > > +The ``NETLINK_GET_STRICT_CHK`` socket option enables strict input checking > > > +in ``NETLINK_ROUTE``. It was needed because historically kernel did not > > > +validate the fields of structures it didn't process. This made it impossible > > > +to start using those fields later without risking regressions in applications > > > +which initialized them incorrectly or not at all. > > > + > > > +``NETLINK_GET_STRICT_CHK`` declares that the application is initializing > > > +all fields correctly. It also opts into validating that message does not > > > +contain trailing data and requests that kernel rejects attributes with > > > +type higher than largest attribute type known to the kernel. > > > + > > > +``NETLINK_GET_STRICT_CHK`` is not used outside of ``NETLINK_ROUTE``. > > > > However, there are also more generally strict checks in policy > > validation ... maybe a discussion of all that would be worthwhile? > > Yeah :( It's too much to describe to a newcomer, I figured. I refer > those who care to the enum field in the next section. We'd need a full > table of families and attrs which start strict(er) validation.. bah. Too > much technical debt. Yes ... Also sometimes attributes in a dump request are checked, sometimes not, sometimes just ignored, etc. Certainly not worth handling here though. > > - maybe not the appropriate place here, but maybe some best practices > > for handling attributes, such as the multi-attribute array thing we > > discussed in the other thread? > > Right, this doc is meant for the user rather than kernel dev. > Makes sense. The user anyway will have to look at how their specific family does things, and do it accordingly. > I'm planning to write a separate doc for the kernel dev. Nice, looking forward to that! :) > I started writing this one as guide for a person who would like to write > a YAML NL library for their fav user space language but has no prior > knowledge of netlink and does not know where to start. Makes sense. > > - maybe more userspace recommendations such as using different sockets > > for multicast listeners and requests, because otherwise it gets > > tricky to wait for the ACK of a request since you have to handle > > notifications that happen meanwhile? > > Hm, good point. I should add a section on multicast and make it part > of that. True that, multicast more generally is something to know about. > > - maybe some mention of the fact that sometimes we now bind kernel > > object or state lifetime to a socket, e.g. in wireless you can > > connect and if your userspace crashes/closes the socket, the > > connection is automatically torn down (because you can't handle the > > things needed anymore) > > 😍 Can you point me to the code? (probably too advanced for this doc > but the idea seems super useful!) Look at the uses of NL80211_ATTR_SOCKET_OWNER, e.g. you can * create a virtual interface and have it disappear if you close the socket (_nl80211_new_interface) * AP stopped if you close the socket (nl80211_start_ap) * some regulatory stuff reset (nl80211_req_set_reg) * background ("scheduled") scan stopped (nl80211_start_sched_scan) * connection torn down (nl80211_associate) * etc. The actual teardown handling is in nl80211_netlink_notify(). I guess I can agree though it doesn't really belong here - again something specific to the operations you're doing. > > - maybe something about message sizes? we've had lots of trouble with > > that in nl80211, but tbh I'm not really sure what we should say about > > it other than making sure you use large enough buffers ... > > Yes :S What's the error reported when the buffer is too small? > recv() = -1, errno = EMSGSIZE? Does the message get discarded > or can it be re-read? I don't have practical experience with > that one. Ugh, I repressed all those memories ... I don't remember now, I guess I'd have to try it. Also it doesn't just apply to normal stuff but also multicast, and that can be even trickier. johannes