Re: [PATCH net-next 2/2] docs: netlink: basic introduction to Netlink

Johannes Berg <johannes@xxxxxxxxxxxxxxxx> · Fri, 19 Aug 2022 18:57:01 +0200

On Fri, 2022-08-19 at 09:20 -0700, Jakub Kicinski wrote:
> > > 
> > > +Headers in netlink must be aligned to 4 bytes from the start of the message,  
> > 
> > s/Headers/Attribute headers/ perhaps?
> 
> Theoretically I think we also align what I called "fixed metadata
> headers", practically all of those are multiple of 4 :S

But they're not really aligned, are they? Hmm. Well I guess practically
it doesn't matter. I just read this and wasn't really sure what the
mention of "[h]eaders" was referring to in this context.

> > > +hence the extra ``\0\0`` at the end of the message.
> > 
> > And I think technically for the _last_ attribute it wouldn't be needed?
> 
> True, it's not strictly necessary AFAIU. Should I mention it 
> or would that be over-complicating things?
> 
> I believe that kernel will accept both forms (without tripping 
> the trailing data warning), and both the kernel and mnl will pad 
> out the last attr.

Yeah, probably not worth mentioning it.

I think what threw me off was the explicit mention of "at the end of the
message" - perhaps just say "after the family name attribute"?

> > > +``NLMSGERR_ATTR_MSG`` carries a message in English describing
> > > +the encountered problem. These messages are far more detailed
> > > +than what can be expressed thru standard UNIX error codes.  
> > 
> > "through"?
> 
> How much do you care? Maybe Jon has guidelines?

Hah, not much really.

> > > +Querying family information is useful in rare cases when user space needs  
> > 
> > debatable if that's "rare", but yeah, today it's not done much :)
> 
> Some of the text is written with the implicit goal of comforting 
> the newcomer ;)

:-)

In this document it just feels like saying it _should_ be rare, but I'm
not sure it should? We ignore it a lot in practice, but maybe we should
be doing it more?

> > I think this could be written a bit better - we call this thing a "port
> > ID" internally now, and yes, it might default to a process ID (more
> > specifically task group ID) ... but it feels like this could explain
> > bind vs. autobind etc. a bit more? And IMHO it should focus less on the
> > process ID/PID than saying "port ID" with a (historical) default of
> > using the PID/TGID.
> 
> I'll rewrite. The only use I'm aware of is OvS upcalls, are there more?

In nl80211 we have quite a few "unicast an event message to a specific
portid" uses, e.g. if userspace subscribes to certain action frames, the
frame notification for it would be unicast to the subscribed socket, or
the TX status response after a frame was transmitted, etc. etc.

> Practically speaking for a person trying to make a ethtool, FOU,
> devlink etc. call to the kernel this is 100% irrelevant.

Fair point, depends on what you're using and what programming model that
has.

> > > +Strict checking
> > > +---------------
> > > +
> > > +The ``NETLINK_GET_STRICT_CHK`` socket option enables strict input checking
> > > +in ``NETLINK_ROUTE``. It was needed because historically kernel did not
> > > +validate the fields of structures it didn't process. This made it impossible
> > > +to start using those fields later without risking regressions in applications
> > > +which initialized them incorrectly or not at all.
> > > +
> > > +``NETLINK_GET_STRICT_CHK`` declares that the application is initializing
> > > +all fields correctly. It also opts into validating that message does not
> > > +contain trailing data and requests that kernel rejects attributes with
> > > +type higher than largest attribute type known to the kernel.
> > > +
> > > +``NETLINK_GET_STRICT_CHK`` is not used outside of ``NETLINK_ROUTE``.  
> > 
> > However, there are also more generally strict checks in policy
> > validation ... maybe a discussion of all that would be worthwhile?
> 
> Yeah :( It's too much to describe to a newcomer, I figured. I refer
> those who care to the enum field in the next section. We'd need a full
> table of families and attrs which start strict(er) validation.. bah. Too
> much technical debt.

Yes ... Also sometimes attributes in a dump request are checked,
sometimes not, sometimes just ignored, etc. Certainly not worth handling
here though.

> >  - maybe not the appropriate place here, but maybe some best practices
> >    for handling attributes, such as the multi-attribute array thing we
> >    discussed in the other thread?
> 
> Right, this doc is meant for the user rather than kernel dev. 
> 

Makes sense. The user anyway will have to look at how their specific
family does things, and do it accordingly.

> I'm planning to write a separate doc for the kernel dev.

Nice, looking forward to that! :)

> I started writing this one as guide for a person who would like to write
> a YAML NL library for their fav user space language but has no prior
> knowledge of netlink and does not know where to start.

Makes sense.

> >  - maybe more userspace recommendations such as using different sockets
> >    for multicast listeners and requests, because otherwise it gets
> >    tricky to wait for the ACK of a request since you have to handle
> >    notifications that happen meanwhile?
> 
> Hm, good point. I should add a section on multicast and make it part 
> of that.

True that, multicast more generally is something to know about.

> >  - maybe some mention of the fact that sometimes we now bind kernel
> >    object or state lifetime to a socket, e.g. in wireless you can
> >    connect and if your userspace crashes/closes the socket, the
> >    connection is automatically torn down (because you can't handle the
> >    things needed anymore)
> 
> 😍 Can you point me to the code? (probably too advanced for this doc
> but the idea seems super useful!)

Look at the uses of NL80211_ATTR_SOCKET_OWNER, e.g. you can
 * create a virtual interface and have it disappear if you close the
   socket (_nl80211_new_interface)
 * AP stopped if you close the socket (nl80211_start_ap)
 * some regulatory stuff reset (nl80211_req_set_reg)
 * background ("scheduled") scan stopped (nl80211_start_sched_scan)
 * connection torn down (nl80211_associate)
 * etc.

The actual teardown handling is in nl80211_netlink_notify().

I guess I can agree though it doesn't really belong here - again
something specific to the operations you're doing.

> >  - maybe something about message sizes? we've had lots of trouble with
> >    that in nl80211, but tbh I'm not really sure what we should say about
> >    it other than making sure you use large enough buffers ...
> 
> Yes :S What's the error reported when the buffer is too small?
> recv() = -1, errno = EMSGSIZE? Does the message get discarded 
> or can it be re-read? I don't have practical experience with
> that one.

Ugh, I repressed all those memories ... I don't remember now, I guess
I'd have to try it. Also it doesn't just apply to normal stuff but also
multicast, and that can be even trickier.

johannes