Re: [RFCv2 2/4] nl80211: Support >4096 byte NEW_WIPHY event nlmsg

Denis Kenzior <denkenz@xxxxxxxxx> · Fri, 30 Aug 2019 14:56:17 -0500

Hi Johannes,

On 8/30/19 4:36 AM, Johannes Berg wrote:
On Fri, 2019-08-16 at 14:27 -0500, Denis Kenzior wrote:
For historical reasons, NEW_WIPHY messages generated by dumps or
GET_WIPHY commands were limited to 4096 bytes due to userspace tools
using limited buffers.

I think now that I've figured out why, it'd be good to note that it
wasn't due to userspace tools, but rather due to the default netlink
dump skb allocation at the time, prior to commit 9063e21fb026
("netlink: autosize skb lengthes").

Sure, will take care of it.

Once the size of NEW_WIPHY messages exceeded this
limit, split dumps were introduced.  Any non-legacy data was added
only to messages using split dumps (including filtered dumps).

However, split-dumping has quite a significant overhead.  On cards
tested, split dumps generated message sizes 1.7-1.8x compared to
non-split dumps, while still comfortably fitting into an 8k buffer.  The
kernel now expects userspace to provide 16k buffers by default, and 32k
buffers are possible.

Introduce a concept of a large message, so that if the kernel detects
that userspace has provided a buffer of sufficient size, a non-split
message could be generated.

So, there's still a wrinkle with this. Larger SKB allocation can fail,
and instead of returning an error to userspace, the kernel will allocate
a smaller SKB instead.
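
The fallback described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not the kernel's
netlink code: alloc_dump_buf(), struct dump_buf, MIN_ALLOC and
refuse_large() are hypothetical names, and plain malloc() stands in for
SKB allocation. It shows why the sender cannot assume it got the large
buffer it asked for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical minimum dump allocation; the real value is kernel-internal. */
#define MIN_ALLOC 8192

typedef void *(*alloc_fn)(size_t);

struct dump_buf {
	void *mem;
	size_t size;	/* size actually obtained, may be less than requested */
};

/* Test allocator that simulates memory pressure for large requests. */
void *refuse_large(size_t size)
{
	return size > MIN_ALLOC ? NULL : malloc(size);
}

/*
 * Try to match the receive buffer size userspace advertised; if that
 * allocation fails, silently fall back to the minimum size instead of
 * reporting an error.
 */
int alloc_dump_buf(struct dump_buf *buf, size_t user_buf_len, alloc_fn alloc)
{
	assert(buf != NULL && alloc != NULL);

	buf->mem = NULL;
	buf->size = 0;

	if (user_buf_len > MIN_ALLOC) {
		buf->mem = alloc(user_buf_len);
		if (buf->mem)
			buf->size = user_buf_len;
	}
	if (!buf->mem) {
		buf->mem = alloc(MIN_ALLOC);
		if (buf->mem)
			buf->size = MIN_ALLOC;
	}
	return buf->mem ? 0 : -1;
}
```

With malloc as the allocator a 32768-byte request is honoured; with
refuse_large the caller ends up with MIN_ALLOC bytes and no error, which
is the case a "large message" flag alone would not handle.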

With genetlink, we currently don't even have a way of controlling the
minimum allocation that's always required.

Since we already have basically all of the mechanics, I'd say perhaps a
better concept would be to "split when necessary", aborting if split
isn't supported.

IOW, do something like

... nl80211_send_wiphy(...)
{
[...]

switch (state->split_start) {
[...]
case <N>:
	[...] // put stuff
	state->split_start++;
	state->skb_end = nlmsg_get_pos(msg);
	/* fall through */
case <N+1>:
[...]
}

finish:
	genlmsg_end(msg, hdr);
	return 0;
nla_put_failure:
	if (state->split_start < 9) {
		genlmsg_cancel(msg, hdr);
		return -EMSGSIZE;
	}
	nlmsg_trim(msg, state->skb_end);
	goto finish;
}

That way, we fill each SKB as much as possible, up to 32k if userspace
provided big enough buffers *and* we could allocate the SKB.
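
To make the "split when necessary" idea concrete, here is a minimal
userspace model of it. It is a sketch, not nl80211 code: struct msg_buf,
struct dump_state, put_section(), send_sections(), count_messages() and
the section sizes are all hypothetical, and a byte array stands in for
the SKB. One deliberate simplification: instead of the fixed
split_start threshold in the pseudocode above, the model fails with
-EMSGSIZE only when not even one complete section fits in a message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define NUM_SECTIONS 6
#define BUF_MAX 4096

/* Hypothetical payload size of each section (one "case" in the switch). */
static const size_t section_size[NUM_SECTIONS] = { 300, 500, 200, 700, 400, 100 };

struct msg_buf {
	unsigned char data[BUF_MAX];
	size_t cap;	/* usable capacity, stands in for the allocated SKB size */
	size_t len;	/* bytes written so far */
};

struct dump_state {
	int split_start;	/* next section to emit */
	size_t msg_end;		/* message length after the last complete section */
};

/* Append one section; on overflow leave partial garbage behind, as a
 * failed run of nla_put() calls would. */
static int put_section(struct msg_buf *msg, int idx)
{
	size_t n = section_size[idx];

	assert(msg->len <= msg->cap && msg->cap <= BUF_MAX);
	if (n > msg->cap - msg->len) {
		memset(msg->data + msg->len, 0xff, msg->cap - msg->len);
		msg->len = msg->cap;
		return -EMSGSIZE;
	}
	memset(msg->data + msg->len, idx + 1, n);
	msg->len += n;
	return 0;
}

/* Fill one message with as many complete sections as fit. */
int send_sections(struct msg_buf *msg, struct dump_state *state)
{
	size_t start = msg->len;

	state->msg_end = start;
	while (state->split_start < NUM_SECTIONS) {
		if (put_section(msg, state->split_start) < 0)
			goto put_failure;
		state->split_start++;
		state->msg_end = msg->len;	/* checkpoint */
	}
	return 0;

put_failure:
	msg->len = state->msg_end;		/* trim back to the checkpoint */
	if (state->msg_end == start)
		return -EMSGSIZE;		/* no progress: cannot split further */
	return 0;
}

/* Drive a whole dump; returns the number of messages or a negative errno. */
int count_messages(size_t cap)
{
	struct dump_state state = { 0, 0 };
	struct msg_buf msg;
	int messages = 0;

	if (cap > BUF_MAX)
		return -EINVAL;
	while (state.split_start < NUM_SECTIONS) {
		int err;

		msg.cap = cap;
		msg.len = 0;
		err = send_sections(&msg, &state);
		if (err < 0)
			return err;
		messages++;
	}
	return messages;
}
```

With these made-up sizes, count_messages(4096) packs everything into one
message, count_messages(1024) needs three, and count_messages(512)
returns -EMSGSIZE because the 700-byte section can never fit.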

Your userspace would still set the split flag, and thus be compatible
with all kinds of options:
  * really old kernel not supporting split
  * older kernel sending many messages
  * kernel after this change packing more into one message
  * even if allocating big SKBs failed

What I was thinking was to attempt to build a large message first and, if
that fails, fall back to the old logic.  But I think what you propose
is even better.  I'll incorporate this feedback into the next version.

Regards,
-Denis