Hello,

I want to return to the discussion about the netlink interface and how
to use it outside the network subsystem.

I'm developing a new interface for getting information about processes
(task_diag). task_diag is like sock_diag, but for processes. [0]

In the first two versions [1] [2], I used the netlink interface to
communicate with the kernel. In the discussion that followed [4], it was
argued that the netlink interface is not suitable for this task and has
a few known security issues, so it probably should not be used for
task_diag.

Then, in the third version [3], I used a proc transaction file instead
of the netlink interface. That was not accepted either, because we
already have the netlink interface [5] and it is a bad idea to add one
more similar, less generic interface.

Then Andy Lutomirski suggested reworking netlink [6], but nobody
responded to his suggestion.

Can we continue this discussion and find a final solution? Maybe we
need to schedule a face-to-face meeting at one of the conferences?
Linux Plumbers, for example.

Here is Andy's idea of how the netlink interface could be reworked:

On Wed, May 04, 2016 at 08:39:51PM -0700, Andy Lutomirski wrote:
> Netlink had, and possibly still has, tons of serious security bugs
> involving code checking send() callers' creds. I found and fixed a
> few a couple years ago. To reiterate once again, send() CANNOT use
> caller creds safely. (I feel like I say this once every few weeks.
> It's getting old.)
>
> I realize that it's convenient to use a socket as a context to keep
> state between syscalls, but it has some annoying side effects:
>
> - It makes people want to rely on send()'s caller's creds.
>
> - It's miserable in combination with seccomp.
>
> - It doesn't play nicely with namespaces.
>
> - It makes me wonder why things like task_diag, which have nothing to
> do with networking, seem to get tangled up with networking.
>
> Would it be worth considering adding a parallel interface, using it
> for new things, and slowly migrating old use cases over?
>
> int issue_kernel_command(int ns, int command, const struct iovec *iov,
> int iovcnt, int flags);
>
> ns is an actual namespace fd or:
>
> KERNEL_COMMAND_CURRENT_NETNS
> KERNEL_COMMAND_CURRENT_PIDNS
> etc, or a special one:
> KERNEL_COMMAND_GLOBAL. KERNEL_COMMAND_GLOBAL can't be used in a
> non-root namespace.
>
> KERNEL_COMMAND_GLOBAL works even for namespaced things, if the
> relevant current ns is the init namespace. (This feature is optional,
> but it would allow gradually namespacing global things.)
> command is an enumerated command. Each command implies a namespace
> type, and, if you feed this thing the wrong namespace type, you get
> EINVAL. The high bit of command indicates whether it's read-only
> command.
>
> iov gives a command in the format expected, which, for the most part,
> would be a netlink message.
>
> The return value is an fd that you can call read/readv on to read the
> response. It's not a socket (or at least you can't do normal socket
> operations on it if it is a socket behind the scenes). The
> implementation of read() promises *not* to look at caller creds. The
> returned fd is unconditionally cloexec -- it's 2016 already. Sheesh.
>
> When you've read all the data, all you can do is close the fd. You
> can't issue another command on the same fd. You also can't call
> write() or send() on the fd unless someone has a good reason why you
> should be able to and why it's safe. You can't issue another command
> on the same fd.
>
> I imagine that the implementation could re-use a bunch of netlink code
> under the hood.
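
To make the semantics quoted above easier to discuss, here is a rough
userspace mock of the proposed call. It is only a sketch, not a kernel
implementation: the numeric values of the KERNEL_COMMAND_* constants and
the KCMD_* command names are my assumptions (the proposal names the
constants but assigns no values and lists no commands), real namespace
fds and KERNEL_COMMAND_GLOBAL are not handled, and a pipe stands in for
the response fd.

```c
/* Userspace mock only: issue_kernel_command() is a proposal, not a real syscall. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumed values; the proposal names these constants but assigns no numbers. */
#define KERNEL_COMMAND_CURRENT_NETNS (-2)
#define KERNEL_COMMAND_CURRENT_PIDNS (-3)

/* Hypothetical commands; the high bit marks a command as read-only. */
#define KCMD_READONLY  0x80000000u
#define KCMD_TASK_DIAG ((int)(KCMD_READONLY | 1u)) /* implies a pid namespace */
#define KCMD_SOCK_DIAG ((int)(KCMD_READONLY | 2u)) /* implies a net namespace */

int kcmd_is_readonly(int command)
{
	return ((unsigned int)command & KCMD_READONLY) != 0;
}

/* Namespace type implied by a command, or 0 if the command is unknown. */
int kcmd_ns_type(int command)
{
	if (command == KCMD_TASK_DIAG)
		return KERNEL_COMMAND_CURRENT_PIDNS;
	if (command == KCMD_SOCK_DIAG)
		return KERNEL_COMMAND_CURRENT_NETNS;
	return 0;
}

/* Returns a read-only, close-on-exec fd holding the reply, or -1 with errno set. */
int issue_kernel_command(int ns, int command, const struct iovec *iov,
			 int iovcnt, int flags)
{
	char reply[64];
	size_t len = 0;
	int fds[2], n, i;
	int want = kcmd_ns_type(command);

	/* Unknown command, wrong namespace type, or bad arguments: EINVAL. */
	if (want == 0 || ns != want || flags != 0 || iovcnt < 0 ||
	    (iovcnt > 0 && iov == NULL)) {
		errno = EINVAL;
		return -1;
	}
	for (i = 0; i < iovcnt; i++)
		len += iov[i].iov_len;

	if (pipe(fds) < 0)
		return -1;
	/* The proposal makes the returned fd unconditionally close-on-exec. */
	if (fcntl(fds[0], F_SETFD, FD_CLOEXEC) < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	/* A fake reply; a real implementation would emit netlink-style messages. */
	n = snprintf(reply, sizeof(reply), "%s command, %zu request bytes\n",
		     kcmd_is_readonly(command) ? "read-only" : "read-write", len);
	assert(n > 0 && (size_t)n < sizeof(reply));
	if (write(fds[1], reply, (size_t)n) != n) {
		close(fds[0]);
		close(fds[1]);
		errno = EIO;
		return -1;
	}
	close(fds[1]); /* Only the read end survives: read the reply, then close. */
	return fds[0];
}
```

A caller would pass, for example, KERNEL_COMMAND_CURRENT_PIDNS together
with KCMD_TASK_DIAG, read() the returned fd until EOF and then close it.
Passing KERNEL_COMMAND_CURRENT_NETNS with the same command fails with
EINVAL, and write() on the returned fd fails because it is the read end
of a pipe.
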

[0] https://criu.org/Task-diag
[1] https://lwn.net/Articles/633622/
[2] https://lkml.org/lkml/2015/7/6/142
[3] https://lwn.net/Articles/683371/
[4] https://lkml.org/lkml/2015/7/6/708
[5] https://lkml.org/lkml/2016/5/4/785
[6] https://www.mail-archive.com/netdev@xxxxxxxxxxxxxxx/msg109212.html

Thanks,
Andrei