RFC: [Restatement] KBUS messaging subsystem

Tony Ibbs <tibs@xxxxxxxxxxxxxx> · Thu, 28 Jul 2011 22:48:15 +0100

The company I work for does a fair amount of stuff on embedded
systems, and we often need a simple way to send interprocess
messages. DBUS is too large and complex for this, and the ad-hoc
solutions we've seen people develop normally have significant
problems. So we wrote our own. Since it's a kernel module, it seemed
appropriate to submit it for possible inclusion in Linux.

The original submission to the LKML is at
http://marc.info/?l=linux-kernel&m=130047040716848&w=2

Jonathan Corbet (22 Mar 2011, at 19:36) pointed out, though, that

> The interface is ... creative.

and on 15 Apr 2011, at 23:46, in response to a patch I wrote for a
particular issue, he wrote:

> I honestly think that the item at the top of your list, if you want
> to merge this code, must be to get the user-space API more widely
> reviewed and accepted.  It could, I think, be a big sticking point,
> and it's something you want to try to address sooner rather than later.
>
> That means getting more people to look at the patch, which could be hard.
> The problem is that, if you wait, they'll only squeal when the code is
> close to going in, and you could find yourself set back a long way.  A
> good first step might be to CC Andrew Morton on your next posting.

Other matters have caused this response to be a bit delayed, but he's
clearly right that this should be addressed. And whilst we ourselves
obviously don't mind the API, it's the functionality that we *really*
care about. So there are two questions:

- is KBUS something that should be in the kernel, and if it is,
- should it have a different API?

What we want of the system
==========================
We want KBUS to be very quick to learn, and very simple to use, not
least because the target users are often already experts in several
domains, and they don't have the time or energy to spend on
learning a complex system just to send messages.

We provide user-space libraries in various languages (including C and
C++), but we also want KBUS to be simple to use "bare" - target
systems may be small, and may not have much user-space. This also
means that KBUS itself needs to be written in C.

Technically, the most important requirements are:

* We require deterministic message ordering. If A, B and C all send
  messages, anyone receiving those messages must see them in the same
  order, which must be the order they were sent in.

* Messages are identified by name. Names are formed of dot-separated
  words.
* Listeners choose which messages they wish to receive, on the basis
  of the message name. Some simple filtering is available by
  wildcarding on the last word or words of the name.
* All messages are implicitly broadcast (to anyone who wishes to
  receive them).
* Request/reply is handled by a single listener binding as a replier
  to messages with a particular name. When a request message with that
  name is sent, the replier is responsible for sending a reply message.
  (The request and reply are both still broadcast to any "normal"
  listeners.)
* When a request message is sent, but a reply from the original
  replier is not possible - e.g., if there is no one bound as replier,
  or the original replier has unbound, or even crashed - then the
  system will send a synthetic reply to indicate this. In informal
  terms, the system guarantees a request will get a reply.
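
As a rough illustration of the name-based filtering described above,
here is a minimal user-space sketch in C. The wildcard syntax is an
assumption made for this example only (a trailing "*" for "the
remaining words", a trailing "%" for "exactly one more word"), and
name_matches is a hypothetical helper, not the matching code in the
KBUS module.

```c
#include <string.h>

/*
 * Hypothetical helper: does message name `name` match binding `pattern`?
 * Assumed wildcard syntax (for illustration only):
 *   "a.b.*"  matches "a.b." followed by one or more words
 *   "a.b.%"  matches "a.b." followed by exactly one word
 * Any other pattern must match the name exactly.
 */
int name_matches(const char *pattern, const char *name)
{
    size_t plen = strlen(pattern);

    if (plen >= 2 && pattern[plen - 2] == '.' &&
        (pattern[plen - 1] == '*' || pattern[plen - 1] == '%')) {
        size_t prefix = plen - 1;      /* length of "a.b." including the dot */
        const char *rest;

        if (strncmp(pattern, name, prefix) != 0)
            return 0;
        rest = name + prefix;
        if (*rest == '\0')
            return 0;                  /* need at least one more word */
        if (pattern[plen - 1] == '%')
            return strchr(rest, '.') == NULL;   /* exactly one word */
        return 1;                      /* '*': any remaining words */
    }
    return strcmp(pattern, name) == 0;
}
```

With these assumptions, a listener bound to "sensor.*" would see both
"sensor.temp" and "sensor.temp.cpu", while one bound to "sensor.%"
would see only the former.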

The system deliberately avoids a client/server model.

* Any client can be a sender and a listener on the same connection.
* KBUS makes no restriction on who sends messages with what names,
  and it does not restrict who may send request messages.

Whilst requests are guaranteed to be sent to the appropriate recipient
(if they're still listening), "general" listening is by default lossy.
That is, if a listener's incoming message queue is full, then new
messages will by default be dropped (but not if they're a request to
that listener, of course). However, the sender of a message can
choose that all recipients must be able to receive a particular
message (sending with either all-or-fail or all-or-wait).
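
The difference between the default lossy delivery and all-or-fail can
be sketched as a small user-space model. This is only a model of the
behaviour described above, not the KBUS implementation: the names
(struct listener, deliver, SEND_LOSSY, SEND_ALL_OR_FAIL) are invented
for the example, and all-or-wait is left out because it needs the
sender to block.

```c
#include <stddef.h>

/* Hypothetical delivery modes for this sketch. */
enum send_mode { SEND_LOSSY, SEND_ALL_OR_FAIL };

struct listener {
    int queued;     /* messages currently waiting to be read */
    int capacity;   /* maximum queue length */
};

/*
 * Try to deliver one message to every listener.
 * Returns the number of listeners that received it, or -1 if
 * SEND_ALL_OR_FAIL was requested and at least one queue was full
 * (in which case nobody receives the message).
 */
int deliver(struct listener *ls, size_t n, enum send_mode mode)
{
    size_t i;
    int delivered = 0;

    if (mode == SEND_ALL_OR_FAIL) {
        /* Check every queue first so the send is all or nothing. */
        for (i = 0; i < n; i++)
            if (ls[i].queued >= ls[i].capacity)
                return -1;
    }
    for (i = 0; i < n; i++) {
        if (ls[i].queued >= ls[i].capacity)
            continue;               /* lossy: drop for this listener only */
        ls[i].queued++;
        delivered++;
    }
    return delivered;
}
```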

* KBUS does not say anything about the data (if any) in a message.
  Different users have different requirements, so this gets
  complicated very quickly, and there are many excellent independent
  solutions for this.
* We do want support for multiple buses, which must be walled off from
  each other. Naming them by anything complicated isn't needed.

So why did we write it as a kernel module?
==========================================
For us as implementors, a kernel module makes a lot of sense, not
least because:

* It gives us a lot of things for free, including list handling,
  reference counting, thread safety and (on larger systems)
  multi-processor support, which we would otherwise have to write and
  debug ourselves. This also keeps our codebase smaller.
* It helps give us reliability, partly because of the code we're
  relying on, partly because of the strictures of working in the
  kernel, partly by shielding us from userspace.
* It reduces message copying (we have userspace to kernel back to
  userspace, as opposed to a userspace daemon communicating with
  clients via sockets).
* It makes it simple for us to tell when a message recipient has "gone
  away", as the kernel will call our "release" callback for us.
* It allows us to provide the functionality on systems without
  requiring anything much beyond /dev and maybe /proc in userspace.

Since our potential users are generally all building Linux anyway,
adding an extra kernel module is not an issue.

The API
=======
KBUS as written uses the file API. KBUS bus <n> is represented as
device file /dev/kbus<n>. Bus 0 is always present.

So a client connects to KBUS by calling "open" on the appropriate
device file, and disconnects with "close". Messages are sent by
writing the appropriate data to the device using "write", and using a
SEND ioctl to indicate that the message is to be sent [1]. Messages
are read by calling an ioctl to find the length of the next message
(0 if there is none), and then reading it using "read". Polling may be
used to wait for a message read/write in the normal manner.

[1] It has already been suggested that the SEND is not needed given
    that the end of a message is deterministic, but it mirrors the
    "is there a next message" ioctl, and felt tidier to us than
    producing a "can't send" error when the write(s) for the message
    data are finished.
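
To make the sequence of calls concrete, here is a minimal sketch in C
of "write, SEND, ask for next length, read". The two ioctl request
codes are PLACEHOLDERS invented for this example (the real ones come
from the KBUS header and will differ), send_then_read is a
hypothetical helper, and the sketch assumes msg is already laid out in
whatever format KBUS expects. Binding as a listener is omitted.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* PLACEHOLDER request codes, invented for this sketch; not the real ones. */
#define EX_IOC_SEND     _IO('x', 1)             /* message complete, send it */
#define EX_IOC_NEXTMSG  _IOR('x', 2, uint32_t)  /* next length, 0 if none */

/*
 * Send `msg` on the bus device `dev`, then read the next waiting message
 * into `buf`. Returns the number of bytes read, 0 if no message was
 * waiting, or -1 on any error (including a too-small buffer).
 */
ssize_t send_then_read(const char *dev, const void *msg, size_t msg_len,
                       void *buf, size_t buf_len)
{
    uint32_t next_len = 0;
    ssize_t got = -1;
    int fd = open(dev, O_RDWR);     /* "connect" to the bus */

    if (fd < 0)
        return -1;

    /* Write the message bytes, then tell the device it is complete. */
    if (write(fd, msg, msg_len) != (ssize_t)msg_len)
        goto out;
    if (ioctl(fd, EX_IOC_SEND) < 0)
        goto out;

    /* Ask for the length of the next message; 0 means none waiting. */
    if (ioctl(fd, EX_IOC_NEXTMSG, &next_len) < 0)
        goto out;
    if (next_len == 0) {
        got = 0;
        goto out;
    }
    if (next_len > buf_len)
        goto out;                   /* caller's buffer is too small */
    got = read(fd, buf, next_len);  /* a short read is not retried here */
out:
    close(fd);                      /* "disconnect" from the bus */
    return got;
}
```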

I believe that this use of the file API is what Jonathan Corbet calls
"creative" (we can't disagree with him).

We chose this interface for several reasons:

* All C programmers are likely to have used open/close/read/write at
  some time, and even if they've not used ioctls, our target audience
  will have called functions with arbitrary data structures.
* We didn't want to (try to) add a new specific API to the kernel
  (I assume that is a luxury whose day has long gone).
* The socket model didn't seem like a good fit to us, not least
  because we did not want a client/server model.
* There is a lot of information on how to write a file-based kernel
  module, including many examples.
* It is obvious that the "release" callback for a "file" will be
  called when the file is closed or the opening process crashes.

I must admit that, this being my first kernel module, the "lots of
documentation and examples" influenced me.

Several people on-list have wondered why we didn't use some socket
interface. This seems partly to be based on the assumption that
"everyone knows how to do sockets" (I may be being unfair, here),
which is not in fact true - many C programmers have not worked with
sockets.

I'm not aware of any good explanations of how to do a socket-based
interface to a kernel module, and particularly not of documentation of
how one *should* do it. Mea culpa if I've just looked in the wrong
places.

The interactions we'd *like* our API to have go more-or-less thus:

* Send: either manufacture an entire message, pass it to KBUS,
  and then send it, or pass parts of a message to KBUS, and when all
  of the parts have been given, send it. This latter allows (for
  instance) a common header to be written repeatedly, followed by
  varying data.

* Read: Ask for (the size of) the next message. A size of 0 can
  conveniently be taken as meaning there is no next message. Then, as
  a separate call, read that many bytes.

* Discard a message that is waiting to be read. Current KBUS does this
  by asking for (the size of) the next message again, without an
  intermediate read.

* Poll for the next message to be read. Obviously useful.

* Poll for being allowed to send a new message. This is useful if a
  send was rejected, perhaps because it was a request message, and
  the replier does not have room in their queue for it.
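
The two polling cases in the list above map onto the standard poll(2)
call. The following is a minimal sketch using only POSIX poll; the
helper name wait_for is invented for the example, and it assumes the
device reports "message waiting" as POLLIN and "may send" as POLLOUT,
which is the usual convention rather than something stated here.

```c
#include <poll.h>

/*
 * Wait up to timeout_ms milliseconds for `events` (POLLIN to read the
 * next message, POLLOUT to be allowed to send) on fd.
 * Returns 1 if a requested event is ready, 0 on timeout, -1 on error.
 */
int wait_for(int fd, short events, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = events };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0)
        return -1;                  /* poll itself failed */
    if (rc == 0)
        return 0;                   /* timed out */
    return (pfd.revents & events) ? 1 : -1;   /* e.g. POLLERR or POLLNVAL */
}
```

A reader would call wait_for(fd, POLLIN, timeout) before asking for
the next message length, and a sender whose request was rejected would
call wait_for(fd, POLLOUT, timeout) before retrying.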

As I said earlier, I'm not sure how one would write a kernel module
with a socket-style API that does what KBUS does, although clearly one
could imagine such a thing. I must admit that our feeling is that it
would be "stretching" the normal assumptions of how sockets work in
much the same way that the current API is doing for the file-like
interface.

What already exists
===================
There is a general KBUS homepage at http://kbus-messaging.org/.

The version of KBUS on Google Code, at http://code.google.com/p/kbus/,
is used by several clients of ours, and includes a "higher level" C
library, plus bindings for Python, C++ and Java.

There is a working repository with these patches applied to
Linux 2.6.37, available via:

git pull git://github.com/crazyscot/linux-2.6-kbus.git kbus-2.6.37

The original patches were applied in branch apply-patchset-20110318
-- 
Tibs