Re: Edited draft of bpf(2) man page for review/enhancement

Daniel Borkmann <daniel@xxxxxxxxxxxxx> · Wed, 22 Jul 2015 17:12:32 +0200

On 07/22/2015 04:49 PM, Michael Kerrisk (man-pages) wrote:
Hi Daniel,

Sorry for the long delay in following up....

No worries, eBPF is quite some material. ;)

On 05/27/2015 11:26 AM, Daniel Borkmann wrote:
On 05/27/2015 10:43 AM, Michael Kerrisk (man-pages) wrote:
Hello Alexei,

I took the draft 3 of the bpf(2) man page that you sent back in March
and did some substantial editing to clarify the language and add a
few technical details. Could you please check the revised version
below, to ensure I did not inject any errors.

I also added a number of FIXMEs for pieces of the page that need
further work. Could you take a look at these and let me know your
thoughts, please.

That's great, thanks! Minor comments:

...
.TH BPF 2 2015-03-10 "Linux" "Linux Programmer's Manual"
.SH NAME
bpf - perform a command on an extended BPF map or program
.SH SYNOPSIS
.nf
.B #include <linux/bpf.h>
.sp
.BI "int bpf(int cmd, union bpf_attr *attr, unsigned int size);"

.SH DESCRIPTION
The
.BR bpf ()
system call performs a range of operations related to extended
Berkeley Packet Filters.
Extended BPF (or eBPF) is similar to
the original BPF (or classic BPF) used to filter network packets.
For both BPF and eBPF programs,
the kernel statically analyzes the programs before loading them,
in order to ensure that they cannot harm the running system.
.P
eBPF extends classic BPF in multiple ways including the ability to call
in-kernel helper functions (via the
.B BPF_CALL
opcode extension provided by eBPF)
and access shared data structures such as BPF maps.

I would perhaps emphasize that maps can be shared among in-kernel
eBPF programs, but also between kernel and user space.

This is covered later in the page, under the "BPF maps" subheading.
Maybe you missed that? (Or did you think it doesn't suffice?)

Okay, I presume you mean:

  Maps are a generic data structure for storage of different types
  and sharing data between the kernel and user-space programs.

Maybe, to emphasize both options a bit (not sure if it's better in
my words, though):

  Maps are a generic data structure for storage of different types
  and allow for sharing data among eBPF kernel programs, but also
  between kernel and user-space applications.
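
To make the user-space side of that sharing concrete, here is a minimal
sketch of creating a hash map and then writing and reading one element
through bpf(2). The helper names (sys_bpf, map_create, map_update,
map_lookup) are made up for illustration and are not part of any library;
the sketch assumes the kernel UAPI headers are installed and that the
caller is permitted to use bpf().

```c
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical thin wrapper around the raw system call. */
static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return (int)syscall(__NR_bpf, cmd, attr, size);
}

/* Create a hash map; returns a map fd, or -1 with errno set. */
static int map_create(uint32_t key_size, uint32_t value_size,
                      uint32_t max_entries)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));     /* unused fields must be zero */
    attr.map_type = BPF_MAP_TYPE_HASH;
    attr.key_size = key_size;
    attr.value_size = value_size;
    attr.max_entries = max_entries;
    return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}

/* Insert or overwrite the element stored under key. */
static int map_update(int fd, const void *key, const void *value)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)fd;
    attr.key = (uint64_t)(uintptr_t)key;
    attr.value = (uint64_t)(uintptr_t)value;
    attr.flags = BPF_ANY;               /* create or update */
    return sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}

/* Copy the element stored under key into value. */
static int map_lookup(int fd, const void *key, void *value)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)fd;
    attr.key = (uint64_t)(uintptr_t)key;
    attr.value = (uint64_t)(uintptr_t)value;
    return sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}
```

An eBPF program that references the same map would see the same
elements from the kernel side; the fd returned by map_create() is the
user-space handle for it.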

The programs can be written in a restricted C that is compiled into
.\" FIXME In the next line, what is "a restricted C"? Where does
.\"       one get further information about it?

So far, only from the kernel samples directory and, for the tc
classifier and action, from the tc man page and/or examples/bpf/ in
the tc git tree.

So, given that we are several weeks down the track, and things may have
changed, I'll re-ask the questions ;-) :

* Is this restricted C documented anywhere?

Not (yet) that I'm aware of. In the short to mid term, we were thinking
of polishing the material that resides in the kernel documentation,
that is, Documentation/networking/filter.txt, to get it into better
shape, which I presume would also include documentation on the
restricted C. So far, examples are provided in the tc-bpf man page
(see link below).

The set of available helper functions callable from eBPF resides
under (enum bpf_func_id):

  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/bpf.h

* Is the procedure for compiling this restricted C documented anywhere?
   (Yes, it's LLVM, but are the suitable pipelines/options documented
   somewhere?)

eBPF bytecode and executed on the in-kernel virtual machine or
just-in-time compiled into native code.
.SS Extended BPF Design/Architecture
.P
.\" FIXME In the following line, what does "different data types" mean?
.\"       Are the values in a map not just blobs?

Sort of. Currently, these blobs can have different key and value
sizes (you can even have structs as keys). The map itself treats
them as blobs internally. However, bpf tail calls were added recently,
where you can look up another program from an array map and call
into it. That particular type of map can only have entries of type
eBPF program fd. I think, if needed, a paragraph on tail calls could
be added as a follow-up once an initial man page is included in the
tree.

Okay -- I've added a FIXME placeholder for this, so we can revisit.

Okay.
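
For illustration, here is a rough sketch of how those two cases look in
union bpf_attr: a hash map keyed by a struct, and a program array whose
values are eBPF program fds. struct flow_key and both helper names are
hypothetical. The 4-byte key and value sizes for the program array
reflect my understanding of what the kernel expects for that map type.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical key layout. The kernel only sees key_size opaque bytes,
 * and a hash map compares keys as raw bytes, so any padding inside a
 * struct key should be zeroed by the caller. This struct has none.
 */
struct flow_key {
    uint32_t saddr;
    uint32_t daddr;
    uint16_t sport;
    uint16_t dport;
};

/* Fill attr for a hash map with struct keys and 64-bit values. */
static void hash_map_attr(union bpf_attr *attr, uint32_t max_entries)
{
    memset(attr, 0, sizeof(*attr));
    attr->map_type = BPF_MAP_TYPE_HASH;
    attr->key_size = sizeof(struct flow_key);
    attr->value_size = sizeof(uint64_t);
    attr->max_entries = max_entries;
}

/* Fill attr for a program array, as used by tail calls. */
static void prog_array_attr(union bpf_attr *attr, uint32_t max_entries)
{
    memset(attr, 0, sizeof(*attr));
    attr->map_type = BPF_MAP_TYPE_PROG_ARRAY;
    attr->key_size = sizeof(uint32_t);      /* array index */
    attr->value_size = sizeof(uint32_t);    /* eBPF program fd */
    attr->max_entries = max_entries;
}
```

Either attr would then be passed to bpf(BPF_MAP_CREATE, ...); only the
sizes and the map type differ between the two.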

BPF maps are a generic data structure for storage of different data types.
A user process can create multiple maps (with key/value-pairs being
opaque bytes of data) and access them via file descriptors.
BPF programs can access maps from inside the kernel in parallel.
It's up to the user process and BPF program to decide what they store
inside maps.
.P
BPF programs are similar to kernel modules.
They are loaded by the user
process and automatically unloaded when the process exits.

Generally that's true. Btw, in the 4.1 kernel, tc(8) also got support
for eBPF classifiers and actions, and here it's slightly different: in
tc, we load the programs, maps etc., and push down the eBPF program fd
in order to let the kernel hold a reference on the program itself.

Thus, there, the program fd that the application owns is gone when the
application terminates, but the eBPF program itself still lives on
inside the kernel. But perhaps it's already too much detail to mention
here ...

Well, it should be documented somewhere....

Yep, fwiw, some time ago I hacked together a man page for tc:

https://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/commit/?id=cbdd1e6921d21815e35d2a96526cfbad5ac98e09

Each BPF program is a set of instructions that is safe to run until
its completion.
The in-kernel BPF verifier statically determines that the program
terminates and is safe to execute.
.\" FIXME In the following sentence, what does "takes hold" mean?

Takes a reference, meaning that maps cannot disappear from under us
while the eBPF program that is using them in the kernel is still alive.

So, I changed this to:

[[
During verification, the kernel increments reference counts for each of
the maps that the eBPF program uses,
so that the selected maps cannot be removed until the program is unloaded.
]]

Okay?

Okay.
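
As a small illustration of the load/verify step discussed above, here
is a sketch that hands the kernel a two-instruction program ("r0 = 0;
exit") via BPF_PROG_LOAD. The names trivial_prog and load_trivial_prog
are made up, and the socket filter program type and the "GPL" license
string are just assumptions picked for the example. On success the
verifier has accepted the program and an fd is returned; on failure
the log buffer may explain why.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical minimal program: set r0 to 0, then exit. */
static const struct bpf_insn trivial_prog[] = {
    { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
    { .code = BPF_JMP | BPF_EXIT },
};

/*
 * Ask the kernel to verify and load trivial_prog.
 * log must point to a writable buffer of log_size bytes, since
 * log_level is set to 1 here.
 * Returns a program fd, or -1 with errno set.
 */
static int load_trivial_prog(char *log, uint32_t log_size)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));     /* unused fields must be zero */
    attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
    attr.insn_cnt = sizeof(trivial_prog) / sizeof(trivial_prog[0]);
    attr.insns = (uint64_t)(uintptr_t)trivial_prog;
    attr.license = (uint64_t)(uintptr_t)"GPL";
    attr.log_buf = (uint64_t)(uintptr_t)log;
    attr.log_size = log_size;
    attr.log_level = 1;                 /* request verifier output */
    return (int)syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}
```

This program uses no maps, so no map reference counts are involved; a
program that did reference a map would keep that map alive for as long
as the program itself is loaded.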

[...]
I'll send out a new draft soon, but in the meantime hopefully you
or Alexei might have a chance to answer some open questions (see my
other mail to Alexei, which will be sent soon), so I can further edit
the page before sending it out.

Later on, we should also add a paragraph on eBPF tail calls, but one
step at a time.

Thanks again,
Daniel