Re: Xtables2 Netlink spec

Hi Jan,

I have trimmed the CC to netfilter; I don't think this deserves users' attention, at least not yet.

Some quick impressions on your proposal:

On 24/11/10 23:29, Jan Engelhardt wrote:

By request of Pablo, I am posting the Xtables2 Netlink interface
specification for review. Additionally, further documentation and
toolchain around it is available through the temporary project page at

	http://jengelh.medozas.de/projects/xtables/

which currently includes

  * User Documentation Chapter 1: Architectural Differences

  * Developer Documentation Part 1: Netlink interface (WIP)
    This is copied below to facilitate inline replies

  * Runnable Linux source tree

  * Runnable userspace library (libnetfilter_xtables)
    with small test-and-debug program

--8<--

Netlink interface

1 General use

1.1 Socket

Xtables2 is usable through a Netlink socket of type
NETLINK_XTABLES. No intermediate subsystem like nfnetlink is
used, because the kernel's nfnetlink parser does not make all
attributes available to (in-kernel) nfnetlink users.

#include <sys/socket.h>
#include <linux/netlink.h>
#define NETLINK_XTABLES 21

nfxt_socket = socket(AF_NETLINK, SOCK_RAW, NETLINK_XTABLES);

The NETLINK_XTABLES constant is defined in linux/netlink.h with
the value 21.

This has to go on top of nfnetlink, like the other netfilter subsystems.

1.2 Message format

All messages transmitted over the Netlink socket are to have the
base struct nlmsghdr header, followed by a version tag to allow
for the flexibility of data following it:

struct xtnetlink_genhdr {
         uint32_t version;
};

The version member is always 0 in the current implementation.

Following the genhdr can be any number of standard Netlink
attributes (struct nlattr plus their payload).
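The layout described above (nlmsghdr, then the version tag, then attributes) can be sketched in userspace as follows. This is only an illustration, not the libnetfilter_xtables API: the `sk_` structs are local stand-ins that mirror struct nlmsghdr and struct nlattr from linux/netlink.h so the example builds without kernel headers, and the helper names `sk_begin_msg`, `sk_put_attr` and `sk_end_msg` are hypothetical. The caller is assumed to supply a large enough buffer and the message type, since the NFXTM_* values are not listed in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for struct nlmsghdr / struct nlattr (assumption: same layout). */
struct sk_nlmsghdr {
	uint32_t nlmsg_len;
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

struct sk_nlattr {
	uint16_t nla_len;	/* header plus payload, without padding */
	uint16_t nla_type;
};

struct xtnetlink_genhdr {
	uint32_t version;
};

#define SK_NLA_HDRLEN	sizeof(struct sk_nlattr)
#define SK_ALIGN4(n)	(((size_t)(n) + 3) & ~(size_t)3)

/* Write nlmsghdr + genhdr; return the offset where attributes start. */
static size_t sk_begin_msg(unsigned char *buf, uint16_t type)
{
	struct sk_nlmsghdr nlh = { .nlmsg_type = type };
	struct xtnetlink_genhdr gh = { .version = 0 };

	assert(buf != NULL);
	memcpy(buf, &nlh, sizeof(nlh));
	memcpy(buf + sizeof(nlh), &gh, sizeof(gh));
	return sizeof(nlh) + sizeof(gh);
}

/* Append one attribute at offset off; return the offset after its padding. */
static size_t sk_put_attr(unsigned char *buf, size_t off, uint16_t type,
			  const void *data, uint16_t len)
{
	size_t total = SK_NLA_HDRLEN + len;
	size_t padded = SK_ALIGN4(total);
	struct sk_nlattr nla = { .nla_len = (uint16_t)total, .nla_type = type };

	assert(total <= UINT16_MAX);	/* nla_len is only 16 bits wide */
	memcpy(buf + off, &nla, sizeof(nla));
	if (len > 0)
		memcpy(buf + off + SK_NLA_HDRLEN, data, len);
	memset(buf + off + total, 0, padded - total);
	return off + padded;
}

/* Store the final message length in nlmsg_len (first header field). */
static void sk_end_msg(unsigned char *buf, size_t off)
{
	uint32_t len = (uint32_t)off;

	memcpy(buf, &len, sizeof(len));
}
```

A message carrying a single NFXTA_NAME attribute (type 3 in the table below) with the string "INPUT" would then be 16 + 4 + 12 = 32 bytes long.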

Often, a logical tree structure is used to describe something,
such as for example tables of chains of rules:

filter
  \__ INPUT
  |    \__ some rule
  \__ FORWARD
  |    \__ rule2
  |    \__ rule3
  |    \__ rule4
  \__ OUTPUT
       \__ rule5
       \__ rule6

For this document, child objects are always "nested" within a
parent object, irrespective of the serialized encoding.

There are different ways to encode such a tree structure into a
serialized stream. In many Netlink protocols, child attributes
are encapsulated (a.k.a. "nested", though we will avoid this
term to avoid double-use) and treated as a whole as a parent's
opaque data. We will call this format "Encapsulated Encoding".

To encode an attribute's length, struct nlattr only has a 16-bit
field, which means the attribute header plus payload is limited
to 64 KB. This is easily exceeded with the encapsulated
encoding, since a chain encapsulates all of its rules, for example.
The problem is aggravated by the kernel's Netlink handler only
allocating skbs of one page in size, which in the worst case means
that the usable payload for attributes is around 3600 bytes only.
In light of xt_u32's private data block being 1984 bytes already,
that means that you won't be able to fit two -m u32 invocations
nested in a single rule into a dump.

The Xtables2 Netlink protocol however encodes each node as a
standalone attribute, to be called Flat Encoding, that is
appended (a.k.a. "chained") to the data stream. This makes it
possible to split requests and dumps at a finer level than
encapsulation would. Above all, it gives extensions the guarantee
of data blocks of a minimum size.

Since Netlink messages do have a 32-bit quantity to store the
message length, rulesets of roughly up to 4 GB are possible,
which is currently regarded as sufficient. The largest (and
meaningful) rulesets seen to date in the industry weighed in at
approximately 150 MB.
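The size argument for Flat Encoding made above can be checked with a little arithmetic. The sketch below uses only the figures quoted in the text (a 4-byte struct nlattr header, roughly 3600 usable bytes per skb, 1984 bytes of xt_u32 private data); the function names are hypothetical, and the encapsulated size is a lower bound because NFXTA_NAME and other per-match attributes are ignored.

```c
#include <assert.h>
#include <stddef.h>

enum {
	NLA_HDR     = 4,	/* sizeof(struct nlattr) */
	NLA_MAX     = 65535,	/* largest value a 16-bit nla_len can hold */
	SKB_BUDGET  = 3600,	/* approximate worst-case usable payload */
	XT_U32_DATA = 1984,	/* xt_u32 private data block size */
};

/* Size of one attribute: header plus payload, padded to 4 bytes. */
static size_t attr_size(size_t payload)
{
	assert(payload <= NLA_MAX - NLA_HDR);	/* must fit in nla_len */
	return (NLA_HDR + payload + 3) & ~(size_t)3;
}

/*
 * Encapsulated Encoding: one rule attribute whose payload contains
 * n_matches data attributes of xt_u32 size each (lower bound).
 */
static size_t encapsulated_rule_size(unsigned int n_matches)
{
	return attr_size(n_matches * attr_size(XT_U32_DATA));
}

/* Flat Encoding: the largest single attribute is one data block. */
static size_t flat_largest_attr(void)
{
	return attr_size(XT_U32_DATA);
}

/* Nonzero if an attribute of this size fits the assumed skb budget. */
static int fits_in_skb(size_t size)
{
	return size <= (size_t)SKB_BUDGET;
}
```

With these numbers, one encapsulated u32 match needs 1992 bytes and two need 3980 bytes, which exceeds the 3600-byte budget, whereas the largest flat attribute stays at 1988 bytes regardless of how many matches the rule has.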

You can split data into several messages and avoid this limitation.

Whereas attribute nesting automatically provided for boundaries,
this is realized using a dummy attribute in the chained approach.
Certain attributes can start such a flattened nesting, and
NFXTA_STOP terminates it.
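How a reader of the flat stream recovers the nesting can be sketched as a simple depth counter. The function below is an illustration under assumptions, not the kernel parser: its name is hypothetical, attributes are read in host byte order, and because the attribute table only gives numeric values for some of the level-opening attributes, the set of openers is passed in by the caller instead of being hard-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { NFXTA_STOP = 1, NFXTA_CHAIN = 4 };	/* values from the attribute table */

/* Return nonzero if type is one of the level-opening attributes. */
static int is_opener(uint16_t type, const uint16_t *openers, size_t n)
{
	for (size_t i = 0; i < n; ++i)
		if (openers[i] == type)
			return 1;
	return 0;
}

/*
 * Walk a flat attribute stream. Returns the deepest nesting level
 * reached, or -1 if the stream is malformed or the levels opened
 * are not all closed by NFXTA_STOP.
 */
static int flat_max_depth(const unsigned char *buf, size_t len,
			  const uint16_t *openers, size_t n_openers)
{
	int depth = 0, max = 0;
	size_t off = 0;

	assert(buf != NULL || len == 0);
	while (off < len) {
		uint16_t nla_len, nla_type;
		size_t step;

		if (len - off < 4)
			return -1;		/* truncated header */
		memcpy(&nla_len, buf + off, sizeof(nla_len));
		memcpy(&nla_type, buf + off + 2, sizeof(nla_type));
		if (nla_len < 4 || nla_len > len - off)
			return -1;		/* bad length */

		if (nla_type == NFXTA_STOP) {
			if (--depth < 0)
				return -1;	/* STOP without opener */
		} else if (is_opener(nla_type, openers, n_openers)) {
			if (++depth > max)
				max = depth;
		}

		/* Advance past the payload and its padding. */
		step = ((size_t)nla_len + 3) & ~(size_t)3;
		if (step > len - off)
			step = len - off;
		off += step;
	}
	return depth == 0 ? max : -1;
}
```

Note that this kind of parser depends on the order in which attributes appear in the message.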

I don't like this trailing attribute, see below.

2 Attributes

The meaning of attributes depends upon the nesting level in which
they appear. Their type however remains the same, such that a
single Netlink attribute validation policy object (struct
nla_policy) is sufficient.

A table of all known attributes:

+--------+-----------------+---------------+----------------+
| Value  | Mnemonic        |    C type     | NLA type       |
+--------+-----------------+---------------+----------------+
+--------+-----------------+---------------+----------------+
|   1    | NFXTA_STOP      |               | NLA_FLAG       |
+--------+-----------------+---------------+----------------+
|   2    | NFXTA_ERRNO     |     int       | NLA_U32        |
+--------+-----------------+---------------+----------------+
|   3    | NFXTA_NAME      |   char []     | NLA_NUL_STRING |
+--------+-----------------+---------------+----------------+
|   4    | NFXTA_CHAIN     |               | NLA_FLAG       |
+--------+-----------------+---------------+----------------+
|   5    | NFXTA_HOOKNUM   | unsigned int  | NLA_U32        |
+--------+-----------------+---------------+----------------+
|   6    | NFXTA_PRIORITY  |     int       | NLA_U32        |
+--------+-----------------+---------------+----------------+
|   7    | NFXTA_NFPROTO   |   uint8_t     | NLA_U32        |
+--------+-----------------+---------------+----------------+
|        | NFXTA_RULE      |               | NLA_FLAG       |
+--------+-----------------+---------------+----------------+
|        | NFXTA_OFFSET    | unsigned int  | NLA_U32        |
+--------+-----------------+---------------+----------------+
|        | NFXTA_LENGTH    |    size_t     | NLA_U32        |
+--------+-----------------+---------------+----------------+
|        | NFXTA_VERDICT   | unsigned int  | NLA_U32        |
+--------+-----------------+---------------+----------------+
|        | NFXTA_MATCH     |               | NLA_FLAG       |
+--------+-----------------+---------------+----------------+
|        | NFXTA_DATA      |               | NLA_BINARY     |
+--------+-----------------+---------------+----------------+
|        | NFXTA_TARGET    |               | NLA_FLAG       |
+--------+-----------------+---------------+----------------+
|        | NFXTA_JUMP      |   char []     | NLA_NUL_STRING |
+--------+-----------------+---------------+----------------+
|        | NFXTA_GOTO      |   char []     | NLA_NUL_STRING |
+--------+-----------------+---------------+----------------+
|        | NFXTA_REVISION  |   uint8_t     | NLA_U32        |
+--------+-----------------+---------------+----------------+
|        | NFXTA_SIZE      |    size_t     | NLA_U32        |
+--------+-----------------+---------------+----------------+
|        | NFXTA_HOOKMASK  | unsigned int  | NLA_U32        |
+--------+-----------------+---------------+----------------+


The kernel ignores attributes with value 0 during validation, so
it was left unused.

2.1 Nest level terminator<sub:nfxta_stop>

This attribute serves to denote the end of a nesting level as
introduced by NFXTA_CHAIN, NFXTA_RULE, NFXTA_MATCH or
NFXTA_TARGET. It has no data portion.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_STOP         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

It's not a good idea to make assumptions on the order of the TLVs in a Netlink message. I mean, you should not assume that NFXTA_STOP comes after one specific attribute.

2.2 Dump error code<sub:nfxta_errno>

Once a NLM_F_MULTI dump operation has been started, for example
with the NFXTM_CHAIN_DUMP request, Netlink kernel users must
always end it successfully with NLMSG_DONE. To convey an error
during the dump, Xtables2 will emit an NFXTA_ERRNO attribute into
the stream (if it can), emit no further attributes for the
request, and cause the dump to stop.

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 8                   | nla_type = NFXTA_ERRNO        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| int errno;                                                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Isn't nlmsg_err OK for your needs?

2.3 Match extension<sub:nfxta_match>

Invocation of a match is represented using the NFXTA_MATCH
attribute which starts a nest level. A match attribute must
contain two attributes:

* NFXTA_NAME: the name of the match extension

* NFXTA_DATA: data private to this instance of the extension

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_MATCH        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4 + payload         | nla_type = NFXTA_NAME         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
. name of the extension, e.g. "hashlimit"                       .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4 + payload         | nla_type = NFXTA_DATA         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
. e.g. struct xt_hashlimit_info                                 .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_STOP         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This is fine during some transition period, but Netlink protocols must not encapsulate structures in the payload of their TLVs.

2.4 Target extension<sub:nfxta_target>

Invocation of a target is represented using the NFXTA_TARGET
attribute which starts a nest level. A target attribute must
contain two attributes:

* NFXTA_NAME: the name of the target extension

* NFXTA_DATA: data private to this instance of the extension

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_TARGET       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4 + payload         | nla_type = NFXTA_NAME         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
. name of the extension, e.g. "TCPMSS"                          .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4 + payload         | nla_type = NFXTA_DATA         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
. e.g. struct xt_tcpmss_info                                    .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_STOP         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Same comment as above.

2.5 Rule<sub:nfxta_rule>

A rule is started using the NFXTA_RULE attribute, which starts a
nest level, and is ended with an NFXTA_STOP attribute. Rules can
contain:

* Zero or more match extensions (NFXTA_MATCH..NFXTA_STOP).

* Zero or more target extensions (NFXTA_TARGET..NFXTA_STOP).

* Zero or one NFXTA_VERDICT attribute that specifies the rule's
  verdict as data, which can either be NF_ACCEPT or NF_DROP.
  (Non-normative notes: The supplied verdict is executed if no
  target has reached a verdict on its own. Omission of the
  verdict attribute counts as XT_CONTINUE.)

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_RULE         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
. matches, targets, verdict                                     .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_STOP         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

2.6 Chain<sub:nfxta_chain>

A chain is started using the NFXTA_CHAIN attribute, which starts
a nest level, and is ended with an NFXTA_STOP attribute. Chains
can contain:

* Zero or one of this group of three (= specify all three, or
  none at all), specifying that this chain is a base chain
  hooking in at some point:

   - One NFXTA_HOOKNUM attribute for giving a hook number. This is
     (unfortunately) dependent on the chosen nfproto, so it is
     either NF_INET_*, NF_BR_* or NF_ARP_*.

   - One NFXTA_PRIORITY attribute.

   - One NFXTA_NFPROTO attribute that is NFPROTO_*.

* Zero or more rules (NFXTA_RULE..NFXTA_STOP).

Example of a fully populated chain:

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_CHAIN        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 8                   | nla_type = NFXTA_HOOKNUM      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| hook number (0..7)                                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 8                   | nla_type = NFXTA_PRIORITY     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| priority (-2147483648..2147483647)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 8                   | nla_type = NFXTA_NFPROTO      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nfproto value (2=ipv4, 3=arp, 7=bridge, 10=ipv6, 12=decnet)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
. rules                                                         .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_STOP         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
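The chain diagram above can be reproduced byte for byte with a few lines of C. This is a sketch under assumptions: the helper names are hypothetical, values are written in host byte order, and the chain is left without rules. The attribute numbers are the ones given in the attribute table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
	NFXTA_STOP     = 1,
	NFXTA_CHAIN    = 4,
	NFXTA_HOOKNUM  = 5,
	NFXTA_PRIORITY = 6,
	NFXTA_NFPROTO  = 7,
};

/* Emit a payload-less attribute (nla_len = 4); return bytes written. */
static size_t put_flag(unsigned char *p, uint16_t type)
{
	uint16_t hdr[2] = { 4, type };

	memcpy(p, hdr, sizeof(hdr));
	return sizeof(hdr);
}

/* Emit a 32-bit attribute (nla_len = 8); return bytes written. */
static size_t put_u32(unsigned char *p, uint16_t type, uint32_t value)
{
	uint16_t hdr[2] = { 8, type };

	memcpy(p, hdr, sizeof(hdr));
	memcpy(p + sizeof(hdr), &value, sizeof(value));
	return sizeof(hdr) + sizeof(value);
}

/*
 * Encode an empty base chain in Flat Encoding. buf must hold at
 * least 32 bytes. Returns the number of bytes written.
 */
static size_t encode_base_chain(unsigned char *buf, uint32_t hooknum,
				int32_t priority, uint32_t nfproto)
{
	size_t off = 0;

	assert(buf != NULL);
	off += put_flag(buf + off, NFXTA_CHAIN);
	off += put_u32(buf + off, NFXTA_HOOKNUM, hooknum);
	off += put_u32(buf + off, NFXTA_PRIORITY, (uint32_t)priority);
	off += put_u32(buf + off, NFXTA_NFPROTO, nfproto);
	/* Rules (NFXTA_RULE..NFXTA_STOP) would be appended here. */
	off += put_flag(buf + off, NFXTA_STOP);
	return off;
}
```

An empty base chain therefore takes 4 + 8 + 8 + 8 + 4 = 32 bytes, and every rule added later is appended between the NFXTA_NFPROTO attribute and the final NFXTA_STOP.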

3 Message types

3.1 NFXTM_IDENTIFY: Identification

First and foremost a debug command. And to get something
(table/chain-independent) that users can glare at (they love
doing that).

Request:

* nlmsg_type = NFXTM_IDENTIFY;

Response:

* An NFXTA_NAME attribute contains the name and version
  of the implementation/patchset.

* Zero or more attributes of type NFXTA_MATCH, terminated by
  NFXTA_STOP, giving meta information about the loaded match
  extensions. Per available match, a group of three attributes
  follows:

   - One NFXTA_NAME attribute for the name of the extension

   - One NFXTA_REVISION attribute to denote the version of the
     extension's parameter protocol

   - One NFXTA_SIZE attribute for the size of its per-instance
     data block

We can avoid this if structures are split into several TLVs. You can add new attributes and obsolete old ones.

* Zero or more attributes of type NFXTA_TARGET, terminated by
  NFXTA_STOP, giving meta information about the loaded and
  available target extensions:

   - same attributes as with NFXTA_MATCH above

3.2 NFXTM_CHAIN_NEW: Create new chain

Request:

* nlmsg_type = NFXTM_CHAIN_NEW;

* NFXTA_NAME attribute carrying the name of the new chain.

* Zero or one of this group of three:

   - NFXTA_HOOKNUM

   - NFXTA_PRIORITY

   - NFXTA_NFPROTO

Response:

* Standard ACK.

Remarks:

Right now, a chain can only be promoted to a base chain during
creation (as far as the userspace view goes; when the kernel
exactly installs the nf_hook_ops is not of concern to userspace),
and it can only be demoted by deleting it. Should an
NFXTM_CHAIN_PROMOTE be split off the NFXTM_CHAIN_NEW
functionality?

3.3 NFXTM_CHAIN_DEL: Delete a chain

Request:

* nlmsg_type = NFXTM_CHAIN_DEL;

* NFXTA_NAME attribute carrying the name of the chain to delete

Response:

* Standard ACK.

3.4 NFXTM_CHAIN_MOVE: Rename a chain

Request:

* nlmsg_type = NFXTM_CHAIN_MOVE;

* Two NFXTA_NAME attributes (order is important):

   - First one specifies the current name of the chain

   - Second one specifies the new name of the chain

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nlmsg_len = at least 24                                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nlmsg_type = NFXTM_CHAIN_MOVE | nlmsg_flags = NLM_F_REQUEST   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nlmsg_seq = whatever                                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nlmsg_pid = whatever                                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = at least 4          | nla_type = NFXTA_NAME         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
. old name                                                      .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = at least 4          | nla_type = NFXTA_NAME         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
. new name                                                      .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
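Because the rename request distinguishes the old and new name purely by position, a receiver has to pick out the first and second NFXTA_NAME attribute in stream order. The following is a minimal sketch of that lookup, assuming host byte order and a buffer that starts at the first attribute; the function name `chain_move_names` is hypothetical and not taken from the Xtables2 sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { NFXTA_NAME = 3 };	/* value from the attribute table */

/*
 * Find the first two NFXTA_NAME attributes in the stream.
 * On success, *old_name and *new_name point into buf and 0 is
 * returned; -1 means the stream is malformed or has fewer than
 * two names.
 */
static int chain_move_names(const unsigned char *buf, size_t len,
			    const char **old_name, const char **new_name)
{
	const char *found[2] = { NULL, NULL };
	size_t off = 0;
	int n = 0;

	assert(buf != NULL && old_name != NULL && new_name != NULL);
	while (off < len && n < 2) {
		uint16_t nla_len, nla_type;
		size_t step;

		if (len - off < 4)
			return -1;
		memcpy(&nla_len, buf + off, sizeof(nla_len));
		memcpy(&nla_type, buf + off + 2, sizeof(nla_type));
		if (nla_len < 4 || nla_len > len - off)
			return -1;

		if (nla_type == NFXTA_NAME) {
			/* NLA_NUL_STRING: non-empty and NUL-terminated. */
			if (nla_len == 4 || buf[off + nla_len - 1] != '\0')
				return -1;
			found[n++] = (const char *)(buf + off + 4);
		}

		step = ((size_t)nla_len + 3) & ~(size_t)3;
		if (step > len - off)
			step = len - off;
		off += step;
	}
	if (n < 2)
		return -1;
	*old_name = found[0];
	*new_name = found[1];
	return 0;
}
```

Swapping the two attributes in the message silently swaps the meaning of the request, which is the kind of order dependence the review comment in section 2.1 cautions against.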

3.5 NFXTM_CHAIN_DUMP: Chain dump

Request:

* nlmsg_type = NFXTM_CHAIN_DUMP;

* NFXTA_NAME attribute specifying the name of the chain
  to dump

Response:

* Zero or one of this group of three:

   - NFXTA_HOOKNUM, NFXTA_PRIORITY, NFXTA_NFPROTO.

* Zero or more NFXTA_RULE attributes as per section [sub:nfxta_rule].

Errors:

* If an error occurs during dump, an NFXTA_ERRNO attribute is
  emitted into the stream and the dump will immediately terminate
  with a standard NLMSG_DONE message. No NFXTA_STOP attributes
  will be emitted if the dump stopped in the middle of a nesting
  level.

3.6 NFXTM_TABLE_DUMP: Table dump

Returns an atomic snapshot of the table.

Request:

* nlmsg_type = NFXTM_TABLE_DUMP;

Response:

* Zero or more NFXTA_CHAIN attributes as described in
  section [sub:nfxta_chain].

3.7 NFXTM_CHAIN_SPLICE: Add/delete rules

The NFXTM_CHAIN_SPLICE request does a bulk deletion of zero or
more consecutive rules, followed by a bulk insertion of zero or
more consecutive rules, all done in an atomic fashion. It
operates similarly to Perl's splice function on arrays. The request
message needs to have at least the first three attributes.

Request:

* NFXTA_NAME: Name of the chain to modify.

* NFXTA_OFFSET: Index of entry where operation should
  start.

* NFXTA_LENGTH: Number of entries starting from
  offset that should be removed. May be zero or more.

* Zero or more NFXTA_RULE as per section [sub:nfxta_rule].

Response:

* Standard ACK.

* Desired: detailed error code and origin of error (result of
  running ->check in extensions)
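The splice semantics (remove NFXTA_LENGTH entries at NFXTA_OFFSET, then insert the supplied rules at the same position) can be illustrated on a plain array. This is a userspace sketch of the semantics only, not the kernel implementation: `struct rule`, `struct chain` and `chain_splice` are hypothetical names, and the all-or-nothing behaviour is approximated by building the new array first and swapping it in at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct rule {
	int id;			/* stand-in for real rule contents */
};

struct chain {
	struct rule *rules;	/* heap array, NULL when empty */
	size_t n;
};

/*
 * Remove length rules starting at offset, then insert n_new rules
 * there. The chain is only modified if the whole operation succeeds.
 * Returns 0 on success, -1 on a bad range or allocation failure.
 */
static int chain_splice(struct chain *c, size_t offset, size_t length,
			const struct rule *new_rules, size_t n_new)
{
	struct rule *r = NULL;
	size_t tail, n;

	assert(c != NULL);
	if (offset > c->n || length > c->n - offset)
		return -1;
	tail = c->n - offset - length;
	n = offset + n_new + tail;

	if (n > 0) {
		r = malloc(n * sizeof(*r));
		if (r == NULL)
			return -1;
		if (offset > 0)
			memcpy(r, c->rules, offset * sizeof(*r));
		if (n_new > 0)
			memcpy(r + offset, new_rules, n_new * sizeof(*r));
		if (tail > 0)
			memcpy(r + offset + n_new,
			       c->rules + offset + length,
			       tail * sizeof(*r));
	}

	/* Swap in the new array; the old one is no longer needed. */
	free(c->rules);
	c->rules = r;
	c->n = n;
	return 0;
}
```

With offset equal to the chain length and length 0 this appends, with no new rules it is a pure deletion, and with both it replaces a range, which matches how Perl's splice behaves on arrays.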

3.8 NFXTM_TABLE_REPLACE

Atomic exchange of an entire table.

Request:

* nlmsg_type = NFXTM_TABLE_REPLACE;

* Zero or more NFXTA_CHAIN attributes as per section [sub:nfxta_chain].

Response:

* Standard ACK.

* Desired: detailed error code and origin of error (result of
  running ->check in extensions)

That's all for now. Quite exhaustive, thanks.