Xtables2 Netlink spec

By request of Pablo, I am posting the Xtables2 Netlink interface 
specification for review. Further documentation and the toolchain 
around it are available through the temporary project page at

	http://jengelh.medozas.de/projects/xtables/

which currently includes

 * User Documentation Chapter 1: Architectural Differences

 * Developer Documentation Part 1: Netlink interface (WIP)
   This is copied below to facilitate inline replies

 * Runnable Linux source tree

 * Runnable userspace library (libnetfilter_xtables)
   with small test-and-debug program

--8<--

Netlink interface

1 General use

1.1 Socket

Xtables2 is usable through a Netlink socket of type 
NETLINK_XTABLES. No intermediate subsystem like nfnetlink is 
used, because the kernel's nfnetlink parser does not make all 
attributes available to (in-kernel) nfnetlink users.

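#include <sys/socket.h>
#include <linux/netlink.h>

/* Fallback for headers that predate the Xtables2 patchset. */
#ifndef NETLINK_XTABLES
#define NETLINK_XTABLES 21
#endif

/* Returns -1 and sets errno if the kernel lacks Xtables2 support. */
nfxt_socket = socket(AF_NETLINK, SOCK_RAW, NETLINK_XTABLES);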

The NETLINK_XTABLES constant is defined in linux/netlink.h with 
the value 21.

1.2 Message format

All messages transmitted over the Netlink socket must start with 
the base struct nlmsghdr header, followed by a version tag so 
that the format of the data following it can change later:

struct xtnetlink_genhdr {
        uint32_t version;
};

The version member is always 0 in the current implementation.

Following the genhdr can be any number of standard Netlink 
attributes (struct nlattr plus their payload).
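As a rough illustration of this layout, the following sketch writes the nlmsghdr and the genhdr into a caller-supplied buffer. The function name nfxt_put_header is made up for this example, and the message type is passed in by the caller because this document does not give numeric values for the NFXTM_* types. Attributes would be appended after the returned offset, and nlmsg_len would have to grow accordingly.

```c
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

struct xtnetlink_genhdr {
        uint32_t version;
};

/*
 * Hypothetical helper: write nlmsghdr + genhdr into buf.
 * Returns the number of bytes used, or 0 if buf is too small.
 */
static size_t nfxt_put_header(void *buf, size_t cap, uint16_t type,
                              uint32_t seq)
{
        struct nlmsghdr nlh;
        struct xtnetlink_genhdr gh = { .version = 0 };
        size_t len = NLMSG_LENGTH(sizeof(gh));

        if (cap < NLMSG_ALIGN(len))
                return 0;
        memset(&nlh, 0, sizeof(nlh));
        nlh.nlmsg_len   = (uint32_t)len;  /* header + genhdr, no attributes yet */
        nlh.nlmsg_type  = type;           /* an NFXTM_* value chosen by the caller */
        nlh.nlmsg_flags = NLM_F_REQUEST;
        nlh.nlmsg_seq   = seq;
        memcpy(buf, &nlh, sizeof(nlh));
        memcpy((char *)buf + NLMSG_HDRLEN, &gh, sizeof(gh));
        return NLMSG_ALIGN(len);
}
```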

Often, a logical tree structure is used to describe something, 
such as tables of chains of rules:

filter
 \__ INPUT
 |    \__ some rule
 \__ FORWARD
 |    \__ rule2
 |    \__ rule3
 |    \__ rule4
 \__ OUTPUT
      \__ rule5
      \__ rule6

For this document, child objects are always "nested" within a 
parent object, irrespective of the serialized encoding.

There are different ways to encode such a tree structure into a 
serialized stream. In many Netlink protocols, child attributes 
are encapsulated (a.k.a. "nested", though we will avoid this 
term to avoid double use) and treated as a whole as a parent's 
opaque data. We will call this format "Encapsulated Encoding".

To encode an attribute's length, struct nlattr only has a 16-bit 
field, which means the attribute header plus payload is limited 
to 64 KB. This limit is easily exceeded with the encapsulated 
encoding, because a chain attribute would have to contain all 
the rules of that chain, for example. The problem is aggravated 
by the kernel's Netlink handler only allocating skbs a page size 
worth, which in the worst case means that the usable payload for 
attributes is only around 3600 bytes. Given that xt_u32's 
private data block is 1984 bytes already, you would not be able 
to fit a single rule with two -m u32 invocations into a dump.
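The arithmetic behind this can be spelled out in a few lines. The sketch below only uses the figures quoted above (16-bit length field, roughly 3600 usable bytes, 1984 bytes per xt_u32 block) and ignores attribute headers, which would only make matters worse. The constant and function names are made up for illustration.

```c
#include <stdint.h>

enum {
        ATTR_LEN_LIMIT    = UINT16_MAX, /* largest value a 16-bit nla_len holds */
        WORST_CASE_BUDGET = 3600,       /* approximate per-skb figure quoted above */
        XT_U32_BLOCK_SIZE = 1984,       /* xt_u32 private data size quoted above */
};

/* Number of data blocks of block_size that fit into budget bytes. */
static unsigned int blocks_that_fit(unsigned int budget,
                                    unsigned int block_size)
{
        return block_size ? budget / block_size : 0;
}
```

With these numbers, blocks_that_fit(WORST_CASE_BUDGET, XT_U32_BLOCK_SIZE) yields 1, since 2 * 1984 = 3968 exceeds 3600.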

The Xtables2 Netlink protocol, however, encodes each node as a 
standalone attribute that is appended (a.k.a. "chained") to the 
data stream. We will call this format "Flat Encoding". It makes 
it possible to split requests and dumps at a finer level than 
encapsulation would. Above all, it guarantees extensions a 
minimum size for their data blocks.

Since Netlink messages have a 32-bit field to store the message 
length, rulesets of up to roughly 4 GB are possible, which is 
currently regarded as sufficient. The largest meaningful 
rulesets seen to date in the industry weighed in at 
approximately 150 MB.

Whereas encapsulation provides boundaries automatically, the 
flat encoding has to mark them explicitly. Certain attributes 
start such a flattened nesting level, and the dummy attribute 
NFXTA_STOP terminates it.
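To make the flat encoding more concrete, here is a minimal sketch that emits payload-less marker attributes and then walks the stream to recover the nesting depth. NFXTA_STOP and NFXTA_CHAIN use the values from the attribute table in section 2. The value for NFXTA_RULE is a placeholder picked for this example, since the table leaves it blank. The helper names are made up, only chains and rules are treated as level openers here, and the caller is assumed to provide a large enough buffer.

```c
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

enum {
        NFXTA_STOP  = 1, /* from the attribute table */
        NFXTA_CHAIN = 4, /* from the attribute table */
        NFXTA_RULE  = 8, /* ASSUMED placeholder, not given in the table */
};

/* Append a payload-less attribute; returns the new write offset. */
static size_t put_marker(unsigned char *buf, size_t off, uint16_t type)
{
        struct nlattr nla = { .nla_len = NLA_HDRLEN, .nla_type = type };

        memcpy(buf + off, &nla, sizeof(nla));
        return off + NLA_ALIGN(nla.nla_len);
}

/*
 * Walk a flat stream. Returns the deepest nesting level seen, or -1
 * if the stream is malformed or the levels are not balanced.
 */
static int max_depth(const unsigned char *buf, size_t len)
{
        int depth = 0, deepest = 0;
        size_t off = 0;

        while (off + NLA_HDRLEN <= len) {
                struct nlattr nla;

                memcpy(&nla, buf + off, sizeof(nla));
                if (nla.nla_len < NLA_HDRLEN || off + nla.nla_len > len)
                        return -1;
                if (nla.nla_type == NFXTA_CHAIN ||
                    nla.nla_type == NFXTA_RULE) {
                        if (++depth > deepest)
                                deepest = depth;
                } else if (nla.nla_type == NFXTA_STOP) {
                        if (--depth < 0)
                                return -1;
                }
                off += NLA_ALIGN(nla.nla_len);
        }
        return depth == 0 ? deepest : -1;
}
```

A chain with two empty rules is then the six-attribute sequence CHAIN, RULE, STOP, RULE, STOP, STOP, occupying 24 bytes.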

2 Attributes

The meaning of attributes depends upon the nesting level in which 
they appear. Their type however remains the same, such that a 
single Netlink attribute validation policy object (struct 
nla_policy) is sufficient.

A table of all known attributes:


+--------+-----------------+---------------+----------------+
| Value  | Mnemonic        |    C type     | NLA type       |
+--------+-----------------+---------------+----------------+
+--------+-----------------+---------------+----------------+
|   1    | NFXTA_STOP      |               | NLA_FLAG       |
+--------+-----------------+---------------+----------------+
|   2    | NFXTA_ERRNO     |     int       | NLA_U32        |
+--------+-----------------+---------------+----------------+
|   3    | NFXTA_NAME      |   char []     | NLA_NUL_STRING |
+--------+-----------------+---------------+----------------+
|   4    | NFXTA_CHAIN     |               | NLA_FLAG       |
+--------+-----------------+---------------+----------------+
|   5    | NFXTA_HOOKNUM   | unsigned int  | NLA_U32        |
+--------+-----------------+---------------+----------------+
|   6    | NFXTA_PRIORITY  |     int       | NLA_U32        |
+--------+-----------------+---------------+----------------+
|   7    | NFXTA_NFPROTO   |   uint8_t     | NLA_U32        |
+--------+-----------------+---------------+----------------+
|        | NFXTA_RULE      |               | NLA_FLAG       |
+--------+-----------------+---------------+----------------+
|        | NFXTA_OFFSET    | unsigned int  | NLA_U32        |
+--------+-----------------+---------------+----------------+
|        | NFXTA_LENGTH    |    size_t     | NLA_U32        |
+--------+-----------------+---------------+----------------+
|        | NFXTA_VERDICT   | unsigned int  | NLA_U32        |
+--------+-----------------+---------------+----------------+
|        | NFXTA_MATCH     |               | NLA_FLAG       |
+--------+-----------------+---------------+----------------+
|        | NFXTA_DATA      |               | NLA_BINARY     |
+--------+-----------------+---------------+----------------+
|        | NFXTA_TARGET    |               | NLA_FLAG       |
+--------+-----------------+---------------+----------------+
|        | NFXTA_JUMP      |   char []     | NLA_NUL_STRING |
+--------+-----------------+---------------+----------------+
|        | NFXTA_GOTO      |   char []     | NLA_NUL_STRING |
+--------+-----------------+---------------+----------------+
|        | NFXTA_REVISION  |   uint8_t     | NLA_U32        |
+--------+-----------------+---------------+----------------+
|        | NFXTA_SIZE      |    size_t     | NLA_U32        |
+--------+-----------------+---------------+----------------+
|        | NFXTA_HOOKMASK  | unsigned int  | NLA_U32        |
+--------+-----------------+---------------+----------------+


The kernel ignores attributes of type 0 during validation, so 
that value was left unused.

2.1 Nest level terminator<sub:nfxta_stop>

This attribute serves to denote the end of a nesting level as 
introduced by NFXTA_CHAIN, NFXTA_RULE, NFXTA_MATCH or 
NFXTA_TARGET. It has no data portion.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_STOP         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

2.2 Dump error code<sub:nfxta_errno>

Once a NLM_F_MULTI dump operation has been started, for example 
with the NFXTM_CHAIN_DUMP request, Netlink kernel users must 
always end it successfully with NLMSG_DONE. To convey an error 
during the dump, Xtables2 will emit a NFXTA_ERRNO attribute into 
the stream (if it can), emit no further attributes for the 
request, and cause the dump to stop.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 8                   | nla_type = NFXTA_ERRNO        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| int errno;                                                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
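On the receiving side, a client could scan each dump message for this attribute before interpreting anything else. The sketch below is one way to do that, under the assumption that the buffer holds only the attribute stream (nlmsghdr and genhdr already skipped). The function name is made up, and the sign convention of the error value is not specified here, so the value is handed back unchanged.

```c
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

enum { NFXTA_ERRNO = 2 }; /* from the attribute table */

/*
 * Hypothetical helper: returns 1 and stores the code in *err if an
 * NFXTA_ERRNO attribute is present, 0 if there is none, and -1 if
 * the stream is malformed.
 */
static int find_dump_error(const unsigned char *buf, size_t len, int *err)
{
        size_t off = 0;

        while (off + NLA_HDRLEN <= len) {
                struct nlattr nla;

                memcpy(&nla, buf + off, sizeof(nla));
                if (nla.nla_len < NLA_HDRLEN || off + nla.nla_len > len)
                        return -1;
                if (nla.nla_type == NFXTA_ERRNO) {
                        int32_t value;

                        if (nla.nla_len != NLA_HDRLEN + sizeof(value))
                                return -1;
                        memcpy(&value, buf + off + NLA_HDRLEN, sizeof(value));
                        *err = value;
                        return 1;
                }
                off += NLA_ALIGN(nla.nla_len);
        }
        return 0;
}
```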

2.3 Match extension<sub:nfxta_match>

Invocation of a match is represented using the NFXTA_MATCH 
attribute, which starts a nest level. A match attribute must 
contain two attributes:

* NFXTA_NAME: the name of the match extension

* NFXTA_DATA: data private to this instance of the extension

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_MATCH        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4 + payload         | nla_type = NFXTA_NAME         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
. name of the extension, e.g. "hashlimit"                       .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4 + payload         | nla_type = NFXTA_DATA         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
. e.g. struct xt_hashlimit_info                                 .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_STOP         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
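The layout in the diagram could be produced along the following lines. This is a sketch only: NFXTA_STOP and NFXTA_NAME use the values from the attribute table, whereas the values for NFXTA_MATCH and NFXTA_DATA are placeholders chosen for this example because the table does not list them. put_attr and put_match are hypothetical helper names.

```c
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

enum {
        NFXTA_STOP  = 1,  /* from the attribute table */
        NFXTA_NAME  = 3,  /* from the attribute table */
        NFXTA_MATCH = 12, /* ASSUMED placeholder, not given in the table */
        NFXTA_DATA  = 13, /* ASSUMED placeholder, not given in the table */
};

/* Padded on-wire size of an attribute carrying len payload bytes. */
static size_t attr_size(size_t len)
{
        return NLA_ALIGN(NLA_HDRLEN + len);
}

/* Append one attribute (caller checked the space); returns new offset. */
static size_t put_attr(unsigned char *buf, size_t off, uint16_t type,
                       const void *data, size_t len)
{
        struct nlattr nla = {
                .nla_len  = (uint16_t)(NLA_HDRLEN + len),
                .nla_type = type,
        };

        memset(buf + off, 0, attr_size(len)); /* also zeroes the padding */
        memcpy(buf + off, &nla, sizeof(nla));
        if (len > 0)
                memcpy(buf + off + NLA_HDRLEN, data, len);
        return off + attr_size(len);
}

/* Emit MATCH, NAME, DATA, STOP. Returns bytes written, 0 on error. */
static size_t put_match(unsigned char *buf, size_t cap, const char *name,
                        const void *data, size_t data_len)
{
        size_t name_len = strlen(name) + 1; /* NLA_NUL_STRING includes NUL */
        size_t off = 0;

        if (name_len > UINT16_MAX - NLA_HDRLEN ||
            data_len > UINT16_MAX - NLA_HDRLEN)
                return 0;
        if (2 * attr_size(0) + attr_size(name_len) + attr_size(data_len) > cap)
                return 0;
        off = put_attr(buf, off, NFXTA_MATCH, NULL, 0);
        off = put_attr(buf, off, NFXTA_NAME, name, name_len);
        off = put_attr(buf, off, NFXTA_DATA, data, data_len);
        off = put_attr(buf, off, NFXTA_STOP, NULL, 0);
        return off;
}
```

For the name "hashlimit" (10 bytes including the NUL, padded to 12) and an 8-byte data block, the group occupies 4 + 16 + 12 + 4 = 36 bytes.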

2.4 Target extension<sub:nfxta_target>

Invocation of a target is represented using the NFXTA_TARGET 
attribute, which starts a nest level. A target attribute must 
contain two attributes:

* NFXTA_NAME: the name of the target extension

* NFXTA_DATA: data private to this instance of the extension

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_TARGET       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4 + payload         | nla_type = NFXTA_NAME         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
. name of the extension, e.g. "TCPMSS"                          .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4 + payload         | nla_type = NFXTA_DATA         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
. e.g. struct xt_tcpmss_info                                    .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_STOP         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

2.5 Rule<sub:nfxta_rule>

A rule is started using the NFXTA_RULE attribute, which starts a 
nest level, and is ended with an NFXTA_STOP attribute. Rules can 
contain:

* Zero or more match extensions (NFXTA_MATCH..NFXTA_STOP).

* Zero or more target extensions (NFXTA_TARGET..NFXTA_STOP).

* Zero or one NFXTA_VERDICT attribute that specifies the rule's 
  verdict as data, which can either be NF_ACCEPT or NF_DROP. 
  (Non-normative notes: The supplied verdict is executed if no 
  target has reached a verdict on its own. Omission of the 
  verdict attribute counts as XT_CONTINUE.)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_RULE         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
. matches, targets, verdict                                     .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_STOP         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

2.6 Chain<sub:nfxta_chain>

A chain is started using the NFXTA_CHAIN attribute, which starts 
a nest level, and is ended with an NFXTA_STOP attribute. Chains 
can contain:

* Zero or one of this group of three (= specify all three, or 
  none at all), specifying that this chain is a base chain 
  hooking in at some point:

  - One NFXTA_HOOKNUM attribute for giving a hook number. This is 
    (unfortunately) dependent on the chosen nfproto, so it is 
    either NF_INET_*, NF_BR_* or NF_ARP_*.

  - One NFXTA_PRIORITY attribute.

  - One NFXTA_NFPROTO attribute that is NFPROTO_*.

* Zero or more rules (NFXTA_RULE..NFXTA_STOP).

Example of a fully populated chain:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_CHAIN        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 8                   | nla_type = NFXTA_HOOKNUM      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| hook number (0..7)                                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 8                   | nla_type = NFXTA_PRIORITY     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| priority (-2147483648..2147483647)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 8                   | nla_type = NFXTA_NFPROTO      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nfproto value (2=ipv4, 3=arp, 7=bridge, 10=ipv6, 12=decnet)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
. rules                                                         .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = 4                   | nla_type = NFXTA_STOP         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
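Since all attribute values involved here are listed in the attribute table, an empty base chain as drawn above can be sketched without placeholders. The helper names put_attr and put_base_chain are made up for this example. Rules are left out, and the three values are written as 32-bit quantities in host byte order, following the NLA_U32 column of the table.

```c
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

enum {
        NFXTA_STOP     = 1,
        NFXTA_CHAIN    = 4,
        NFXTA_HOOKNUM  = 5,
        NFXTA_PRIORITY = 6,
        NFXTA_NFPROTO  = 7,
};

/* Append one attribute (caller checked the space); returns new offset. */
static size_t put_attr(unsigned char *buf, size_t off, uint16_t type,
                       const void *data, size_t len)
{
        struct nlattr nla = {
                .nla_len  = (uint16_t)(NLA_HDRLEN + len),
                .nla_type = type,
        };

        memcpy(buf + off, &nla, sizeof(nla));
        if (len > 0)
                memcpy(buf + off + NLA_HDRLEN, data, len);
        return off + NLA_ALIGN(NLA_HDRLEN + len);
}

/* Emit CHAIN, HOOKNUM, PRIORITY, NFPROTO, STOP; 0 if buf is too small. */
static size_t put_base_chain(unsigned char *buf, size_t cap,
                             uint32_t hooknum, int32_t priority,
                             uint32_t nfproto)
{
        const size_t need = 2 * NLA_HDRLEN +
                            3 * NLA_ALIGN(NLA_HDRLEN + sizeof(uint32_t));
        size_t off = 0;

        if (cap < need)
                return 0;
        off = put_attr(buf, off, NFXTA_CHAIN, NULL, 0);
        off = put_attr(buf, off, NFXTA_HOOKNUM, &hooknum, sizeof(hooknum));
        off = put_attr(buf, off, NFXTA_PRIORITY, &priority, sizeof(priority));
        off = put_attr(buf, off, NFXTA_NFPROTO, &nfproto, sizeof(nfproto));
        off = put_attr(buf, off, NFXTA_STOP, NULL, 0);
        return off;
}
```

The result is 4 + 8 + 8 + 8 + 4 = 32 bytes. A call such as put_base_chain(buf, sizeof(buf), 1, -100, 2) would describe an IPv4 chain on hook number 1 with priority -100.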

3 Message types

3.1 NFXTM_IDENTIFY: Identification

First and foremost a debug command. And to get something 
(table/chain-independent) that users can glare at (they love 
doing that).

Request:

* nlmsg_type = NFXTM_IDENTIFY;

Response:

* An NFXTA_NAME attribute contains the name and version of the 
  implementation/patchset.

* Zero or more attributes of type NFXTA_MATCH, terminated by 
  NFXTA_STOP, giving meta information about the loaded match 
  extensions. Per available match, a group of three attributes 
  follows:

  - One NFXTA_NAME attribute for the name of the extension

  - One NFXTA_REVISION attribute to denote the version of the 
    extension's parameter protocol

  - One NFXTA_SIZE attribute for the size of its per-instance 
    data block

* Zero or more attributes of type NFXTA_TARGET, terminated by 
  NFXTA_STOP, giving meta information about the loaded and 
  available target extensions:

  - same attributes as with NFXTA_MATCH above

3.2 NFXTM_CHAIN_NEW: Create new chain

Request:

* nlmsg_type = NFXTM_CHAIN_NEW;

* NFXTA_NAME attribute carrying the name of the new chain.

* Zero or one of this group of three:

  - NFXTA_HOOKNUM

  - NFXTA_PRIORITY

  - NFXTA_NFPROTO

Response:

* Standard ACK.

Remarks:

Right now, a chain can only be promoted to a base chain during 
creation (as far as the userspace view goes; exactly when the 
kernel installs the nf_hook_ops is of no concern to userspace), 
and it can only be demoted by deleting it. Should a 
NFXTM_CHAIN_PROMOTE be split off the NFXTM_CHAIN_NEW 
functionality?

3.3 NFXTM_CHAIN_DEL: Delete a chain

Request:

* nlmsg_type = NFXTM_CHAIN_DEL;

* NFXTA_NAME attribute carrying the name of the chain to delete

Response:

* Standard ACK.

3.4 NFXTM_CHAIN_MOVE: Rename a chain

Request:

* nlmsg_type = NFXTM_CHAIN_MOVE;

* Two NFXTA_NAME attributes (order is important):

  - First one specifies the current name of the chain

  - Second one specifies the new name of the chain

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nlmsg_len = at least 28                                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nlmsg_type = NFXTM_CHAIN_MOVE | nlmsg_flags = NLM_F_REQUEST   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nlmsg_seq = whatever                                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nlmsg_pid = whatever                                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| version = 0                                                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = at least 4          | nla_type = NFXTA_NAME         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
. old name                                                      .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| nla_len = at least 4          | nla_type = NFXTA_NAME         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
. new name                                                      .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
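A complete rename request could be assembled roughly as follows. This sketch assumes that the version tag from section 1.2 sits between the nlmsghdr and the first attribute. The numeric value of NFXTM_CHAIN_MOVE is not given in this document, so the message type is a parameter, and build_chain_move and put_attr are hypothetical helper names. NFXTA_NAME uses the value from the attribute table.

```c
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

enum { NFXTA_NAME = 3 }; /* from the attribute table */

struct xtnetlink_genhdr {
        uint32_t version;
};

/* Append one attribute (caller checked the space); returns new offset. */
static size_t put_attr(unsigned char *buf, size_t off, uint16_t type,
                       const void *data, size_t len)
{
        struct nlattr nla = {
                .nla_len  = (uint16_t)(NLA_HDRLEN + len),
                .nla_type = type,
        };

        memset(buf + off, 0, NLA_ALIGN(NLA_HDRLEN + len));
        memcpy(buf + off, &nla, sizeof(nla));
        memcpy(buf + off + NLA_HDRLEN, data, len);
        return off + NLA_ALIGN(NLA_HDRLEN + len);
}

/* Returns the total message length, or 0 if it does not fit. */
static size_t build_chain_move(unsigned char *buf, size_t cap,
                               uint16_t msg_type, uint32_t seq,
                               const char *old_name, const char *new_name)
{
        struct nlmsghdr nlh;
        struct xtnetlink_genhdr gh = { .version = 0 };
        size_t old_len = strlen(old_name) + 1; /* include the NUL byte */
        size_t new_len = strlen(new_name) + 1;
        size_t off = NLMSG_LENGTH(sizeof(gh));

        if (old_len > UINT16_MAX - NLA_HDRLEN ||
            new_len > UINT16_MAX - NLA_HDRLEN)
                return 0;
        if (off + NLA_ALIGN(NLA_HDRLEN + old_len) +
                  NLA_ALIGN(NLA_HDRLEN + new_len) > cap)
                return 0;
        memcpy(buf + NLMSG_HDRLEN, &gh, sizeof(gh));
        off = put_attr(buf, off, NFXTA_NAME, old_name, old_len); /* current */
        off = put_attr(buf, off, NFXTA_NAME, new_name, new_len); /* new */

        memset(&nlh, 0, sizeof(nlh));
        nlh.nlmsg_len   = (uint32_t)off;
        nlh.nlmsg_type  = msg_type; /* NFXTM_CHAIN_MOVE, value not given here */
        nlh.nlmsg_flags = NLM_F_REQUEST;
        nlh.nlmsg_seq   = seq;
        memcpy(buf, &nlh, sizeof(nlh));
        return off;
}
```

The order of the two put_attr calls matters, as the first NFXTA_NAME is taken to be the current name.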

3.5 NFXTM_CHAIN_DUMP: Chain dump

Request:

* nlmsg_type = NFXTM_CHAIN_DUMP;

* NFXTA_NAME attribute specifying the name of the chain to dump

Response:

* Zero or one of this group of three:

  - NFXTA_HOOKNUM, NFXTA_PRIORITY, NFXTA_NFPROTO.

* Zero or more NFXTA_RULE attributes as per section 2.5 (Rule).

Errors:

* If an error occurs during dump, an NFXTA_ERRNO attribute is 
  emitted into the stream and the dump will immediately terminate 
  with a standard NLMSG_DONE message. No NFXTA_STOP attributes 
  will be emitted if the dump stopped in the middle of a nesting 
  level.

3.6 NFXTM_TABLE_DUMP: Table dump

Returns an atomic snapshot of the table.

Request:

* nlmsg_type = NFXTM_TABLE_DUMP;

Response:

* Zero or more NFXTA_CHAIN attributes as described in section 
  2.6 (Chain).

3.7 NFXTM_CHAIN_SPLICE: Add/delete rules

The NFXTM_CHAIN_SPLICE request does a bulk deletion of zero or 
more consecutive rules, followed by a bulk insertion of zero or 
more consecutive rules, all done in an atomic fashion. It 
operates similarly to Perl's splice function on arrays. The 
request message needs to have at least the first three 
attributes.

Request:

* NFXTA_NAME: Name of the chain to modify.

* NFXTA_OFFSET: Index of entry where operation should start.

* NFXTA_LENGTH: Number of entries starting from offset that 
  should be removed. May be zero or more.

* Zero or more NFXTA_RULE as per section 2.5 (Rule).

Response:

* Standard ACK.

* Desired: detailed error code and origin of error (result of 
  running ->check in extensions)
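The splice semantics can be illustrated on a plain array, detached from the wire format. The sketch below is not the kernel implementation; it merely mirrors the offset/length/insert behavior described above on an array of ints standing in for rules. Rejecting out-of-range offsets and lengths is an assumption of this example, as the specification does not say how such requests are treated.

```c
#include <stddef.h>
#include <string.h>

/*
 * Remove `length` entries starting at `offset` from arr (which holds
 * *count entries and has room for cap), then insert n_new entries
 * from ins at the same position.
 * Returns 0 on success, or -1 if the range is invalid or the result
 * would not fit; arr is left untouched on failure.
 */
static int splice_ints(int *arr, size_t *count, size_t cap,
                       size_t offset, size_t length,
                       const int *ins, size_t n_new)
{
        size_t tail;

        if (offset > *count || length > *count - offset)
                return -1;
        if (*count - length + n_new > cap)
                return -1;

        /* Move the entries behind the removed range to their new place. */
        tail = *count - offset - length;
        memmove(arr + offset + n_new, arr + offset + length,
                tail * sizeof(*arr));
        if (n_new > 0)
                memcpy(arr + offset, ins, n_new * sizeof(*arr));
        *count = *count - length + n_new;
        return 0;
}
```

With offset = count and length = 0 this appends, with n_new = 0 it is a pure deletion, and with length = count at offset 0 it replaces the whole array.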

3.8 NFXTM_TABLE_REPLACE

Atomic exchange of an entire table.

Request:

* nlmsg_type = NFXTM_TABLE_REPLACE;

* Zero or more NFXTA_CHAIN attributes as per section 2.6 (Chain).

Response:

* Standard ACK.

* Desired: detailed error code and origin of error (result of 
  running ->check in extensions)