Re: kernel packet traveling diagram

Linux Advanced Routing and Traffic Control

Hi Stef:

You wrote:

 > Where belongs the IMQ device ?  For ingress, it registers the needed
 > netfilter hooks right after the mangle table.  For egress, it registers the
 > needed netfilter hooks after all other tables so after POSTROUTING in the
 > diagram.
 >
 > I think the packet is also redirected to the imq device at the same place.
 > But I'm not sure.

I don't know either, but...

 From "The intermediate queueing device" by Patrick McHardy:

-------------------------
The Intermediate queueing device can be used for advanced traffic control.

You can use it to implement egress + ingress traffic control, possibly over
multiple network devices. All packets entering/leaving the ipstack marked
with a special iptables target will be directed through the qdisc attached
to an imq device. After enqueueing the decision what happens to a packet is
up to the qdisc. It can reorder/drop packets according to local policies.
This allows you to treat network devices as classes and distribute bandwidth
among them as well as doing real ingress traffic control using egress qdiscs.
-------------------------

The point in the ipstack Patrick is talking about comes after input mangle.

Reading from "The journey of a packet through the linux 2.4 network stack"
by Harald Welte we have:

----------------------------
The IP packet handler is registered via net/core/dev.c:dev_add_pack() called
from net/ipv4/ip_output.c:ip_init().

The IPv4 packet handling function is net/ipv4/ip_input.c:ip_rcv(). After some
initial checks (if the packet is for this host, ...) the ip checksum is
calculated. Additional checks are done on the length and IP protocol
version 4.

Every packet failing one of the sanity checks is dropped at this point.

If the packet passes the tests, we determine the size of the ip packet and
trim the skb in case the transport medium has appended some padding.

Now it is the first time one of the netfilter hooks is called.

Netfilter provides a generic and abstract interface to the standard routing
code. This is currently used for packet filtering, mangling, NAT and queuing
packets to userspace. For further reference see my conference paper 'The
netfilter subsystem in Linux 2.4' or one of Rustys unreliable guides, i.e
the netfilter-hacking-guide.
-------------------------------
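
To make the checks Harald describes more concrete, here is a minimal
user-space sketch in Python. It is not the kernel code: the function names
(ip_checksum, sanity_check, build_header) are made up for illustration, and
the order of the checks is only assumed from the description above.

```python
import struct


def ip_checksum(header: bytes) -> int:
    """One's-complement sum over 16-bit words (RFC 1071 style)."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def sanity_check(frame: bytes):
    """Return the trimmed IP packet, or None if it would be dropped."""
    if len(frame) < 20:                      # shorter than a minimal header
        return None
    version, ihl = frame[0] >> 4, frame[0] & 0x0F
    if version != 4 or ihl < 5:              # not IPv4, or bogus header length
        return None
    hlen = ihl * 4
    if len(frame) < hlen:
        return None
    if ip_checksum(frame[:hlen]) != 0:       # a valid header sums to zero
        return None
    tot_len = struct.unpack("!H", frame[2:4])[0]
    if tot_len < hlen or len(frame) < tot_len:
        return None
    return frame[:tot_len]                   # trim padding added by the medium


def build_header(payload_len: int, ttl: int = 64) -> bytes:
    """Hypothetical helper: a 20-byte IPv4 header with a valid checksum."""
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len, 0, 0,
                      ttl, 17, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    return hdr[:10] + struct.pack("!H", ip_checksum(hdr)) + hdr[12:]


packet = build_header(4) + b"data"
padded = packet + b"\x00" * 10               # e.g. link-layer padding
print(sanity_check(padded) == packet)
```

Only a packet that survives all of these checks would go on to the first
netfilter hook.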

The point in the ipstack Patrick refers to must be what Harald describes
(after the first group of netfilter hooks) as "queuing packets to userspace".

I suppose IMQ is an iptables target extension like QUEUE, placed just before
ingress queueing. Packets are marked in PREROUTING mangle, taken out of the
ipstack to enter the dummy device, and "on exit" they are policed by one of
the queueing disciplines.

                                +-------+------+
                                |      nat     |
                                |  PREROUTING  | <- DEST REWRITE
                                +-------+------+
                                        |
                                +-------+------+
                                |   ipchains   |
                                |    FILTER    |
                                +-------+------+
                                        |

                            is IMQ probably here ??

                                        |
                                +-------+------+
                                |     QOS      |
                                |   INGRESS    | <- controlled by tc
                                +-------+------+
                                        |
                 packet is for  +-------+------+ packet is for
                 this address   |     INPUT    | another address
                 +--------------+    ROUTING   +---------------+
                 |              |    + PRDB    |               |
                 |              +--------------+               |
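
One way to reason about where IMQ sits is by netfilter hook priority: on a
given hook, functions run in ascending priority order. The sketch below is
only a model in Python. The NF_IP_PRI_* values are the ones from
include/linux/netfilter_ipv4.h, but the two IMQ priorities are an assumption
taken from Stef's description ("right after the mangle table" for ingress,
"after all other tables" for egress), not something checked against the IMQ
patch.

```python
# Priorities as defined in include/linux/netfilter_ipv4.h
NF_IP_PRI_CONNTRACK = -200
NF_IP_PRI_MANGLE = -150
NF_IP_PRI_NAT_DST = -100
NF_IP_PRI_FILTER = 0
NF_IP_PRI_NAT_SRC = 100
NF_IP_PRI_LAST = 2**31 - 1   # INT_MAX

# ASSUMPTION (from Stef's description, not verified in the IMQ source):
IMQ_INGRESS_PRIORITY = NF_IP_PRI_MANGLE + 1   # "right after mangle"
IMQ_EGRESS_PRIORITY = NF_IP_PRI_LAST          # "after all other tables"


def traversal_order(hooks):
    """Return hook names in the order netfilter would call them."""
    return [name for name, _prio in sorted(hooks, key=lambda h: h[1])]


PREROUTING = [
    ("nat PREROUTING", NF_IP_PRI_NAT_DST),
    ("imq ingress (assumed)", IMQ_INGRESS_PRIORITY),
    ("conntrack", NF_IP_PRI_CONNTRACK),
    ("mangle PREROUTING", NF_IP_PRI_MANGLE),
]

POSTROUTING = [
    ("imq egress (assumed)", IMQ_EGRESS_PRIORITY),
    ("nat POSTROUTING", NF_IP_PRI_NAT_SRC),
]

for name in traversal_order(PREROUTING):
    print("PREROUTING :", name)
for name in traversal_order(POSTROUTING):
    print("POSTROUTING:", name)
```

If Stef's description holds, this model would put the ingress IMQ hook
between mangle and nat PREROUTING, that is, a little earlier than the spot
marked in the diagram above.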


If we keep on reading, we have:
----------------------------------------------
After successful traversal the netfilter hook,
net/ipv4/ip_input.c:ip_rcv_finish() is called.

Inside ip_rcv_finish(), the packet's destination is determined by calling the
routing function net/ipv4/route.c:ip_route_input(). Furthermore, if our IP
packet has IP options, they are processed now. Depending on the routing
decision made by net/ipv4/route.c:ip_route_input_slow(), the journey of our
packet continues in one of the following functions:

net/ipv4/ip_input.c:ip_local_deliver()

The packet's destination is local, we have to process the layer 4 protocol
and pass it to an userspace process.

net/ipv4/ip_forward.c:ip_forward()

The packet's destination is not local, we have to forward it to another
network.

net/ipv4/route.c:ip_error()

An error occurred, we are unable to find an appropriate routing table entry
for this packet.

net/ipv4/ipmr.c:ip_mr_input()

It is a Multicast packet and we have to do some multicast routing.

If the routing decided that this packet has to be forwarded to another device,
the function net/ipv4/ip_forward.c:ip_forward() is called.

The first task of this function is to check the ip header's TTL. If it
is <= 1 we drop the packet and return an ICMP time exceeded message to the
sender.

We check the header's tailroom if we have enough tailroom for the destination
device's link layer header and expand the skb if necessary.

Next the TTL is decremented by one.

If our new packet is bigger than the MTU of the destination device and the
don't fragment bit in the IP header is set, we drop the packet and send a
ICMP frag needed message to the sender.

Finally it is time to call another one of the netfilter hooks - this time it
is the NF_IP_FORWARD hook.

Assuming that the netfilter hooks is returning a NF_ACCEPT verdict, the
function net/ipv4/ip_forward.c:ip_forward_finish() is the next step in our
packet's journey.

ip_forward_finish() itself checks if we need to set any additional options in
the IP header, and has ip_opt *FIXME* doing this. Afterwards it calls
include/net/ip.h:ip_send().

If we need some fragmentation, *FIXME*:ip_fragment gets called, otherwise we
continue in net/ipv4/ip_forward:ip_finish_output().

ip_finish_output() again does nothing else than calling the netfilter
postrouting hook NF_IP_POST_ROUTING and calling ip_finish_output2() on
successful traversal of this hook.

ip_finish_output2() prepends the hardware (link layer) header to our
skb and calls net/ipv4/ip_output.c:ip_output().
---------------------

The *FIXME* markers appear in Harald's original document.
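
The forwarding steps quoted above can be summarized as a small decision
sketch in Python. This is a model of the description only, not of the kernel
source: Packet, ip_forward_sketch and the returned strings are hypothetical
names, and the skb room check is left out because it has no user-space
equivalent.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ttl: int
    length: int
    dont_fragment: bool = False


def ip_forward_sketch(pkt, mtu, forward_hook=lambda p: "NF_ACCEPT"):
    """Follow the order of checks described for ip_forward()."""
    # 1. TTL check: expire the packet before doing any other work.
    if pkt.ttl <= 1:
        return "drop: ICMP time exceeded"
    # 2. (skb room check for the link layer header omitted in this model)
    # 3. Decrement the TTL.
    pkt.ttl -= 1
    # 4. Too big for the outgoing device and DF set: cannot fragment.
    if pkt.length > mtu and pkt.dont_fragment:
        return "drop: ICMP frag needed"
    # 5. NF_IP_FORWARD hook; anything but NF_ACCEPT ends the journey here.
    if forward_hook(pkt) != "NF_ACCEPT":
        return "drop: NF_IP_FORWARD verdict"
    # 6. Fragment if needed, then continue to the NF_IP_POST_ROUTING hook.
    if pkt.length > mtu:
        return "fragment, then NF_IP_POST_ROUTING"
    return "NF_IP_POST_ROUTING"


print(ip_forward_sketch(Packet(ttl=1, length=100), mtu=1500))
print(ip_forward_sketch(Packet(ttl=64, length=1600, dont_fragment=True), mtu=1500))
print(ip_forward_sketch(Packet(ttl=64, length=100), mtu=1500))
```

In this model, anything hooked after NF_IP_POST_ROUTING (as the egress IMQ
hook is assumed to be) only ever sees packets that reached the last two
return statements.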

OK, as I understand it, the second IMQ hook must come after the netfilter
postrouting hook NF_IP_POST_ROUTING but before the call to the link layer
function ip_output() in ip_output.c.

                                       |
                               +-------+------+
                               |     nat      |
                               | POSTROUTING  | SOURCE REWRITE
                               +-------+------+
                                       |

                            is IMQ probably here ??

                                       |
                               +-------+------+
                               |     QOS      |
                               |    EGRESS    | <- controlled by tc
                               +-------+------+
                                       |
                            -----------+-----------
                                    Network

Again, I'm not sure. Perhaps Patrick, if he is reading this, can help a little.

Best regards,

Leonardo Balliache

PS: Thanks a lot for uploading the diagram to your site.


_______________________________________________
LARTC mailing list / LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/