[Q/A][ICMP Unreach need-to-frag] Forwarded packets and IP fragmentation


 



Hi,

I have a question regarding design. The goal is to handle ICMP Error
Unreach need-to-fragment.

To share a bit more context: the idea is to make Cilium handle ICMP
Error Unreach need-to-fragment for NodePort services.

I understand that Cilium is not Linux, but in this particular case we
are in the middle of both.

Initially my question was: does Linux fragment packets on a host that
is forwarding traffic? That is, when the host has a route for that
traffic which indicates a smaller MTU than the link the traffic is
coming from, will the traffic be fragmented on egress?
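
For reference, the classic router behaviour (RFC 791 / RFC 1191) can be
sketched as below. As far as I understand, the Linux forwarding path
follows the same rule, but this is a simplified illustration and not
kernel code: it ignores GSO and any per-socket or per-route overrides,
and the names fwd_action and fwd_decide are made up for the example.
The relevant point is that a forwarded packet is only fragmented when
its DF bit is clear; TCP senders doing path MTU discovery normally set
DF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Possible outcomes for a forwarded IPv4 packet (illustrative names). */
enum fwd_action {
	FWD_SEND,             /* fits the egress MTU: forward unchanged   */
	FWD_FRAGMENT,         /* too big, DF clear: fragment on egress    */
	FWD_ICMP_FRAG_NEEDED  /* too big, DF set: drop, send ICMP 3/4     */
};

/* Decide what a router does with a packet of pkt_len bytes when the
 * egress route/device allows at most egress_mtu bytes. */
static enum fwd_action fwd_decide(uint16_t pkt_len, uint16_t egress_mtu,
				  bool df_set)
{
	if (pkt_len <= egress_mtu)
		return FWD_SEND;
	return df_set ? FWD_ICMP_FRAG_NEEDED : FWD_FRAGMENT;
}
```

With the route from this mail (MTU 800), a 1500-byte packet with DF set
gives FWD_ICMP_FRAG_NEEDED, while the same packet with DF clear gives
FWD_FRAGMENT.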

But then I thought I should also share what I have tried and my ideas,
since I may be going about the implementation the wrong way.

I have also noticed the option `ip_forward_use_pmtu`, but it is
probably not meant for this case: I have enabled it, with no luck.


Pod-X      : 172.10.0.10
NodePort-X : 192.168.39.23
Router-X   : 192.168.39.1
Client-X   : 10.1.0.100


                                  +------------+
                                  | Pod-X      |
                                  +------+-----+
   Cilium Host-Y                         |
            ------+-------------+--------+-------------------
                  ||
                  || VxLan
                  ||
                  ||               +------------+
                  ||               | NodePort-X |  192.168.39.0/24 dev eth0
                  ||               +------+-----+
   Cilium Host-X  ||                      |
            ------++------------+---------+-------------------
                                |
   World                        |
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                   |  ^
                   |  | ICMP Error Router-X to NodePort
                   |  |   192.168.39.1/192.168.39.23
                   |  |      with in Payload 192.168.39.23/10.1.0.100
            +-------------+
            | Router-X    |
            +-------------+
                   |
   ----------------+-------+--------
                           |
                         Client-X


Routes
------
10.1.0.100 via 192.168.39.1 MTU 800


Consider a Pod behind a NodePort service that delivers content
exceeding the MTU of one of the network devices on the path between
the cluster and the client. In that situation the network device
returns an ICMP Error Unreach need-to-fragment to the cluster
(NodePort).
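
To make the message concrete, here is a minimal sketch of extracting
the useful fields from such an ICMP error, assuming the RFC 1191
layout (type 3, code 4, next-hop MTU in bytes 6-7, followed by the
original IP header). The struct and function names are hypothetical
and the parser works on a plain byte buffer starting at the ICMP
header; it is not Cilium or kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of interest from an ICMP "fragmentation needed" error. */
struct frag_needed_info {
	uint16_t next_hop_mtu; /* MTU advertised by the router           */
	uint32_t orig_saddr;   /* source of the offending packet (host order) */
	uint32_t orig_daddr;   /* its destination (host order)           */
};

static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* icmp points at the ICMP header, len is the number of bytes available.
 * Returns 0 on success, -1 if the message is truncated or is not a
 * Destination Unreachable / Fragmentation Needed carrying IPv4. */
static int parse_frag_needed(const uint8_t *icmp, size_t len,
			     struct frag_needed_info *out)
{
	const uint8_t *inner = icmp + 8; /* embedded original IP header */

	if (len < 8 + 20)
		return -1;
	if (icmp[0] != 3 || icmp[1] != 4)
		return -1;
	if ((inner[0] >> 4) != 4)
		return -1;

	out->next_hop_mtu = rd16(icmp + 6);
	out->orig_saddr = rd32(inner + 12);
	out->orig_daddr = rd32(inner + 16);
	return 0;
}
```

In the topology above, the error sent by the router would carry
next_hop_mtu = 800, with the embedded header showing 192.168.39.23 as
source and 10.1.0.100 as destination.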

* Forwarding the ICMP Error to the Pod would not work, since the Pod
  only has a view of the path between the node that hosts the NodePort
  service and the backend node that hosts the Pod, and we don’t want
  to reduce the MTU for that path.

Having said all of that, I’m struggling to find the right approach.

I have experimented with a few options:

1/ Having the host that hosts the NodePort service handle the ICMP
   Error message itself (as opposed to forwarding it to the Pod; it is
   currently dropped). The expected result would be that the routing
   table of that host gets updated according to the ICMP Error. But
   that does not look possible in Linux at this point, since there are
   checks to validate that the ICMP Error was received in response to
   a packet the host itself emitted [0]. In the context of Cilium we
   bypass netfilter on egress, right?

	sk = __inet_lookup_established(net, net->ipv4.tcp_death_row.hashinfo,
				       iph->daddr, th->dest, iph->saddr,
				       ntohs(th->source), inet_iif(skb), 0);
	if (!sk) {
		__ICMP_INC_STATS(net, ICMP_MIB_INERRORS);
		return -ENOENT;
	}

[0] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/net/ipv4/tcp_ipv4.c#n487


2/ Having the NodePort service itself update the routing table of the
   host, installing a route with the MTU taken from the ICMP Error
   Unreach need-to-frag. In that situation one might expect the
   packets to be fragmented by the host on egress. But based on my
   tests that does not seem to work; I'm not sure whether Linux
   handles that case of forwarding/fragmenting?

3/ Having the NodePort service handle the full implementation of ICMP
   Error Unreach need-to-fragment:
   - For each ICMP Error received, the service would maintain a map
     of routes and MTUs.
   - For each packet leaving, the service would fragment it if
     needed.
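
A rough sketch of what option 3 could look like, only to illustrate
the idea. It assumes a small fixed-size array as a stand-in for
whatever map the datapath would really use (in Cilium that would
presumably be a BPF map), keyed by destination address; there is no
entry aging, and all names (pmtu_update, pmtu_lookup, frag_count) are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PMTU_SLOTS 64

struct pmtu_entry {
	uint32_t daddr; /* destination address, host byte order */
	uint16_t mtu;   /* lowest MTU learned for that destination */
	uint8_t used;
};

static struct pmtu_entry pmtu_table[PMTU_SLOTS];

/* Record an MTU learned from an ICMP error; an existing entry is only
 * ever lowered. Returns 0 on success, -1 if the table is full. */
static int pmtu_update(uint32_t daddr, uint16_t mtu)
{
	size_t free_slot = PMTU_SLOTS;

	for (size_t i = 0; i < PMTU_SLOTS; i++) {
		if (pmtu_table[i].used && pmtu_table[i].daddr == daddr) {
			if (mtu < pmtu_table[i].mtu)
				pmtu_table[i].mtu = mtu;
			return 0;
		}
		if (!pmtu_table[i].used && free_slot == PMTU_SLOTS)
			free_slot = i;
	}
	if (free_slot == PMTU_SLOTS)
		return -1;
	pmtu_table[free_slot].daddr = daddr;
	pmtu_table[free_slot].mtu = mtu;
	pmtu_table[free_slot].used = 1;
	return 0;
}

/* MTU to use towards daddr; falls back to link_mtu if nothing learned. */
static uint16_t pmtu_lookup(uint32_t daddr, uint16_t link_mtu)
{
	for (size_t i = 0; i < PMTU_SLOTS; i++)
		if (pmtu_table[i].used && pmtu_table[i].daddr == daddr)
			return pmtu_table[i].mtu;
	return link_mtu;
}

/* Number of IPv4 fragments needed for a packet of total_len bytes with
 * an ihl-byte header. Every fragment but the last must carry a payload
 * that is a multiple of 8 bytes. Returns 0 if the MTU is too small. */
static unsigned int frag_count(uint16_t total_len, uint16_t ihl,
			       uint16_t mtu)
{
	unsigned int per_frag, payload;

	if (total_len <= mtu)
		return 1;
	if (mtu < ihl + 8)
		return 0;
	per_frag = ((unsigned int)(mtu - ihl) / 8u) * 8u;
	payload = (unsigned int)(total_len - ihl);
	return (payload + per_frag - 1) / per_frag;
}
```

For the case in this mail, after pmtu_update() for 10.1.0.100 with MTU
800, a 1500-byte packet with a 20-byte header would be split into 2
fragments (776 + 704 bytes of payload).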


s.




