Hi,

I have a design question about handling ICMP Error Unreach need-to-fragment. To give a bit more context: the idea is to make Cilium handle ICMP Error Unreach need-to-fragment for NodePort services. I understand that Cilium is not Linux, but in this particular case we sit in the middle of both.

My initial question was: does Linux fragment packets on a host that is forwarding traffic? That is, when the host has a route for that traffic with a smaller MTU than the link the traffic arrived on, will the traffic be fragmented on egress?

I then decided to also describe what I have tried and the ideas I have, since my approach may be wrong. I also noticed the `ip_forward_use_pmtu` option, but it is probably not meant for this case: I enabled it, with no luck.

    Pod-X      : 172.10.0.10
    NodePort-X : 192.168.39.23
    Router-X   : 192.168.39.1
    Client-X   : 10.1.0.100

                 +------------+
                 |   Pod-X    |
                 +------+-----+   Cilium Host-Y
                        |
    ------+-------------+--------+-------------------
          ||
          || VxLan
          ||
          ||     +------------+
          ||     | NodePort-X |   192.168.39.0/24 dev eth0
          ||     +------+-----+   Cilium Host-X
          ||            |
    ------++------------+---------+-------------------
                                  |
    World                         |
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                  |  ^
                                  |  |  ICMP Error Router-X to NodePort
                                  |  |  192.168.39.1/192.168.39.23
                                  |  |  with payload 192.168.39.23/10.1.0.100
                           +-------------+
                           |  Router-X   |
                           +-------------+
                                  |
                  ----------------+-------+--------
                                          |
                                      Client-X

    Routes
    ------
    10.1.0.100 via 192.168.39.1 MTU 800

Consider a Pod behind a NodePort service that delivers content exceeding the MTU of one of the network devices on the path between the cluster and the client. In that situation the device returns an ICMP Error Unreach need-to-fragment to the cluster (the NodePort).
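To make the scenario concrete, here is a minimal user-space C sketch of pulling the relevant fields out of such an ICMP error. It is not Cilium or kernel code: `parse_frag_needed` and `struct frag_needed` are hypothetical names, and the layout assumed is the one from RFC 1191 (type 3, code 4, next-hop MTU in bytes 6-7, followed by the original IPv4 header).

```c
#include <stddef.h>
#include <stdint.h>

#define ICMP_TYPE_DEST_UNREACH 3
#define ICMP_CODE_FRAG_NEEDED  4
#define ICMP_HDR_LEN           8
#define IPV4_MIN_HDR_LEN       20

/* Hypothetical result type: what a NodePort handler would need to know. */
struct frag_needed {
    uint32_t orig_src;     /* source of the packet that was too big (host order) */
    uint32_t orig_dst;     /* destination of that packet (host order) */
    uint16_t next_hop_mtu; /* MTU reported by the router */
};

/* Read big-endian (network order) integers from a byte buffer. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * icmp points at the ICMP header (the outer IP header is already stripped).
 * Returns 0 and fills *out for a need-to-fragment error, -1 otherwise.
 */
int parse_frag_needed(const uint8_t *icmp, size_t len, struct frag_needed *out)
{
    const uint8_t *inner;

    if (len < ICMP_HDR_LEN + IPV4_MIN_HDR_LEN)
        return -1;
    if (icmp[0] != ICMP_TYPE_DEST_UNREACH || icmp[1] != ICMP_CODE_FRAG_NEEDED)
        return -1;

    inner = icmp + ICMP_HDR_LEN;   /* embedded original IPv4 header */
    if ((inner[0] >> 4) != 4)      /* IP version must be 4 */
        return -1;

    out->next_hop_mtu = rd16(icmp + 6);
    out->orig_src = rd32(inner + 12);
    out->orig_dst = rd32(inner + 16);
    return 0;
}
```

With the addresses from the diagram, the embedded header would give `orig_src` 192.168.39.23 and `orig_dst` 10.1.0.100, and the reported MTU would be 800.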
* Forwarding the packet to the Pod would not work, since the Pod only sees the path between the node hosting the NodePort service and the backend node hosting the Pod, and we don't want to reduce the MTU for that path.

Having said all that, I'm struggling to find the right approach. Here is what I have experimented with:

1/ Having the host that hosts the NodePort service handle the ICMP Error message itself (as opposed to forwarding it to the Pod). The message is currently dropped. The expected result would be that the routing table of that host gets updated according to the ICMP Error. But that does not look possible in Linux at this point, since there are checks validating that the ICMP Error was received in response to a packet the host itself emitted [0]. In the context of Cilium we bypass netfilter on egress, right?

    sk = __inet_lookup_established(net, net->ipv4.tcp_death_row.hashinfo,
                                   iph->daddr, th->dest, iph->saddr,
                                   ntohs(th->source), inet_iif(skb), 0);
    if (!sk) {
            __ICMP_INC_STATS(net, ICMP_MIB_INERRORS);
            return -ENOENT;
    }

[0] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/net/ipv4/tcp_ipv4.c#n487

2/ Having the NodePort service itself update the routing table of the host, adding a route with the MTU taken from the ICMP Error Unreach need-to-fragment. In that situation one might expect the host to fragment packets on egress. But based on my tests that does not seem to work. I'm not sure whether Linux handles this forwarding/fragmenting case?

3/ Having the NodePort service handle the full implementation of ICMP Error Unreach need-to-fragment:
   - For each ICMP Error received, the service would maintain a map of routes and MTUs.
   - For each outgoing packet, the service would fragment it if needed.

s.
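To make option 3/ more concrete, here is a minimal user-space C sketch of the two pieces it needs: a per-destination MTU map and a fragment planner. All names (`pmtu_map`, `pmtu_update`, `pmtu_lookup`, `plan_fragments`) are hypothetical, the map is a plain array rather than a BPF map, and it assumes IPv4 with a 20-byte header and no options. It only computes fragment boundaries and does not build or send packets.

```c
#include <stddef.h>
#include <stdint.h>

#define PMTU_SLOTS   64
#define IPV4_HDR_LEN 20   /* assumption: no IP options */

struct pmtu_entry { uint32_t dst; uint16_t mtu; uint8_t used; };
struct pmtu_map   { struct pmtu_entry slot[PMTU_SLOTS]; };

/* One planned fragment: byte offset into the payload, length, MF flag. */
struct frag { uint16_t offset; uint16_t len; uint8_t more; };

/* Record the MTU learned from an ICMP error; only ever lowers a stored
 * value (a real implementation would also need entries to expire). */
int pmtu_update(struct pmtu_map *m, uint32_t dst, uint16_t mtu)
{
    struct pmtu_entry *free_slot = NULL;

    for (size_t i = 0; i < PMTU_SLOTS; i++) {
        struct pmtu_entry *e = &m->slot[i];
        if (e->used && e->dst == dst) {
            if (mtu < e->mtu)
                e->mtu = mtu;
            return 0;
        }
        if (!e->used && !free_slot)
            free_slot = e;
    }
    if (!free_slot)
        return -1;   /* map full */
    free_slot->dst = dst;
    free_slot->mtu = mtu;
    free_slot->used = 1;
    return 0;
}

/* Return the learned MTU for dst, or dflt if nothing was learned. */
uint16_t pmtu_lookup(const struct pmtu_map *m, uint32_t dst, uint16_t dflt)
{
    for (size_t i = 0; i < PMTU_SLOTS; i++)
        if (m->slot[i].used && m->slot[i].dst == dst)
            return m->slot[i].mtu;
    return dflt;
}

/* Split payload_len bytes of IP payload so that each fragment plus its
 * header fits in mtu. Every fragment but the last carries a multiple of
 * 8 bytes, as IPv4 requires. Returns the number of fragments, or 0 if
 * there is nothing to send, the MTU is too small, or out[] is too short. */
size_t plan_fragments(uint16_t payload_len, uint16_t mtu,
                      struct frag *out, size_t max)
{
    uint16_t chunk, off = 0;
    size_t n = 0;

    if (mtu < IPV4_HDR_LEN + 8)
        return 0;
    chunk = (uint16_t)((mtu - IPV4_HDR_LEN) & ~7);

    while (off < payload_len) {
        uint16_t left = (uint16_t)(payload_len - off);
        uint16_t len = left > chunk ? chunk : left;

        if (n == max)
            return 0;
        out[n].offset = off;
        out[n].len = len;
        out[n].more = (uint8_t)(off + len < payload_len);
        off = (uint16_t)(off + len);
        n++;
    }
    return n;
}
```

For the route in the diagram (MTU 800), a 1500-byte packet carrying 1480 bytes of payload would be planned as two fragments of 776 and 704 payload bytes.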