On 20/07/2020 09:15, Alexander Petrovsky wrote:
> But, the main problem for us it's fragmented IP packets. Some times
> ago I tried to use for such packets AF_XDP, fast pass them into the
> user space, accumulate and after that pass back to the network, it was
> a PoC.

Not 100% sure this works because I haven't tried it, but as long as
packets aren't being re-ordered, you can do it without needing to save
the payload in a map. All the map needs to store is (for each IPID
being tracked) what host this connection goes to.

If you receive a First Fragment (frag_off=0, MF=1), you look up the
tuple through the regular LB to pick a server, and record that host in
the map entry for the IPID. For any other fragment, you look up the
IPID in the map to get the destination host, and if MF=0 you delete the
map entry. (If the IPID wasn't found, either drop or punt to
userspace.) Then TX/REDIRECT the packet to the appropriate host.

You might want to add some kind of simple ageing to this so that map
entries from interrupted/spurious fragment chains don't stick around
and build up over time.

The problem comes when 'middle' fragments can either come after the
last (MF=0) fragment (technically this can be handled by tracking the
byte range seen for the IPID, and not deleting from the map until all
bytes up to the frag_off+total_len of the last-frag have been seen), or
worse, before the first fragment. If the frag_off=0 fragment isn't the
first one received, then this doesn't work because you don't know at
the time of receiving fragments what L4 ports they belong to. But I
don't know how common that situation is and whether having it take the
slow-path is acceptable.

HTH,
-ed
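
For illustration, the per-IPID tracking described above can be modelled
in a few lines of userspace Python. This is only a sketch of the state
machine, not XDP/BPF code, and it is as untested against real traffic
as the idea itself. The names Fragment, FragTracker, pick_backend and
max_age are made up for the example. One assumption beyond the text:
the table is keyed on (saddr, daddr, proto, IPID) rather than the IPID
alone, since that is the tuple RFC 791 uses to identify the fragments
of one datagram. Ageing is done lazily on lookup plus an explicit
sweep.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Fragment:
    """Hypothetical parsed view of one IPv4 packet."""
    saddr: str
    daddr: str
    proto: int
    ipid: int
    frag_off: int                # byte offset (the header stores it in 8-byte units)
    more_frags: bool             # the MF flag
    sport: Optional[int] = None  # L4 ports are only visible when frag_off == 0
    dport: Optional[int] = None


Key = Tuple[str, str, int, int]


class FragTracker:
    """Remembers which backend each in-flight fragment chain was sent to."""

    def __init__(self, pick_backend: Callable[[Fragment], str],
                 max_age: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._pick = pick_backend   # stands in for the regular LB lookup
        self._max_age = max_age     # seconds before a stale entry is ignored
        self._clock = clock
        self._table: Dict[Key, Tuple[str, float]] = {}

    def route(self, frag: Fragment) -> Optional[str]:
        """Return the backend for this packet, or None to drop/punt."""
        now = self._clock()
        key = (frag.saddr, frag.daddr, frag.proto, frag.ipid)

        if frag.frag_off == 0:
            backend = self._pick(frag)      # ports are available here
            if frag.more_frags:             # first fragment: remember the choice
                self._table[key] = (backend, now)
            return backend                  # MF=0 here means not fragmented at all

        entry = self._table.get(key)
        if entry is None or now - entry[1] > self._max_age:
            self._table.pop(key, None)      # unknown or aged-out chain
            return None
        if not frag.more_frags:             # last fragment: chain is finished
            del self._table[key]
        return entry[0]

    def expire(self) -> int:
        """Sweep out entries older than max_age; returns how many were removed."""
        now = self._clock()
        stale = [k for k, (_, seen) in self._table.items()
                 if now - seen > self._max_age]
        for k in stale:
            del self._table[k]
        return len(stale)
```

Feeding it a first, middle and last fragment in order returns the same
backend three times and leaves the table empty afterwards; a middle
fragment arriving before its first fragment gets None, which is the
slow-path case. In an actual XDP program the table would presumably be
a BPF hash map, and an LRU variant such as BPF_MAP_TYPE_LRU_HASH might
cover the ageing, but that is an assumption rather than something
tried here.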