I have a working eBPF program (handcrafted assembler) of about 600 instructions. This program takes more than 2 CPU seconds to load. In short, the eBPF program selects and redirects packets, does MSS clamping, and sends ICMPs where required, for both IPv4 and IPv6. The eBPF program is part of a project that will be GPLed when sufficiently ready. I am willing to cobble something testable together and post it (attachment only) or send it directly, if somebody on this list is willing to investigate why the verifier is having lots of CPU for breakfast. Please let me know if and how to proceed.