Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes: > On Fri, Jan 7, 2022 at 1:54 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: >> >> Because the data pages are recycled by the page pool, and the test runner >> doesn't re-initialise them for each run, subsequent invocations of the XDP >> program will see the packet data in the state it was after the last time it >> ran on that particular page. This means that an XDP program that modifies >> the packet before redirecting it has to be careful about which assumptions >> it makes about the packet content, but that is only an issue for the most >> naively written programs. > > This is too vague and partially incorrect. > The bpf program can do bpf_xdp_adjust_meta() and otherwise change > packet boundaries. These effects will be seen by subsequent > XDP_PASS/TX/REDIRECT, but on the next iteration the boundaries > will get reset to the original values. > So the test runner actually re-initializes some parts of the data, > but not the contents of the packet. > At least that's my understanding of the patch. Yes, that's correct. Boundaries will be reset, data won't. The boundary reset was added later, though, so guess I neglected to update the commit message. Will fix. > The users shouldn't need to dig into implementation to discover this. > Please document it. > The more I think about it the more I believe that it warrants > a little blurb in Documentation/bpf/ that describes what one can > do with this "xdp live mode". Sure, can do. Doesn't look like BPF_PROG_RUN is documented in there at all, so guess I can start such a document :) > Another question comes to mind: > What happens when a program modifies the packet? > Does it mean that the 2nd frame will see the modified data? > It will not, right? > It's the page pool size of packets that will be inited the same way > at the beginning. Which is NAPI_POLL_WEIGHT * 2 == 128 packets. > Why this number? Yes, you're right: the next run won't see the modified packet data. The 128 pages is because we run the program loop in batches of 64 (like NAPI does, the fact that TEST_XDP_BATCH and NAPI_POLL_WEIGHT are the same is not a coincidence). We need 2x because we want enough pages so we can keep running without allocating more, and the first batch can still be in flight on a different CPU while we're processing batch 2. I experimented with different values, and 128 was the minimum size that didn't have a significant negative impact on performance, and above that saw diminishing returns. > Should it be configurable? > Then the user can say: init N packets with this one pattern > and the program will know that exactly N invocation will be > with the same data, but N+1 it will see the 1st packet again > that potentially was modified by the program. > Is it accurate? I thought about making it configurable, but the trouble is that it's not quite as straight-forward as the first N packets being "pristine": it depends on what happens to the packet afterwards: On XDP_DROP, the page will be recycled immediately, whereas on XDP_{TX,REDIRECT} it will go through the egress driver after sitting in the bulk queue for a little while, so you can get reordering compared to the original execution order. On XDP_PASS the kernel will release the page entirely from the pool when building an skb, so you'll never see that particular page again (and eventually page_pool will allocate a new batch that will be re-initialised to the original value). If we do want to support a "pristine data" mode, I think the least cumbersome way would be to add a flag that would make the kernel re-initialise the packet data before every program invocation. The reason I didn't do this was because I didn't have a use case for it. The traffic generator use case only rewrites a tiny bit of the packet header, and it's just as easy to just keep rewriting it without assuming a particular previous value. And there's also the possibility of just calling bpf_prog_run() multiple times from userspace with a lower number of repetitions... I'm not opposed to adding such a flag if you think it would be useful, though. WDYT? -Toke