Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes:

> On Sat, Jan 08, 2022 at 09:19:41PM +0100, Toke Høiland-Jørgensen wrote:
>>
>> Sure, totally fine with documenting it. Just seems to me the most
>> obvious place to put this is in a new
>> Documentation/bpf/prog_test_run.rst file with a short introduction
>> about the general BPF_PROG_RUN mechanism, and then a subsection
>> dedicated to this facility.
>
> sgtm

Great!

>> > I guess it's ok-ish to get stuck with 128.
>> > It will be uapi that we cannot change though.
>> > Are you comfortable with that?
>>
>> UAPI in what sense? I'm thinking of documenting it like:
>>
>> "The packet data being supplied as data_in to BPF_PROG_RUN will be
>> used for the initial run of the XDP program. However, when running
>> the program multiple times (with repeat > 1), only the packet
>> *bounds* (i.e., the data, data_end and data_meta pointers) will be
>> reset on each invocation; the packet data itself won't be rewritten.
>> The pages backing the packets are recycled, but the order depends on
>> the path the packet takes through the kernel, making it hard to
>> predict when a particular modified page makes it back to the XDP
>> program. In practice, this means that if the XDP program modifies the
>> packet payload before sending out the packet, it has to be prepared
>> to deal with subsequent invocations seeing either the initial data or
>> the already-modified packet, in arbitrary order."
>>
>> I don't think this makes any promises about any particular size of
>> the page pool, so how does it constitute UAPI?
>
> Could you explain the out-of-order scenario again?
> It's possible only if xdp_redirect is done into different netdevs.
> Then they can xmit at different times and cycle pages back into
> the loop in different order. But TX or REDIRECT into the same netdev
> will keep the pages in the same order. So the program can rely on
> that.

I left that out on purpose: I feel it's exposing an internal
implementation detail as UAPI (as you said). And I'm not convinced it's
really needed (or helpful) - see below.

>> >
>> > reinit doesn't feel necessary.
>> > How would one use this interface to send N different packets?
>> > The api provides an interface for only one.
>>
>> By having the XDP program react appropriately. E.g., here is the XDP
>> program used by the trafficgen tool to cycle through UDP ports when
>> sending out the packets - it just reads the current value and updates
>> based on that, so it doesn't matter if it sees the initial page or
>> one it already modified:
>
> Sure. I think there is an untapped potential here.
> With this live packet prog_run anyone can buy a 10G or 100G
> NIC-equipped server and for free transform it into a $300k+
> IXIA-beating machine. It could be a game changer. pktgen doesn't come
> close.
> I'm thinking about generating and consuming test TCP traffic.
> A TCP blaster would xmit 1M TCP connections through this live prog_run
> into eth0 and consume the traffic returning from the "server under
> test" via a different XDP program attached to eth0.
> The prog_run's xdp prog would need to send SYN, increment the sequence
> number, and keep sane data in the packets. It could be an HTTP
> request, for example.

I'm glad you see the potential :)

> To achieve this IXIA-beating setup the TCP blaster would need a full
> understanding of what the page pool is doing with the packets.
> Just saying "in arbitrary order" is a non-starter. It diminishes this
> live prog_run into a pktgen equivalent, which is still useful, but
> lots of potential is lost.
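
For reference, the port-cycling logic I'm referring to above looks
roughly like the following (a simplified sketch for illustration, not
the verbatim xdp_trafficgen program; the port range is made up and
checksum handling is omitted):

/* Simplified sketch of the port-cycling idea (illustration only). */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define PORT_MIN  10000 /* hypothetical values, just for the example */
#define PORT_SPAN 256

SEC("xdp")
int cycle_udp_port(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	struct iphdr *iph;
	struct udphdr *udp;
	__u16 port;

	if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
		return XDP_ABORTED;

	iph = (void *)(eth + 1);
	if ((void *)(iph + 1) > data_end || iph->protocol != IPPROTO_UDP)
		return XDP_ABORTED;

	udp = (void *)(iph + 1); /* assumes no IP options */
	if ((void *)(udp + 1) > data_end)
		return XDP_ABORTED;

	/* Read whatever port is currently in the packet (initial or
	 * already-modified page) and advance it within the range.
	 */
	port = bpf_ntohs(udp->dest) + 1;
	if (port < PORT_MIN || port >= PORT_MIN + PORT_SPAN)
		port = PORT_MIN;
	udp->dest = bpf_htons(port);
	udp->check = 0; /* UDP/IPv4: 0 means no checksum; kept simple here */

	return XDP_TX;
}

char _license[] SEC("license") = "GPL";
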
I don't think a detailed knowledge of how the pages are recycled is
needed to implement a TCP stream? Even if you just rely on the packets
being recycled with a fixed period of 128 pages, how does that make
your XDP program simpler? You'll still have to update the packet header
for each packet, with state kept in a map; so why is it helpful to know
when a particular page comes back?

I'll try implementing a TCP stream mode in xdp_trafficgen just to make
sure I'm not missing something. But I believe that sending out a stream
of packets that looks like a coherent TCP stream should be simple
enough, at least. Dealing with the full handshake + CWND control loop
will be harder, though, and right now I think it'll require multiple
trips back to userspace.

>> Another question seeing as the merge window is imminent: How do you
>> feel about merging this before the merge window? I can resubmit
>> before it opens with the updated selftest and documentation, and we
>> can deal with any tweaks during the -rcs; or would you rather
>> postpone the whole thing until the next cycle?
>
> It's already too late for this merge window, but bpf-next is always
> open. Just like it was open for the last year. So please resubmit as
> soon as the tests are green and this discussion is over.

Ah, OK. I was under the impression that the cutoff date was tomorrow;
has that changed? But no worries, I'll spend my Sunday outside instead
of coding, then, and come back to this tomorrow :)

-Toke
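
P.S. For concreteness, here is a rough, untested sketch of the kind of
per-packet "state kept in a map" update I mean for the TCP case. The
map layout, names and fixed payload size are just assumptions for
illustration, and checksum handling is again omitted:

/* Rough sketch: take the next sequence number from a map on every run
 * instead of relying on which recycled page the program happens to see.
 */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define PAYLOAD_LEN 64 /* assumed fixed payload per packet */

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} seq_state SEC(".maps");

SEC("xdp")
int tcp_stream_seq(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	struct iphdr *iph;
	struct tcphdr *tcp;
	__u32 key = 0;
	__u64 *next_seq;

	if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
		return XDP_ABORTED;

	iph = (void *)(eth + 1);
	if ((void *)(iph + 1) > data_end || iph->protocol != IPPROTO_TCP)
		return XDP_ABORTED;

	tcp = (void *)(iph + 1); /* assumes no IP options */
	if ((void *)(tcp + 1) > data_end)
		return XDP_ABORTED;

	next_seq = bpf_map_lookup_elem(&seq_state, &key);
	if (!next_seq)
		return XDP_ABORTED;

	/* Overwrite whatever seq is in the (possibly recycled) page with
	 * the authoritative value from the map, then advance it.
	 */
	tcp->seq = bpf_htonl((__u32)*next_seq);
	*next_seq += PAYLOAD_LEN;

	return XDP_TX;
}

char _license[] SEC("license") = "GPL";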