> Lorenzo Bianconi wrote:
> > > Lorenzo Bianconi wrote:
> > > > > Lorenzo Bianconi wrote:
> > > > [...]
> > > > > > + * Description
> > > > > > + *	Adjust frame headers moving *offset* bytes from/to the second
> > > > > > + *	buffer to/from the first one. This helper can be used to move
> > > > > > + *	headers when the hw DMA SG does not copy all the headers in
> > > > > > + *	the first fragment.
> > > >
> > > > + Eric to the discussion
> > > >
> > > > > > [...]
> > > > > > +BPF_CALL_2(bpf_xdp_adjust_mb_header, struct xdp_buff *, xdp,
> > > > > > +	   int, offset)
> > > > > > +{
> > > > > > +	void *data_hard_end, *data_end;
> > > > > > +	struct skb_shared_info *sinfo;
> > > > > > +	int frag_offset, frag_len;
> > > > > > +	u8 *addr;
> > > > > > +
> > > > > > +	if (!xdp->mb)
> > > > > > +		return -EOPNOTSUPP;
> > >
> > > Not required for this patch necessarily, but I think it would be a better
> > > user experience if, instead of returning EOPNOTSUPP here, we did the header
> > > split. This would allocate a frag and copy the bytes around as needed. Yes,
> > > it might be slow if you don't have a frag free in the driver, but if the
> > > user wants to do header split and their hardware can't do it, we would have
> > > a way out.
> > >
> > > I guess it could be an improvement for later though.
> >
> > I don't have a strong opinion on this; I did it this way to respect the rule
> > "we do not allocate memory for XDP".
> >
> > @Jesper, David: thoughts?
>
> Consider adding a flags field to the helper so we could do this later with
> a flag. Then users who want the alloc can set the flag and get it.
>
> [...]
>
> > > How/when does the header split go wrong on the mvneta device? I guess
> > > this is to fix a real bug/issue, not some theoretical one? An example
> > > in the commit message would make this concrete. Something like:
> > > "When using RX zerocopy to mmap data into a userspace application, if
> > > a packet with [all these wild headers] is received, rx zerocopy breaks
> > > because header split puts headers X in the data frag, confusing apps".
> >
> > This issue does not occur with mvneta since the driver is not capable of
> > performing header split AFAIK. The helper has been introduced to cover the
> > "issue" reported by Eric in his NetDevConf presentation. In order to test
> > the helper I modified the mvneta rx napi loop in a controlled way (this
> > patch can't be sent upstream, it is for testing only :))
> > I will improve the commit message in v3.
>
> Ah, OK, so really there are no users for the helper then. IMO just drop
> the patch until we have a user.

I agree, this helper is not strictly related to the series. I added it in v2 to
provide another example of using the xdp_buff mb bit.
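
If the helper does come back in a future revision, reserving a flags argument
from the start (as suggested above) could look roughly like the sketch below.
This is illustrative only; the alloc flag is hypothetical and not part of the
series:

/* Sketch: reserve a flags argument in the UAPI so that an "allocate a frag
 * and copy the headers if the hw did not split them" behaviour (e.g. a
 * hypothetical BPF_XDP_MB_HEADER_F_ALLOC flag) can be added later without
 * changing the helper signature.
 */
BPF_CALL_3(bpf_xdp_adjust_mb_header, struct xdp_buff *, xdp,
	   int, offset, u64, flags)
{
	if (unlikely(flags))	/* no flags supported yet */
		return -EINVAL;

	if (!xdp->mb)
		return -EOPNOTSUPP;

	/* ... move headers between the first two buffers as in the
	 * current patch ...
	 */
	return 0;
}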

>
> > > > >
> > > > > Also, and even more concerning, I think this API requires the
> > > > > driver to populate shinfo. If we use TX_REDIRECT a lot or TX_XMIT,
> > > > > this means we need to populate shinfo when it's probably not ever
> > > > > used. If our driver is smart, L2/L3 headers are in the readable
> > > > > data and prefetched. Writing frags into the end of a page is likely
> > > > > not free.
> > > >
> > > > Sorry, I did not get what you mean with "populate shinfo" here. We need
> > > > to properly set shared_info in order to create the xdp multi-buff.
> > > > Apart from header split, please consider that the main use cases for
> > > > xdp multi-buff are XDP with TSO and jumbo frames.
> > >
> > > The use case I have in mind is an XDP_TX or XDP_REDIRECT load balancer.
> > > I won't know this at the driver level, and now I'll have to write into
> > > the back of every page with this shinfo just in case. If header
> > > split is working I should never need to even touch the page outside
> > > the first N bytes that were DMAd and in cache with DDIO. So it's extra
> > > overhead for something that is unlikely to happen in the LB case.
> >
> > So far the skb_shared_info is constructed in mvneta only if the hw splits
> > the received data into multiple buffers (i.e. if the MTU is greater than
> > 1 PAGE, please see the comments below). Moreover, the shared_info is
> > present only in the first buffer.
>
> Still, in a normal L2/L3/L4 use case I expect all the headers you
> need to be in the first buffer, so it's unlikely that use cases which
> send most traffic via XDP_TX, for example, will ever need the extra
> info. In those cases I think you are paying some penalty for
> having to do the work of populating the shinfo. Maybe it's measurable,
> maybe not, I'm not sure.
>
> Also, if we make it required for multi-buffer, then we also need
> the shinfo on 40Gbps or 100Gbps NICs, and now even small costs
> matter.

Now I realize I used the word "split" in an unclear way here, I apologize for
that. What I mean is not related to header split; I am referring to the case
where the hw is configured with a given rx buffer size (e.g. 1 PAGE) and we
have set a higher MTU/max receive size (e.g. 9K). In this case the hw will
"split" the received jumbo frame over multiple rx buffers/descriptors.
By populating the "xdp_shared_info" we forward this layout info to the eBPF
sandbox and to a remote driver/cpu.
Please note this use case is not currently covered by XDP, so if we develop it
in a proper way I guess we should not get any performance hit for the legacy
single-buffer mode, since we will not populate the shared_info for it (I think
you are referring to the "legacy" use case in your "normal L2/L3/L4" example,
right?).
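
To make that concrete, the driver-side behaviour I have in mind is roughly the
following (an illustrative sketch, not the actual mvneta code; the function
name is made up and the frag helpers are only one possible way to fill the
entries):

/* Illustrative sketch: a frag is recorded in the shared_info area at the end
 * of the first buffer only when the frame really spans more than one rx
 * buffer, so the legacy single-buffer path never touches it.
 */
static void xdp_mb_add_frag(struct xdp_buff *xdp, struct page *page,
			    unsigned int offset, unsigned int len)
{
	struct skb_shared_info *sinfo = xdp_data_hard_end(xdp);
	skb_frag_t *frag;

	if (!xdp->mb) {
		/* first extra buffer: switch to the multi-buff layout */
		sinfo->nr_frags = 0;
		xdp->mb = 1;
	}

	frag = &sinfo->frags[sinfo->nr_frags++];
	__skb_frag_set_page(frag, page);
	skb_frag_off_set(frag, offset);
	skb_frag_size_set(frag, len);
}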

Anyway, I will run some tests to verify that performance for the single-buffer
use case is not hit.

Regards,
Lorenzo

> > > If you take the simplest possible program that just returns XDP_TX
> > > and run a pkt generator against it, I believe (I haven't run any
> > > tests) that you will see overhead now just from populating this
> > > shinfo. I think it needs to be done only when it's needed, e.g. when
> > > the user makes this helper call or when we need to build the skb, and
> > > populate the frags there.
> >
> > Sure, I will carry out some tests.
>
> Thanks!
>
> > > I think a smart driver will just keep the frags list in whatever
> > > form it has them (rx descriptors?) and push them over to the
> > > tx descriptors without having to do extra work with frag lists.
> >
> > I think there are many use cases where we want to have this info available
> > in xdp_buff/xdp_frame. E.g. let's consider the following jumbo frame example:
> > - MTU > 1 PAGE (so the driver will split the received data over multiple
> >   rx descriptors)
> > - the driver performs an XDP_REDIRECT to a veth or cpumap
> >
> > Relying on the proposed architecture we could enable GRO in veth or cpumap,
> > I guess, since we can build a non-linear skb from the xdp multi-buff, right?
>
> I'm not disputing there are use cases. But I'm trying to see if we
> can cover those without introducing additional latency in other
> cases. Hence the extra benchmarks request ;)
>
> > > > > Did you benchmark this?
> > > >
> > > > Will do, I need to understand if we can use tiny buffers in mvneta.
> > >
> > > Why tiny buffers? How does mvneta lay out the frags when doing
> > > header split? Can we just benchmark what mvneta is doing at the
> > > end of this patch series?
> >
> > For the moment mvneta can split the received data only when the previous
> > buffer is full (e.g. when the first page is completely written). I want to
> > explore whether I can set a tiny buffer (e.g. 128B) as the max receive
> > buffer, to run some performance tests and get results "comparable" to the
> > ones I got when I added XDP support to mvneta.
>
> OK, would be great.
>
> > > Also, can you try the basic XDP_TX case mentioned above?
> > > I don't want this to degrade existing use cases if at all
> > > possible.
> >
> > Sure, will do.
>
> Thanks!
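
For reference, the "basic XDP_TX case" above is just the simplest possible
program attached to the interface and driven by a packet generator, something
along these lines (a sketch for illustration, not part of the series):

/* Minimal XDP_TX micro-benchmark program: every received frame is bounced
 * back out of the same port.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_tx_bench(struct xdp_md *ctx)
{
	return XDP_TX;
}

char _license[] SEC("license") = "GPL";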