> From: Jesper Dangaard Brouer <jbrouer@xxxxxxxxxx>
> Sent: Tuesday, May 26, 2020 10:04 AM
> To: Denis Salopek
> Cc: xdp-newbies@xxxxxxxxxxxxxxx; Alexander Duyck
> Subject: Re: XDP_REDIRECT forwarding speed
>
> On Tue, 26 May 2020 07:00:30 +0000
> Denis Salopek <Denis.Salopek@xxxxxx> wrote:
>
> > I want to make sure I did everything right to make my XDP program
> > (simple forwarding with bpf_redirect_map) as fast as possible. Is following
> > advices and gotchas from this:
> > https://www.mail-archive.com/netdev@xxxxxxxxxxxxxxx/msg184139.html
>
> I prefer links to lore.kernel.org:
> [1] https://lore.kernel.org/netdev/20170821212506.1cb0d5d6@xxxxxxxxxx/
>
> Do notice that my results in [1] is for a single queue and single CPU.
> In production I assume that you can likely scale this across more CPUs ;-)
>
> > enough or are there some additional/newer recommendations? I managed
> > to get near line-rate on my Intel X520s (on Ryzen 3700X and one
> > queue/CPU), but not quite 14.88 Mpps so I was wondering is there
> > something else to speed things up even more.
>
> In [1] I mention the need to tune the TX-queue to keep up via either
> adjusting the TX-DMA completion interrupt interval:
>
>  Tuned with rx-usecs 25:
>   ethtool -C ixgbe1 rx-usecs 25 ;\
>   ethtool -C ixgbe2 rx-usecs 25
>
> Or increasing the size of the TX-queue, so it doesn't overrun:
>
>  Tuned with adjusting ring-queue sizes:
>   ethtool -G ixgbe1 rx 1024 tx 1024 ;\
>   ethtool -G ixgbe2 rx 1024 tx 1024
>
> This might not be needed any longer, as I think it was Alexander, that
> implemented an improved interrupt adjustment scheme for ixgbe.

Thank you for the info.

> > Also, are there any recommended settings/tweaks for bidirectional
> > forwarding? I suppose there would be a drop in performance compared to
> > single direction, but has anyone done any benchmarks?
>
> As this was 1-CPU you can just run the other direction on another CPU.
> That said, it can still be an advantage to run the bidirectional
> traffic on the same CPU and RX-TX-queue pair, as above issue with
> TX-queue DMA cleanups/completions goes away. Because, the ixgbe driver
> will do TX-cleanups as part (before) the RX-processing.

Yeah, you are right, bidirectional yields 10-15% more pps.

> What is your use-case?
> e.g. building an IPv4 router?

Just exploring the performance potential of userspace vs XDP for
DDoS-scrubbing type middlebox (i.e. forwarding) tasks.

> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer

Denis
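
For context on the 14.88 Mpps figure mentioned above: it is the theoretical packet rate of a 10 Gbit/s link carrying minimum-size (64-byte) Ethernet frames, once the 8 bytes of preamble/SFD and the 12-byte inter-frame gap are counted. A minimal Python sketch of that arithmetic follows; the function and constant names are made up for illustration, and the 10 Gbit/s link speed is an assumption based on the X520 being a 10GbE adapter.

```python
# Per-frame bytes on the wire that are not part of the Ethernet frame itself.
PREAMBLE_SFD_BYTES = 8      # 7-byte preamble + 1-byte start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle time between frames, in byte times


def line_rate_pps(link_bps, frame_bytes):
    """Theoretical maximum packets per second for a given frame size."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


def ns_per_packet(pps):
    """Time budget per packet, in nanoseconds, at a given packet rate."""
    return 1e9 / pps


if __name__ == "__main__":
    # Assumption: 10 Gbit/s link, 64-byte frames (smallest legal Ethernet frame).
    pps = line_rate_pps(10e9, 64)
    print(f"{pps / 1e6:.2f} Mpps, {ns_per_packet(pps):.1f} ns per packet")
```

At that rate a single CPU has roughly 67 ns to receive, process, and transmit each packet, which is why TX-completion handling can decide whether one queue reaches line rate.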