Re: [PATCH bpf-next v3 0/6] Introduce the BPF dispatcher

Jesper Dangaard Brouer <brouer@xxxxxxxxxx> · Mon, 9 Dec 2019 20:50:10 +0100

On Mon, 9 Dec 2019 18:45:12 +0100
Björn Töpel <bjorn.topel@xxxxxxxxx> wrote:

> On Mon, 9 Dec 2019 at 18:00, Jesper Dangaard Brouer <brouer@xxxxxxxxxx> wrote:
> >
> > On Mon,  9 Dec 2019 14:55:16 +0100
> > Björn Töpel <bjorn.topel@xxxxxxxxx> wrote:
> >  
> > > Performance
> > > ===========
> > >
> > > The tests were performed using the xdp_rxq_info sample program with
> > > the following command-line:
> > >
> > > 1. XDP_DRV:
> > >   # xdp_rxq_info --dev eth0 --action XDP_DROP
> > > 2. XDP_SKB:
> > >   # xdp_rxq_info --dev eth0 -S --action XDP_DROP
> > > 3. xdp-perf, from selftests/bpf:
> > >   # test_progs -v -t xdp_perf
> > >
> > >
> > > Run with mitigations=auto
> > > -------------------------
> > >
> > > Baseline:
> > > 1. 22.0 Mpps
> > > 2. 3.8 Mpps
> > > 3. 15 ns
> > >
> > > Dispatcher:
> > > 1. 29.4 Mpps (+34%)
> > > 2. 4.0 Mpps  (+5%)
> > > 3. 5 ns      (+66%)  
> >
> > Thanks for providing these extra measurement points.  This is good
> > work.  I just want to remind people that when working at these high
> > speeds, it is easy to be amazed by a +34% improvement, but we have to
> > keep in mind that this is saving approx. 10 ns (or the equivalent in
> > cycles) per packet.
> >
> > In reality, the time saved per packet in #2 (3.8 Mpps -> 4.0 Mpps),
> > (1/3.8-1/4)*1000 = 13.16 ns, is larger than in #1 (22.0 Mpps ->
> > 29.4 Mpps), (1/22-1/29.4)*1000 = 11.44 ns.  Test #3 keeps us honest:
> > 15 ns -> 5 ns = 10 ns.  The 10 ns improvement is a big deal in the XDP
> > context, and it also corresponds to my own experience with retpoline
> > (approx. 12 ns overhead).
> >  
> 
> Ok, good! :-)
> 
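For anyone wanting to redo these checks with more digits: a rate of R Mpps means each packet costs 1000/R ns, so the per-packet saving is simply the difference of the two reciprocals. A minimal Python sketch of that arithmetic, using the mitigations=auto numbers quoted above (the helper names are made up for illustration):

```python
def ns_per_packet(mpps: float) -> float:
    """Per-packet processing time in nanoseconds at a given rate in Mpps."""
    # 1 Mpps = 1e6 packets/s, so one packet takes 1e9 / (mpps * 1e6) = 1000 / mpps ns.
    return 1000.0 / mpps


def ns_saved(before_mpps: float, after_mpps: float) -> float:
    """Nanoseconds saved per packet when the rate goes from before_mpps to after_mpps."""
    return ns_per_packet(before_mpps) - ns_per_packet(after_mpps)


if __name__ == "__main__":
    # Baseline -> Dispatcher, mitigations=auto.
    print(f"#1 XDP_DRV: {ns_saved(22.0, 29.4):.2f} ns")  # 11.44 ns
    print(f"#2 XDP_SKB: {ns_saved(3.8, 4.0):.2f} ns")    # 13.16 ns
```

With only one decimal on the Mpps numbers, a 0.05 Mpps rounding error at ~4 Mpps is worth about 3 ns per packet, which is why the #2 figure is the least trustworthy of the three.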
> > To Bjørn, I would appreciate more digits on your Mpps numbers, so I get
> > more accuracy in the checks-and-balances I described above.  I suspect
> > the 3.8 Mpps -> 4.0 Mpps result will be closer to the other numbers
> > when we get more accuracy.
> >  
> 
> Ok! Let me re-run them. 

Well, I don't think you should waste your time re-running these...

It clearly shows a significant improvement.  I'm just complaining that
I didn't have enough digits to do accurate checks-and-balances; the
numbers are close enough that I believe them.

> If you have some spare cycles, it would be
> great if you could try it out as well on your Mellanox setup.

I'll add it to my TODO list... but no promises.

> Historically you've always been able to get more stable numbers than
> I. :-)
> 
> >  
> > > Dispatcher (full; walk all entries, and fallback):
> > > 1. 20.4 Mpps (-7%)
> > > 2. 3.8 Mpps
> > > 3. 18 ns     (-20%)
> > >
> > > Run with mitigations=off
> > > ------------------------
> > >
> > > Baseline:
> > > 1. 29.6 Mpps
> > > 2. 4.1 Mpps
> > > 3. 5 ns
> > >
> > > Dispatcher:
> > > 1. 30.7 Mpps (+4%)
> > > 2. 4.1 Mpps
> > > 3. 5 ns  
> >
> > While +4% sounds good, it could be measurement noise ;-)
> >
> >  (1/29.6-1/30.7)*1000 = 1.21 ns
> >
> > Both #3 results say 5 ns.
> >  
> 
> True. Maybe that simply hints that we shouldn't use the dispatcher here?

No. I actually think it is worth exposing this code as much as
possible. And if it really is 1.2 ns improvement, then I'll gladly take
that as well ;-)
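To make the earlier point concrete that a percentage on its own says little: the same per-packet saving turns into very different percentages depending on the baseline rate. A hypothetical Python sketch, assuming the 1.21 ns mitigations=off estimate from #1 were applied unchanged to #2 as well (the thread does not measure that; the helper names are illustrative):

```python
def rate_after_saving(base_mpps: float, saved_ns: float) -> float:
    """Packet rate (Mpps) if each packet became saved_ns cheaper than at base_mpps."""
    base_ns = 1000.0 / base_mpps          # per-packet cost at the baseline rate
    return 1000.0 / (base_ns - saved_ns)  # new rate with the cheaper per-packet cost


def gain_percent(base_mpps: float, saved_ns: float) -> float:
    """Relative throughput gain, in percent, from a fixed per-packet saving."""
    return (rate_after_saving(base_mpps, saved_ns) / base_mpps - 1.0) * 100.0


if __name__ == "__main__":
    saved = 1.21  # ns, the mitigations=off estimate for #1 (assumed for #2 too)
    for name, base in (("#1 XDP_DRV", 29.6), ("#2 XDP_SKB", 4.1)):
        print(f"{name}: {base} -> {rate_after_saving(base, saved):.2f} Mpps "
              f"(+{gain_percent(base, saved):.1f}%)")
```

With these inputs it prints about 30.70 Mpps (+3.7%) for #1 and 4.12 Mpps (+0.5%) for #2. The latter would still read as 4.1 Mpps with one decimal, so a real saving of that size is invisible in the XDP_SKB numbers.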

I think this is awesome work! -- thanks for doing this!!!
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer