Re: Profiling XDP programs for performance issues

Magnus Karlsson <magnus.karlsson@xxxxxxxxx> · Fri, 9 Apr 2021 08:40:51 +0200

On Fri, Apr 9, 2021 at 1:06 AM Neal Shukla <nshukla@xxxxxxxxxxxxx> wrote:
>
> Using perf, we've confirmed that the line mentioned has a 25.58% cache miss
> rate.

Do these misses hit in the LLC or go to DRAM? In any case, your best
bet is likely to prefetch this into your L1/L2. In my experience, the
best way to do this is not to use an explicit prefetch instruction but
to touch/fetch the cache lines you need at the beginning of your
computation and let the fetch latency and the usage of the first cache
line hide the latencies of fetching the others. In your case, touch
both metadata and packet at the same time. Work with the metadata and
other things, then come back to the packet data; hopefully the
relevant part will reside in the cache or registers by then. If that
does not work, touch packet number N+1 just before starting with
packet N.
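
A rough sketch of the "touch early, touch N+1" pattern, in plain
userspace C rather than actual XDP code. The struct layout, the
touch() helper and the function names are all made up for
illustration and are not taken from your program, the i40e driver or
any library. The loads go through a volatile pointer only so that the
compiler does not optimize them away.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet descriptor; not an XDP or i40e type. */
struct pkt_meta {
    uint32_t len;    /* number of valid bytes in data */
    uint32_t flags;
};

struct pkt {
    struct pkt_meta *meta;
    uint8_t *data;
};

/* Plain load of one byte. The volatile qualifier keeps the compiler
 * from dropping the load; this is not a prefetch instruction. */
static inline void touch(const void *p)
{
    (void)*(const volatile uint8_t *)p;
}

/* Stand-in for the real work: read the metadata first, then come
 * back to the packet bytes. */
static uint32_t process_one(const struct pkt *p)
{
    uint32_t len = p->meta->len;
    uint32_t sum = p->meta->flags;

    for (uint32_t i = 0; i < len; i++)
        sum += p->data[i];
    return sum;
}

uint32_t process_burst(const struct pkt *pkts, size_t n)
{
    uint32_t total = 0;

    /* Touch metadata and packet of the first entry together so both
     * cache line fetches are issued up front. */
    if (n > 0) {
        touch(pkts[0].meta);
        touch(pkts[0].data);
    }

    for (size_t i = 0; i < n; i++) {
        /* Touch packet N+1 just before starting with packet N. */
        if (i + 1 < n) {
            touch(pkts[i + 1].meta);
            touch(pkts[i + 1].data);
        }
        total += process_one(&pkts[i]);
    }
    return total;
}
```

Note that touch() only pulls in the first cache line of the packet.
Whether that is enough, and how far ahead to touch, depends on which
bytes you actually read and has to be measured on your workload.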

Very general recommendations but hope it helps anyway. How exactly to
do this efficiently is very application dependent.

/Magnus

> On Thu, Apr 8, 2021 at 2:38 PM Zvi Effron <zeffron@xxxxxxxxxxxxx> wrote:
> >
> > Apologies for the spam to anyone who received my first response, but
> > it was accidentally sent as HTML and rejected by the mailing list.
> >
> > On Thu, Apr 8, 2021 at 11:20 AM Neal Shukla <nshukla@xxxxxxxxxxxxx> wrote:
> > >
> > > System Info:
> > > CPU: Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
> > > Network Adapter/NIC: Intel X710
> > > Driver: i40e
> > > Kernel version: 5.8.15
> > > OS: Fedora 33
> > >
> >
> > Slight correction, we're actually on the 5.10.10 kernel.
> >
> > --Zvi