Re: Profiling XDP programs for performance issues

System Info:
CPU: Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
Network Adapter/NIC: Intel X710
Driver: i40e
Kernel version: 5.8.15
OS: Fedora 33


It’s worth noting that we tried expanding the DDIO way mask to all ways
(0x7ff) and to a little more than half (0x7f0), with no material effect.
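
For reference, the knob we changed is (to our understanding) the IIO
LLC_WAYS MSR, which is 0xc8b on Skylake-SP parts like this one; treat
the MSR address and the 0x600 (two-way) power-on default as assumptions
to verify against Intel's documentation. With msr-tools, after
`modprobe msr`:
`sudo rdmsr 0xc8b`   (read the current DDIO way mask)
`sudo wrmsr -a 0xc8b 0x7ff`   (allow DDIO to fill all 11 LLC ways)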

- Neal

On Thu, Apr 8, 2021 at 3:32 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Neal Shukla <nshukla@xxxxxxxxxxxxx> writes:
>
> > We’ve been introducing bpf_tail_call()s into our XDP programs and have run
> > into packet loss and latency increases under load tests. After profiling,
> > we’ve concluded that this line is the problem area in our code:
> > `int layer3_protocol = bpf_ntohs(ethernet_header->h_proto);`
> >
> > This is the first time we read from the packet in the first XDP program. We have
> > yet to make a tail call at this point. However, we do write into the metadata
> > section prior to this line.
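> >
> > For context, here is a minimal sketch of the structure just described
> > (the metadata layout, map, and names are placeholders, not our real
> > program):
> > ```
> > #include <linux/bpf.h>
> > #include <linux/if_ether.h>
> > #include <bpf/bpf_helpers.h>
> > #include <bpf/bpf_endian.h>
> >
> > struct meta {
> >     __u32 mark;                 /* hypothetical metadata field */
> > };
> >
> > struct {
> >     __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
> >     __uint(max_entries, 1);
> >     __uint(key_size, sizeof(__u32));
> >     __uint(value_size, sizeof(__u32));
> > } jmp_table SEC(".maps");
> >
> > SEC("xdp")
> > int entry_prog(struct xdp_md *ctx)
> > {
> >     /* reserve headroom for metadata in front of the packet */
> >     if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct meta)))
> >         return XDP_ABORTED;
> >
> >     /* re-read pointers after the adjust */
> >     void *data     = (void *)(long)ctx->data;
> >     void *data_end = (void *)(long)ctx->data_end;
> >     struct meta *m = (void *)(long)ctx->data_meta;
> >
> >     /* write into the metadata section first */
> >     if ((void *)(m + 1) > data)
> >         return XDP_ABORTED;
> >     m->mark = 1;
> >
> >     struct ethhdr *eth = data;
> >     if ((void *)(eth + 1) > data_end)
> >         return XDP_DROP;
> >
> >     /* first read from packet memory -- the hot spot in the profile */
> >     int layer3_protocol = bpf_ntohs(eth->h_proto);
> >
> >     if (layer3_protocol == ETH_P_IP)
> >         bpf_tail_call(ctx, &jmp_table, 0);
> >
> >     return XDP_PASS;
> > }
> >
> > char _license[] SEC("license") = "GPL";
> > ```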
> >
> > How We Profiled Our Code:
> > To profile our code, we used https://github.com/iovisor/bpftrace. We ran this
> > command while sending traffic to our machine:
> > `sudo bpftrace -e 'profile:hz:99 { @[kstack] = count(); }' >
> > /tmp/stack_samples.out`
> >
> > From there we got kernel stack traces, with the most frequently sampled
> > stacks at the bottom of the output file. The most commonly hit spot, aside
> > from CPU idle, looks like:
> > ```
> > @[
> >     bpf_prog_986b0b3beb6f0873_some_program+290
> >     i40e_napi_poll+1897
> >     net_rx_action+309
> >     __softirqentry_text_start+202
> >     run_ksoftirqd+38
> >     smpboot_thread_fn+197
> >     kthread+283
> >     ret_from_fork+34
> > ]: 8748
> > ```
> >
> > We then took the program tag from the stack trace symbol and ran this
> > command to dump the JITed code:
> > `sudo bpftool prog dump jited tag 986b0b3beb6f0873`
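> > (If you need to find the tag, `sudo bpftool prog show` lists the loaded
> > programs with their ids and tags; newer bpftool also accepts a `linum`
> > keyword on the dump to annotate source lines when BTF is available.)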
> >
> > By converting the decimal offset (290) from the stack trace to hex
> > (290 = 0x122), we found the instruction it refers to in the JITed code:
> > ```
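> > # annotation (ours): %r15 holds the packet-data pointer; the two movzbq's
> > # load the bytes of eth->h_proto (offsets 0xc and 0xd), the shl/or
> > # assembles the 16-bit value, and ror/movzwl byte-swap it: bpf_ntohs()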
> > 11d:   movzbq 0xc(%r15),%rsi
> > 122:   movzbq 0xd(%r15),%rdi
> > 127:   shl         $0x8,%rdi
> > 12b:   or          %rsi,%rdi
> > 12e:   ror         $0x8,%di
> > 132:   movzwl %di,%edi
> > ```
> > We've mapped this portion back to the source line mentioned earlier:
> > `int layer3_protocol = bpf_ntohs(ethernet_header->h_proto);`
> >
> > 1) Are we correctly profiling our XDP programs?
> >
> > 2) Is there a reason why our first read from the packet would cause this
> > issue? And what would be the best way to solve it?
> > We've theorized it may have to do with cache or TLB misses as we've added a lot
> > more instructions to our programs.
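> >
> > One way to check that theory would be to compare counters under load
> > with something like
> > `sudo perf stat -e cache-misses,dTLB-load-misses,iTLB-load-misses -a sleep 10`
> > (generic perf event names; what's available varies by kernel and CPU).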
>
> Yeah, this sounds like a caching issue. What system are you running this
> on? Intel's DDIO feature that DMAs packets directly to L3 cache tends to
> help with these sorts of things, but maybe your system doesn't have
> that, or it's not being used for some reason?
>
> Adding a few other people who have a better grasp of these details than
> me, in the hope that they can be more helpful :)
>
> -Toke
>



