Re: [PATCH bpf-next V2 3/6] selftests/bpf: xdp_hw_metadata RX hash return code info

Jesper Dangaard Brouer <jbrouer@xxxxxxxxxx> · Fri, 24 Mar 2023 13:14:24 +0100

On 23/03/2023 18.47, Stanislav Fomichev wrote:
On Thu, Mar 23, 2023 at 10:35 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:

On Thu, Mar 23, 2023 at 1:51 AM Jesper Dangaard Brouer
<jbrouer@xxxxxxxxxx> wrote:


On 22/03/2023 20.33, Stanislav Fomichev wrote:
On Wed, Mar 22, 2023 at 12:30 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:

On Wed, Mar 22, 2023 at 12:23 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:

On Wed, Mar 22, 2023 at 12:17 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:

On Wed, Mar 22, 2023 at 12:00 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:

On Wed, Mar 22, 2023 at 9:07 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:

On Wed, Mar 22, 2023 at 9:05 AM Jesper Dangaard Brouer
<jbrouer@xxxxxxxxxx> wrote:



On 21/03/2023 19.47, Stanislav Fomichev wrote:
On Tue, Mar 21, 2023 at 6:47 AM Jesper Dangaard Brouer
<brouer@xxxxxxxxxx> wrote:

When driver developers add XDP-hints kfuncs for RX hash it is
practical to print the return code in bpf_printk trace pipe log.

Print hash value as a hex value, both AF_XDP userspace and bpf_prog,
as this makes it easier to spot poor quality hashes.

Signed-off-by: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
---
    .../testing/selftests/bpf/progs/xdp_hw_metadata.c  |    9 ++++++---
    tools/testing/selftests/bpf/xdp_hw_metadata.c      |    5 ++++-
    2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
index 40c17adbf483..ce07010e4d48 100644
--- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
@@ -77,10 +77,13 @@ int rx(struct xdp_md *ctx)
                   meta->rx_timestamp = 0; /* Used by AF_XDP as not avail signal */
           }

-       if (!bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash))
-               bpf_printk("populated rx_hash with %u", meta->rx_hash);
-       else
+       ret = bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash);
+       if (ret >= 0) {
+               bpf_printk("populated rx_hash with 0x%08X", meta->rx_hash);
+       } else {
+               bpf_printk("rx_hash not-avail errno:%d", ret);
                   meta->rx_hash = 0; /* Used by AF_XDP as not avail signal */
+       }

           return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
    }
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
index 400bfe19abfe..f3ec07ccdc95 100644
--- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -3,6 +3,9 @@
    /* Reference program for verifying XDP metadata on real HW. Functional test
     * only, doesn't test the performance.
     *
+ * BPF-prog bpf_printk info outout can be access via
+ * /sys/kernel/debug/tracing/trace_pipe

s/outout/output/


Fixed in V3

But let's maybe drop it? If you want to make it more usable, let's
have a separate patch to enable tracing and periodically dump it to
the console instead (as previously discussed).

Cat'ing /sys/kernel/debug/tracing/trace_pipe work for me regardless of
setting in
/sys/kernel/debug/tracing/events/bpf_trace/bpf_trace_printk/enable

We likely need a followup patch that adds a BPF config switch that can
disable bpf_printk calls, because this adds overhead and thus affects
the timestamps.

No. This is by design.
Do not use bpf_printk* in production.

I fully agree do not use bpf_printk in *production*.


But that's not for the production? xdp_hw_metadata is a small tool to
verify that the metadata being dumped is correct (during the
development).
We have a proper (less verbose) selftest in
{progs,prog_tests}/xdp_metadata.c (over veth).
This xdp_hw_metadata was supposed to be used for running it against
the real hardware, so having as much debugging at hand as possible
seems helpful? (at least it was helpful to me when playing with mlx4)

My experience when developing these kfuncs for igc (real hardware), this
"tool" xdp_hw_metadata was super helpful, because it was very verbose
(and I was juggling reading chip registers BE/LE and see patch1 a buggy
implementation for RX-hash).

As I wrote in cover-letter, I recommend other driver developers to do
the same, because it really help speed up the development. In theory
xdp_hw_metadata doesn't belong in selftests directory and IMHO it should
have been placed in samples/bpf/, but given the relationship with real
selftest {progs,prog_tests}/xdp_metadata.c I think it makes sense to
keep here.



The only use of bpf_printk is for debugging of bpf progs themselves.
It should not be used in any tool.

Hmm, good point. I guess it also means we won't have to mess with
enabling/dumping ftrace (and don't need this comment about cat'ing the
file).
Jesper, maybe we can instead pass the status of those
bpf_xdp_metadata_xxx kfuncs via 'struct xdp_meta'? And dump this info
from the userspace if needed.

There are so many other ways for bpf prog to communicate with user space.
Use ringbuf, perf_event buffer, global vars, maps, etc.
trace_pipe is debug only because it's global and will conflict with
all other debug sessions.

I want to highlight above paragraph: It is very true for production
code. (Anyone Googling this pay attention to above paragraph).


👍 makes sense, ty! hopefully we won't have to add a separate channel
for those and can (ab)use the metadata area.


Proposed solution: How about default disabling the bpf_printk's via a
macro define, and then driver developer can manually reenable them
easily via a single define, to enable a debugging mode.

I was only watching /sys/kernel/debug/tracing/trace_pipe when I was
debugging a driver issue.  Thus, an extra step of modifying a single
define in BPF seems easier, than instrumenting my driver with printk.

It's certainly fine to have commented out bpf_printk in selftests
and sample code.
But the patch does:
+       if (ret >= 0) {
+               bpf_printk("populated rx_hash with 0x%08X", meta->rx_hash);
+       } else {
+               bpf_printk("rx_hash not-avail errno:%d", ret);
                 meta->rx_hash = 0; /* Used by AF_XDP as not avail signal */
+       }

It feels that printk is the only thing that provides the signal to the user.
Doing s/ret >= 0/true/ won't make any difference to selftest and
to the consumer and that's my main concern with such selftest/samples.

I agree, having this signal delivered to the user (without the ifdefs)
seems like a better way to go.

I agree that we have a signal that we are not delivering to the user.

Just so we are on the same page, let me explain how above code is
problematic. As the rx_hash value zero have two meanings:

 (1) Negative 'ret' set rx_hash=0 as err signal to AF_XDP "user"
 (2) Hardware set rx_hash=0, when hash-type == IGC_RSS_TYPE_NO_HASH

Case (2) happens for all L2 packets e.g. ARP packets.  See in patch-1
where I map IGC_RSS_TYPE_NO_HASH to netstack type PKT_HASH_TYPE_L2.
I did consider return errno/negative number for IGC_RSS_TYPE_NO_HASH,
but I wanted to keep kfunc code as simple as possible (both for speed
and if we need to unroll as byte-code later). As i225 hardware still
writes zero into hash word I choose to keep code simple.


IMHO this symptom is related to an API problem of the kfunc, that
doesn't provide the hash-type.

Seems trivial to do something like the following below? (untested,
doesn't compile, gmail-copy-pasted so don't try to apply it)


If the kfunc provided the hash-type, which will be a positive number.
Then I would add a signed integer to meta, which can store the hash-type
or the negative errno number.


diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
index 4c55b4d79d3d..061c410f68ea 100644
--- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
@@ -12,6 +12,9 @@ struct {
   __type(value, __u32);
  } xsk SEC(".maps");

+int received;
+int dropped;
+
  extern int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx,
   __u64 *timestamp) __ksym;
  extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx,
@@ -52,11 +55,11 @@ int rx(struct xdp_md *ctx)
   if (udp->dest != bpf_htons(9091))
   return XDP_PASS;

It we wanted to make this program user "friendly", we should also count
packets that doesn't get redirected to AF_XDP "user" and instead skipped.

- bpf_printk("forwarding UDP:9091 to AF_XDP");
+ __sync_fetch_and_add(&received, 1);

   ret = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xdp_meta));
   if (ret != 0) {
- bpf_printk("bpf_xdp_adjust_meta returned %d", ret);
+ __sync_fetch_and_add(&dropped, 1);
   return XDP_PASS;

Is it a "dropped" or a "skipped" packet (return XDP_PASS)?

   }
[...]

--Jesper