Re: [PATCH bpf-next V1] igc: enable and fix RX hash usage by netstack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 20/02/2023 16.39, Alexander Lobakin wrote:
From: Jesper Dangaard Brouer <jbrouer@xxxxxxxxxx>
Date: Thu, 16 Feb 2023 17:46:53 +0100


On 14/02/2023 14.21, Alexander Lobakin wrote:
From: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
Date: Fri, 10 Feb 2023 16:07:59 +0100

When function igc_rx_hash() was introduced in v4.20 via commit
0507ef8a0372
("igc: Add transmit and receive fastpath and interrupt handlers"), the
hardware wasn't configured to provide RSS hash, thus it made sense to
not
enable net_device NETIF_F_RXHASH feature bit.

[...]

@@ -311,6 +311,58 @@ extern char igc_driver_name[];
   #define IGC_MRQC_RSS_FIELD_IPV4_UDP    0x00400000
   #define IGC_MRQC_RSS_FIELD_IPV6_UDP    0x00800000
   +/* RX-desc Write-Back format RSS Type's */
+enum igc_rss_type_num {
+    IGC_RSS_TYPE_NO_HASH        = 0,
+    IGC_RSS_TYPE_HASH_TCP_IPV4    = 1,
+    IGC_RSS_TYPE_HASH_IPV4        = 2,
+    IGC_RSS_TYPE_HASH_TCP_IPV6    = 3,
+    IGC_RSS_TYPE_HASH_IPV6_EX    = 4,
+    IGC_RSS_TYPE_HASH_IPV6        = 5,
+    IGC_RSS_TYPE_HASH_TCP_IPV6_EX    = 6,
+    IGC_RSS_TYPE_HASH_UDP_IPV4    = 7,
+    IGC_RSS_TYPE_HASH_UDP_IPV6    = 8,
+    IGC_RSS_TYPE_HASH_UDP_IPV6_EX    = 9,
+    IGC_RSS_TYPE_MAX        = 10,
+};
+#define IGC_RSS_TYPE_MAX_TABLE        16
+#define IGC_RSS_TYPE_MASK        0xF

GENMASK()?


hmm... GENMASK(3,0) looks more confusing to me. The mask we need here is
so simple that I prefer not to complicate this with GENMASK.


Changed my mind, I'm going with GENMASK(3,0) in V3.

+
+/* igc_rss_type - Rx descriptor RSS type field */
+static inline u8 igc_rss_type(union igc_adv_rx_desc *rx_desc)

Why use types shorter than u32 on the stack?

Changing to u32 in V2

Why this union is not const here, since there are no modifications?

Sure

+{
+    /* RSS Type 4-bit number: 0-9 (above 9 is reserved) */
+    return rx_desc->wb.lower.lo_dword.hs_rss.pkt_info &
IGC_RSS_TYPE_MASK;

The most important I wanted to mention: doesn't this function make the
CPU read the uncached field again, while you could just read it once
onto the stack and then extract all such data from there?

I really don't think this is an issues here. The igc_adv_rx_desc is only
16 bytes and it should be hot in CPU cache by now.

Rx descriptors are located in the DMA coherent zone (allocated via
dma_alloc_coherent()), I am missing something? Because I was (I am) sure
CPU doesn't cache anything from it (and doesn't reorder reads/writes
from/to). I thought that's the point of coherent zones -- you may talk
to hardware without needing for syncing...


That is a good point and you are (likely) right.

I do want to remind you that this is a "fixes" patch that dates back to
v5.2.  This driver is from the very beginning coded to access descriptor
this way via union igc_adv_rx_desc.  For a fixes patch, I'm not going to
code up a new and more effecient way of accessing the descriptor memory.

If you truely believe this matters for a 2.5 Gbit/s device, then someone (e.g you) can go through this driver and change this pattern in the code.



To avoid the movzx I have changed this to do a u32 read instead.

+}
+
+/* Packet header type identified by hardware (when BIT(11) is zero).
+ * Even when UDP ports are not part of RSS hash HW still parse and
mark UDP bits
+ */
+enum igc_pkt_type_bits {
+    IGC_PKT_TYPE_HDR_IPV4    =    BIT(0),
+    IGC_PKT_TYPE_HDR_IPV4_WITH_OPT=    BIT(1), /* IPv4 Hdr includes
IP options */
+    IGC_PKT_TYPE_HDR_IPV6    =    BIT(2),
+    IGC_PKT_TYPE_HDR_IPV6_WITH_EXT=    BIT(3), /* IPv6 Hdr includes
extensions */
+    IGC_PKT_TYPE_HDR_L4_TCP    =    BIT(4),
+    IGC_PKT_TYPE_HDR_L4_UDP    =    BIT(5),
+    IGC_PKT_TYPE_HDR_L4_SCTP=    BIT(6),
+    IGC_PKT_TYPE_HDR_NFS    =    BIT(7),
+    /* Above only valid when BIT(11) is zero */
+    IGC_PKT_TYPE_L2        =    BIT(11),
+    IGC_PKT_TYPE_VLAN    =    BIT(12),
+    IGC_PKT_TYPE_MASK    =    0x1FFF, /* 13-bits */

Also GENMASK().

GENMASK would make more sense here.

+};
+
+/* igc_pkt_type - Rx descriptor Packet type field */
+static inline u16 igc_pkt_type(union igc_adv_rx_desc *rx_desc)

Also short types and consts.


Fixed in V2

+{
+    u32 data = le32_to_cpu(rx_desc->wb.lower.lo_dword.data);
+    /* Packet type is 13-bits - as bits (16:4) in lower.lo_dword*/
+    u16 pkt_type = (data >> 4) & IGC_PKT_TYPE_MASK;

Perfect candidate for FIELD_GET(). No, even for le32_get_bits().


Dropping all the code and defines for "igc_pkt_type", as the code ended
up not being used.  I simply kept this around to document what was in
the programmers datasheet (to help others understand the hardware).


I adjusted this, but I could not find a central define for FIELD_GET
(but many drivers open code this).

<linux/bitfield.h>. It has FIELD_{GET,PREP}() and also builds
{u,__le,__be}{8,16,32}_{encode,get,replace}_bits() via macro (the latter
doesn't get indexed by Elixir, as it doesn't parse functions built via
macros).

Thanks for the pointer to <linux/bitfield.h>, I'll be using that in V3.


Also my note above about excessive expensive reads.

+
+    return pkt_type;
+}
+
   /* Interrupt defines */
   #define IGC_START_ITR            648 /* ~6000 ints/sec */
   #define IGC_4K_ITR            980
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
b/drivers/net/ethernet/intel/igc/igc_main.c
index 8b572cd2c350..42a072509d2a 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1677,14 +1677,40 @@ static void igc_rx_checksum(struct igc_ring
*ring,
              le32_to_cpu(rx_desc->wb.upper.status_error));
   }
   +/* Mapping HW RSS Type to enum pkt_hash_types */
+struct igc_rss_type {
+    u8 hash_type; /* can contain enum pkt_hash_types */

Why make a struct for one field? + short type note

Also fixing this in V3.

+} igc_rss_type_table[IGC_RSS_TYPE_MAX_TABLE] = {
+    [IGC_RSS_TYPE_NO_HASH].hash_type      = PKT_HASH_TYPE_L2,
+    [IGC_RSS_TYPE_HASH_TCP_IPV4].hash_type      = PKT_HASH_TYPE_L4,
+    [IGC_RSS_TYPE_HASH_IPV4].hash_type      = PKT_HASH_TYPE_L3,
+    [IGC_RSS_TYPE_HASH_TCP_IPV6].hash_type      = PKT_HASH_TYPE_L4,
+    [IGC_RSS_TYPE_HASH_IPV6_EX].hash_type      = PKT_HASH_TYPE_L3,
+    [IGC_RSS_TYPE_HASH_IPV6].hash_type      = PKT_HASH_TYPE_L3,
+    [IGC_RSS_TYPE_HASH_TCP_IPV6_EX].hash_type = PKT_HASH_TYPE_L4,
+    [IGC_RSS_TYPE_HASH_UDP_IPV4].hash_type      = PKT_HASH_TYPE_L4,
+    [IGC_RSS_TYPE_HASH_UDP_IPV6].hash_type      = PKT_HASH_TYPE_L4,
+    [IGC_RSS_TYPE_HASH_UDP_IPV6_EX].hash_type = PKT_HASH_TYPE_L4,
+    [10].hash_type = PKT_HASH_TYPE_L2, /* RSS Type above 9
"Reserved" by HW */
+    [11].hash_type = PKT_HASH_TYPE_L2,
+    [12].hash_type = PKT_HASH_TYPE_L2,
+    [13].hash_type = PKT_HASH_TYPE_L2,
+    [14].hash_type = PKT_HASH_TYPE_L2,
+    [15].hash_type = PKT_HASH_TYPE_L2,

Changing these 10-15 to PKT_HASH_TYPE_NONE, which is zero.
The ASM generated table is smaller code size with zero padded content.


Why define those empty if you could do a bound check in the code
instead? E.g. `if (unlikely(bigger_than_9)) return PKT_HASH_TYPE_L2`.

Having a branch for this is likely slower.  On godbolt I see that this
generates suboptimal and larger code.

But you have to verify HW output anyway, right? Or would like to rely on
that on some weird revision it won't spit BIT(69) on you?


The table is constructed such that the lookup takes care of "verifying"
the HW output.  Notice that software will bit mask the last 4 bits, thus
the number will max be 15.  No matter what hardware outputs it is safe
to do a lookup in the table.  IMHO it is a simple way to avoid an
unnecessary verification branch and still be able to handle buggy/weird
HW revs.


+};
+
   static inline void igc_rx_hash(struct igc_ring *ring,
                      union igc_adv_rx_desc *rx_desc,
                      struct sk_buff *skb)
   {
-    if (ring->netdev->features & NETIF_F_RXHASH)
-        skb_set_hash(skb,
-                 le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
-                 PKT_HASH_TYPE_L3);
+    if (ring->netdev->features & NETIF_F_RXHASH) {

     if (!(feature & HASH))
         return;

and -1 indent level?

Usually, yes, I also prefer early return style code.
For one I just followed the existing style.

I'd not recommend "keep the existing style" of Intel drivers -- not
something I'd like to keep as is :D


Second, I tried to code it up, but it looks ugly in this case, as the
variable defines need to get moved outside the if statement.

+        u32 rss_hash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss);
+        u8  rss_type = igc_rss_type(rx_desc);
+        enum pkt_hash_types hash_type;
+
+        hash_type = igc_rss_type_table[rss_type].hash_type;
+        skb_set_hash(skb, rss_hash, hash_type);
+    }
   }

[...]

--Jesper




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux