On Thu, Oct 17, 2024 at 5:17 AM Stanislav Fomichev <stfomichev@xxxxxxxxx> wrote: Hi Stanislav, Thank you so much for testing and improvement! > > On 10/03, Taehee Yoo wrote: > > This series implements device memory TCP for bnxt_en driver and > > necessary ethtool command implementations. > > > > NICs that use the bnxt_en driver support tcp-data-split feature named > > HDS(header-data-split). > > But there is no implementation for the HDS to enable/disable by ethtool. > > Only getting the current HDS status is implemented and the HDS is just > > automatically enabled only when either LRO, HW-GRO, or JUMBO is enabled. > > The hds_threshold follows the rx-copybreak value but it wasn't > > changeable. > > > > Currently, bnxt_en driver enables tcp-data-split by default but not > > always work. > > There is hds_threshold value, which indicates that a packet size is > > larger than this value, a packet will be split into header and data. > > hds_threshold value has been 256, which is a default value of > > rx-copybreak value too. > > The rx-copybreak value hasn't been allowed to change so the > > hds_threshold too. > > > > This patchset decouples hds_threshold and rx-copybreak first. > > and make tcp-data-split, rx-copybreak, and > > tcp-data-split-thresh(hds_threshold) configurable independently. > > > > But the default configuration is the same. > > The default value of rx-copybreak is 256 and default > > tcp-data-split-thresh is also 256. > > > > There are several related options. > > TPA(HW-GRO, LRO), JUMBO, jumbo_thresh(firmware command), and Aggregation > > Ring. > > > > The aggregation ring is fundamental to these all features. > > When gro/lro/jumbo packets are received, NIC receives the first packet > > from the normal ring. > > follow packets come from the aggregation ring. > > > > These features are working regardless of HDS. > > When TPA is enabled and HDS is disabled, the first packet contains > > header and payload too. > > and the following packets contain payload only. > > If HDS is enabled, the first packet contains the header only, and the > > following packets contain only payload. > > So, HW-GRO/LRO is working regardless of HDS. > > > > There is another threshold value, which is jumbo_thresh. > > This is very similar to hds_thresh, but jumbo thresh doesn't split > > header and data. > > It just split the first and following data based on length. > > When NIC receives 1500 sized packet, and jumbo_thresh is 256(default, but > > follows rx-copybreak), > > the first data is 256 and the following packet size is 1500-256. > > > > Before this patch, at least if one of GRO, LRO, and JUMBO flags is > > enabled, the Aggregation ring will be enabled. > > If the Aggregation ring is enabled, both hds_threshold and > > jumbo_thresh are set to the default value of rx-copybreak. > > > > So, GRO, LRO, JUMBO frames, they larger than 256 bytes, they will > > be split into header and data if the protocol is TCP or UDP. > > for the other protocol, jumbo_thresh works instead of hds_thresh. > > > > This means that tcp-data-split relies on the GRO, LRO, and JUMBO flags. > > But by this patch, tcp-data-split no longer relies on these flags. > > If the tcp-data-split is enabled, the Aggregation ring will be > > enabled. > > Also, hds_threshold no longer follows rx-copybreak value, it will > > be set to the tcp-data-split-thresh value by user-space, but the > > default value is still 256. > > > > If the protocol is TCP or UDP and the HDS is disabled and Aggregation > > ring is enabled, a packet will be split into several pieces due to > > jumbo_thresh. > > > > When XDP is attached, tcp-data-split is automatically disabled. > > > > LRO, GRO, and JUMBO are tested with BCM57414, BCM57504 and the firmware > > version is 230.0.157.0. > > I couldn't find any specification about minimum and maximum value > > of hds_threshold, but from my test result, it was about 0 ~ 1023. > > It means, over 1023 sized packets will be split into header and data if > > tcp-data-split is enabled regardless of hds_treshold value. > > When hds_threshold is 1500 and received packet size is 1400, HDS should > > not be activated, but it is activated. > > The maximum value of hds_threshold(tcp-data-split-thresh) > > value is 256 because it has been working. > > It was decided very conservatively. > > > > I checked out the tcp-data-split(HDS) works independently of GRO, LRO, > > JUMBO. Tested GRO/LRO, JUMBO with enabled HDS and disabled HDS. > > Also, I checked out tcp-data-split should be disabled automatically > > when XDP is attached and disallowed to enable it again while XDP is > > attached. I tested ranged values from min to max for > > tcp-data-split-thresh and rx-copybreak, and it works. > > tcp-data-split-thresh from 0 to 256, and rx-copybreak 65 to 256. > > When testing this patchset, I checked skb->data, skb->data_len, and > > nr_frags values. > > > > The first patch implements .{set, get}_tunable() in the bnxt_en. > > The bnxt_en driver has been supporting the rx-copybreak feature but is > > not configurable, Only the default rx-copybreak value has been working. > > So, it changes the bnxt_en driver to be able to configure > > the rx-copybreak value. > > > > The second patch adds an implementation of tcp-data-split ethtool > > command. > > The HDS relies on the Aggregation ring, which is automatically enabled > > when either LRO, GRO, or large mtu is configured. > > So, if the Aggregation ring is enabled, HDS is automatically enabled by > > it. > > > > The third patch adds tcp-data-split-thresh command in the ethtool. > > This threshold value indicates if a received packet size is larger > > than this threshold, the packet's header and payload will be split. > > Example: > > # ethtool -G <interface name> tcp-data-split-thresh <value> > > This option can not be used when tcp-data-split is disabled or not > > supported. > > # ethtool -G enp14s0f0np0 tcp-data-split on tcp-data-split-thresh 256 > > # ethtool -g enp14s0f0np0 > > Ring parameters for enp14s0f0np0: > > Pre-set maximums: > > ... > > Current hardware settings: > > ... > > TCP data split: on > > TCP data split thresh: 256 > > > > # ethtool -G enp14s0f0np0 tcp-data-split off > > # ethtool -g enp14s0f0np0 > > Ring parameters for enp14s0f0np0: > > Pre-set maximums: > > ... > > Current hardware settings: > > ... > > TCP data split: off > > TCP data split thresh: n/a > > > > The fourth patch adds the implementation of tcp-data-split-thresh logic > > in the bnxt_en driver. > > The default value is 256, which used to be the default rx-copybreak > > value. > > > > The fifth and sixth adds condition check for devmem and ethtool. > > If tcp-data-split is disabled or threshold value is not zero, setup of > > devmem will be failed. > > Also, tcp-data-split and tcp-data-split-thresh will not be changed > > while devmem is running. > > > > The last patch implements device memory TCP for bnxt_en driver. > > It usually converts generic page_pool api to netmem page_pool api. > > > > No dependencies exist between device memory TCP and GRO/LRO/MTU. > > Only tcp-data-split and tcp-data-split-thresh should be enabled when the > > device memory TCP. > > While devmem TCP is set, tcp-data-split and tcp-data-split-thresh can't > > be updated because core API disallows change. > > > > I tested the interface up/down while devmem TCP running. It works well. > > Also, channel count change, and rx/tx ringsize change tests work well too. > > > > The devmem TCP test NIC is BCM57504 > > [..] > > > All necessary configuration validations exist at the core API level. > > > > Note that by this patch, the setup of device memory TCP would fail. > > Because tcp-data-split-thresh command is not supported by ethtool yet. > > The tcp-data-split-thresh should be 0 for setup device memory TCP and > > the default of bnxt is 256. > > So, for the bnxt, it always fails until ethtool supports > > tcp-data-split-thresh command. > > > > The ncdevmem.c will be updated after ethtool supports > > tcp-data-split-thresh option. > > FYI, I've tested your series with BCM57504 on top of [1] and [2] with > a couple of patches to make ncdevmem.c and TX work (see below). [1] > decouples ncdevmem from ethtool so we can flip header split settings > without requiring recent ethtool. Both RX and TX work perfectly. > Feel free to carry: > > Tested-by: Stanislav Fomichev <sdf@xxxxxxxxxxx> Thank you so much for your work! I will try to test your TX side patch before sending v4 patch. > > Also feel free to take over the ncdevmem patch if my ncdevmem changes > get pulled before your series. Good, Thanks! > > 1: https://lore.kernel.org/netdev/20241009171252.2328284-1-sdf@xxxxxxxxxxx/ > 2: https://lore.kernel.org/netdev/20240913150913.1280238-1-sdf@xxxxxxxxxxx/ > > commit 69bc0e247eb4132ef5fd0b118719427d35d462fc > Author: Stanislav Fomichev <sdf@xxxxxxxxxxx> > AuthorDate: Tue Oct 15 15:56:43 2024 -0700 > Commit: Stanislav Fomichev <sdf@xxxxxxxxxxx> > CommitDate: Wed Oct 16 13:13:42 2024 -0700 > > selftests: ncdevmem: Set header split threshold to 0 > > Needs to happen on BRCM to allow devmem to be attached. > > Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxxx> > > diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c > index 903dac3e61d5..6a94d52a6c43 100644 > --- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c > +++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c > @@ -322,6 +322,8 @@ static int configure_headersplit(bool on) > ethtool_rings_set_req_set_header_dev_index(req, ifindex); > /* 0 - off, 1 - auto, 2 - on */ > ethtool_rings_set_req_set_tcp_data_split(req, on ? 2 : 0); > + if (enable) > + ethtool_rings_set_req_set_tcp_data_split_thresh(req, 0); > ret = ethtool_rings_set(ys, req); > if (ret < 0) > fprintf(stderr, "YNL failed: %s\n", ys->err.msg); > > > commit ef5ba647bc94a19153c2c5cfc64ebe4cb86ac58d > Author: Stanislav Fomichev <sdf@xxxxxxxxxxx> > AuthorDate: Fri Oct 11 13:52:03 2024 -0700 > Commit: Stanislav Fomichev <sdf@xxxxxxxxxxx> > CommitDate: Wed Oct 16 13:13:42 2024 -0700 > > bnxt_en: support tx device memory > > The only change is to not unmap the frags on completions. > > Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxxx> > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c > index 6e422e24750a..cb22707a35aa 100644 > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c > @@ -692,7 +692,10 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev) > goto tx_dma_error; > > tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)]; > - dma_unmap_addr_set(tx_buf, mapping, mapping); > + if (netmem_is_net_iov(frag->netmem)) > + dma_unmap_addr_set(tx_buf, mapping, 0); > + else > + dma_unmap_addr_set(tx_buf, mapping, mapping); > > txbd->tx_bd_haddr = cpu_to_le64(mapping); > > @@ -749,9 +752,10 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev) > for (i = 0; i < last_frag; i++) { > prod = NEXT_TX(prod); > tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)]; > - dma_unmap_page(&pdev->dev, dma_unmap_addr(tx_buf, mapping), > - skb_frag_size(&skb_shinfo(skb)->frags[i]), > - DMA_TO_DEVICE); > + if (dma_unmap_addr(tx_buf, mapping)) > + dma_unmap_page(&pdev->dev, dma_unmap_addr(tx_buf, mapping), > + skb_frag_size(&skb_shinfo(skb)->frags[i]), > + DMA_TO_DEVICE); > } > > tx_free: > @@ -821,11 +825,12 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr, > for (j = 0; j < last; j++) { > cons = NEXT_TX(cons); > tx_buf = &txr->tx_buf_ring[RING_TX(bp, cons)]; > - dma_unmap_page( > - &pdev->dev, > - dma_unmap_addr(tx_buf, mapping), > - skb_frag_size(&skb_shinfo(skb)->frags[j]), > - DMA_TO_DEVICE); > + if (dma_unmap_addr(tx_buf, mapping)) > + dma_unmap_page( > + &pdev->dev, > + dma_unmap_addr(tx_buf, mapping), > + skb_frag_size(&skb_shinfo(skb)->frags[j]), > + DMA_TO_DEVICE); > } > if (unlikely(is_ts_pkt)) { > if (BNXT_CHIP_P5(bp)) { > @@ -3296,10 +3301,11 @@ static void bnxt_free_tx_skbs(struct bnxt *bp) > skb_frag_t *frag = &skb_shinfo(skb)->frags[k]; > > tx_buf = &txr->tx_buf_ring[ring_idx]; > - dma_unmap_page( > - &pdev->dev, > - dma_unmap_addr(tx_buf, mapping), > - skb_frag_size(frag), DMA_TO_DEVICE); > + if (dma_unmap_addr(tx_buf, mapping)) > + dma_unmap_page( > + &pdev->dev, > + dma_unmap_addr(tx_buf, mapping), > + skb_frag_size(frag), DMA_TO_DEVICE); > } > dev_kfree_skb(skb); > } Thanks a lot! Taehee Yoo