On 10/03, Taehee Yoo wrote: > This series implements device memory TCP for bnxt_en driver and > necessary ethtool command implementations. > > NICs that use the bnxt_en driver support tcp-data-split feature named > HDS(header-data-split). > But there is no implementation for the HDS to enable/disable by ethtool. > Only getting the current HDS status is implemented and the HDS is just > automatically enabled only when either LRO, HW-GRO, or JUMBO is enabled. > The hds_threshold follows the rx-copybreak value but it wasn't > changeable. > > Currently, bnxt_en driver enables tcp-data-split by default but not > always work. > There is hds_threshold value, which indicates that a packet size is > larger than this value, a packet will be split into header and data. > hds_threshold value has been 256, which is a default value of > rx-copybreak value too. > The rx-copybreak value hasn't been allowed to change so the > hds_threshold too. > > This patchset decouples hds_threshold and rx-copybreak first. > and make tcp-data-split, rx-copybreak, and > tcp-data-split-thresh(hds_threshold) configurable independently. > > But the default configuration is the same. > The default value of rx-copybreak is 256 and default > tcp-data-split-thresh is also 256. > > There are several related options. > TPA(HW-GRO, LRO), JUMBO, jumbo_thresh(firmware command), and Aggregation > Ring. > > The aggregation ring is fundamental to these all features. > When gro/lro/jumbo packets are received, NIC receives the first packet > from the normal ring. > follow packets come from the aggregation ring. > > These features are working regardless of HDS. > When TPA is enabled and HDS is disabled, the first packet contains > header and payload too. > and the following packets contain payload only. > If HDS is enabled, the first packet contains the header only, and the > following packets contain only payload. > So, HW-GRO/LRO is working regardless of HDS. > > There is another threshold value, which is jumbo_thresh. > This is very similar to hds_thresh, but jumbo thresh doesn't split > header and data. > It just split the first and following data based on length. > When NIC receives 1500 sized packet, and jumbo_thresh is 256(default, but > follows rx-copybreak), > the first data is 256 and the following packet size is 1500-256. > > Before this patch, at least if one of GRO, LRO, and JUMBO flags is > enabled, the Aggregation ring will be enabled. > If the Aggregation ring is enabled, both hds_threshold and > jumbo_thresh are set to the default value of rx-copybreak. > > So, GRO, LRO, JUMBO frames, they larger than 256 bytes, they will > be split into header and data if the protocol is TCP or UDP. > for the other protocol, jumbo_thresh works instead of hds_thresh. > > This means that tcp-data-split relies on the GRO, LRO, and JUMBO flags. > But by this patch, tcp-data-split no longer relies on these flags. > If the tcp-data-split is enabled, the Aggregation ring will be > enabled. > Also, hds_threshold no longer follows rx-copybreak value, it will > be set to the tcp-data-split-thresh value by user-space, but the > default value is still 256. > > If the protocol is TCP or UDP and the HDS is disabled and Aggregation > ring is enabled, a packet will be split into several pieces due to > jumbo_thresh. > > When XDP is attached, tcp-data-split is automatically disabled. > > LRO, GRO, and JUMBO are tested with BCM57414, BCM57504 and the firmware > version is 230.0.157.0. > I couldn't find any specification about minimum and maximum value > of hds_threshold, but from my test result, it was about 0 ~ 1023. > It means, over 1023 sized packets will be split into header and data if > tcp-data-split is enabled regardless of hds_treshold value. > When hds_threshold is 1500 and received packet size is 1400, HDS should > not be activated, but it is activated. > The maximum value of hds_threshold(tcp-data-split-thresh) > value is 256 because it has been working. > It was decided very conservatively. > > I checked out the tcp-data-split(HDS) works independently of GRO, LRO, > JUMBO. Tested GRO/LRO, JUMBO with enabled HDS and disabled HDS. > Also, I checked out tcp-data-split should be disabled automatically > when XDP is attached and disallowed to enable it again while XDP is > attached. I tested ranged values from min to max for > tcp-data-split-thresh and rx-copybreak, and it works. > tcp-data-split-thresh from 0 to 256, and rx-copybreak 65 to 256. > When testing this patchset, I checked skb->data, skb->data_len, and > nr_frags values. > > The first patch implements .{set, get}_tunable() in the bnxt_en. > The bnxt_en driver has been supporting the rx-copybreak feature but is > not configurable, Only the default rx-copybreak value has been working. > So, it changes the bnxt_en driver to be able to configure > the rx-copybreak value. > > The second patch adds an implementation of tcp-data-split ethtool > command. > The HDS relies on the Aggregation ring, which is automatically enabled > when either LRO, GRO, or large mtu is configured. > So, if the Aggregation ring is enabled, HDS is automatically enabled by > it. > > The third patch adds tcp-data-split-thresh command in the ethtool. > This threshold value indicates if a received packet size is larger > than this threshold, the packet's header and payload will be split. > Example: > # ethtool -G <interface name> tcp-data-split-thresh <value> > This option can not be used when tcp-data-split is disabled or not > supported. > # ethtool -G enp14s0f0np0 tcp-data-split on tcp-data-split-thresh 256 > # ethtool -g enp14s0f0np0 > Ring parameters for enp14s0f0np0: > Pre-set maximums: > ... > Current hardware settings: > ... > TCP data split: on > TCP data split thresh: 256 > > # ethtool -G enp14s0f0np0 tcp-data-split off > # ethtool -g enp14s0f0np0 > Ring parameters for enp14s0f0np0: > Pre-set maximums: > ... > Current hardware settings: > ... > TCP data split: off > TCP data split thresh: n/a > > The fourth patch adds the implementation of tcp-data-split-thresh logic > in the bnxt_en driver. > The default value is 256, which used to be the default rx-copybreak > value. > > The fifth and sixth adds condition check for devmem and ethtool. > If tcp-data-split is disabled or threshold value is not zero, setup of > devmem will be failed. > Also, tcp-data-split and tcp-data-split-thresh will not be changed > while devmem is running. > > The last patch implements device memory TCP for bnxt_en driver. > It usually converts generic page_pool api to netmem page_pool api. > > No dependencies exist between device memory TCP and GRO/LRO/MTU. > Only tcp-data-split and tcp-data-split-thresh should be enabled when the > device memory TCP. > While devmem TCP is set, tcp-data-split and tcp-data-split-thresh can't > be updated because core API disallows change. > > I tested the interface up/down while devmem TCP running. It works well. > Also, channel count change, and rx/tx ringsize change tests work well too. > > The devmem TCP test NIC is BCM57504 [..] > All necessary configuration validations exist at the core API level. > > Note that by this patch, the setup of device memory TCP would fail. > Because tcp-data-split-thresh command is not supported by ethtool yet. > The tcp-data-split-thresh should be 0 for setup device memory TCP and > the default of bnxt is 256. > So, for the bnxt, it always fails until ethtool supports > tcp-data-split-thresh command. > > The ncdevmem.c will be updated after ethtool supports > tcp-data-split-thresh option. FYI, I've tested your series with BCM57504 on top of [1] and [2] with a couple of patches to make ncdevmem.c and TX work (see below). [1] decouples ncdevmem from ethtool so we can flip header split settings without requiring recent ethtool. Both RX and TX work perfectly. Feel free to carry: Tested-by: Stanislav Fomichev <sdf@xxxxxxxxxxx> Also feel free to take over the ncdevmem patch if my ncdevmem changes get pulled before your series. 1: https://lore.kernel.org/netdev/20241009171252.2328284-1-sdf@xxxxxxxxxxx/ 2: https://lore.kernel.org/netdev/20240913150913.1280238-1-sdf@xxxxxxxxxxx/ commit 69bc0e247eb4132ef5fd0b118719427d35d462fc Author: Stanislav Fomichev <sdf@xxxxxxxxxxx> AuthorDate: Tue Oct 15 15:56:43 2024 -0700 Commit: Stanislav Fomichev <sdf@xxxxxxxxxxx> CommitDate: Wed Oct 16 13:13:42 2024 -0700 selftests: ncdevmem: Set header split threshold to 0 Needs to happen on BRCM to allow devmem to be attached. Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxxx> diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c index 903dac3e61d5..6a94d52a6c43 100644 --- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c +++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c @@ -322,6 +322,8 @@ static int configure_headersplit(bool on) ethtool_rings_set_req_set_header_dev_index(req, ifindex); /* 0 - off, 1 - auto, 2 - on */ ethtool_rings_set_req_set_tcp_data_split(req, on ? 2 : 0); + if (enable) + ethtool_rings_set_req_set_tcp_data_split_thresh(req, 0); ret = ethtool_rings_set(ys, req); if (ret < 0) fprintf(stderr, "YNL failed: %s\n", ys->err.msg); commit ef5ba647bc94a19153c2c5cfc64ebe4cb86ac58d Author: Stanislav Fomichev <sdf@xxxxxxxxxxx> AuthorDate: Fri Oct 11 13:52:03 2024 -0700 Commit: Stanislav Fomichev <sdf@xxxxxxxxxxx> CommitDate: Wed Oct 16 13:13:42 2024 -0700 bnxt_en: support tx device memory The only change is to not unmap the frags on completions. Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxxx> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 6e422e24750a..cb22707a35aa 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -692,7 +692,10 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev) goto tx_dma_error; tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)]; - dma_unmap_addr_set(tx_buf, mapping, mapping); + if (netmem_is_net_iov(frag->netmem)) + dma_unmap_addr_set(tx_buf, mapping, 0); + else + dma_unmap_addr_set(tx_buf, mapping, mapping); txbd->tx_bd_haddr = cpu_to_le64(mapping); @@ -749,9 +752,10 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev) for (i = 0; i < last_frag; i++) { prod = NEXT_TX(prod); tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)]; - dma_unmap_page(&pdev->dev, dma_unmap_addr(tx_buf, mapping), - skb_frag_size(&skb_shinfo(skb)->frags[i]), - DMA_TO_DEVICE); + if (dma_unmap_addr(tx_buf, mapping)) + dma_unmap_page(&pdev->dev, dma_unmap_addr(tx_buf, mapping), + skb_frag_size(&skb_shinfo(skb)->frags[i]), + DMA_TO_DEVICE); } tx_free: @@ -821,11 +825,12 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr, for (j = 0; j < last; j++) { cons = NEXT_TX(cons); tx_buf = &txr->tx_buf_ring[RING_TX(bp, cons)]; - dma_unmap_page( - &pdev->dev, - dma_unmap_addr(tx_buf, mapping), - skb_frag_size(&skb_shinfo(skb)->frags[j]), - DMA_TO_DEVICE); + if (dma_unmap_addr(tx_buf, mapping)) + dma_unmap_page( + &pdev->dev, + dma_unmap_addr(tx_buf, mapping), + skb_frag_size(&skb_shinfo(skb)->frags[j]), + DMA_TO_DEVICE); } if (unlikely(is_ts_pkt)) { if (BNXT_CHIP_P5(bp)) { @@ -3296,10 +3301,11 @@ static void bnxt_free_tx_skbs(struct bnxt *bp) skb_frag_t *frag = &skb_shinfo(skb)->frags[k]; tx_buf = &txr->tx_buf_ring[ring_idx]; - dma_unmap_page( - &pdev->dev, - dma_unmap_addr(tx_buf, mapping), - skb_frag_size(frag), DMA_TO_DEVICE); + if (dma_unmap_addr(tx_buf, mapping)) + dma_unmap_page( + &pdev->dev, + dma_unmap_addr(tx_buf, mapping), + skb_frag_size(frag), DMA_TO_DEVICE); } dev_kfree_skb(skb); }