On 13/06/2024 13:49, Niklas Söderlund wrote: > Hi Paul, > > Thanks for your feedback. > > On 2024-06-13 13:29:30 +0100, Paul Barker wrote: >> On 13/06/2024 11:41, Niklas Söderlund wrote: >>> Add initial support for Renesas Ethernet-TSN End-station device of R-Car >>> V4H. The Ethernet End-station can connect to an Ethernet network using a >>> 10 Mbps, 100 Mbps, or 1 Gbps full-duplex link via MII/GMII/RMII/RGMII. >>> Depending on the connected PHY. >>> >>> The driver supports Rx checksum and offload and hardware timestamps. >>> >>> While full power management and suspend/resume is not yet supported the >>> driver enables runtime PM in order to enable the module clock. While >>> explicit clock management using clk_enable() would suffice for the >>> supported SoC, the module could be reused on SoCs where the module is >>> part of a power domain. >>> >>> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@xxxxxxxxxxxx> >>> --- >>> * Changes since v7 >>> - Properly handle Rx when netpoll is called with 0 budget. Previously >>> one Rx packet would be processed in this case. >>> - Use napi_alloc_skb() instead of netdev_alloc_skb() in Rx. >>> - Use device implemented 64 bit stats instead of using stats member in >>> struct net_device. >>> - Implement ndo_hwtstamp_{g,s}et instead of acting on SIOC{G,S}HWTSTAMP >>> in do_ioctl(). >>> - Fix incorrect error sign when using PTR_ERR(). >>> - Add missing cosmetic space when creating struct of_device_id. >>> - Rename label error_alloc to error_free. >>> >>> * Changes since v6 >>> - Fix warning added when removing delays depending on phy-mode logic. >>> >>> * Changes since v5 >>> - Removed delays depending on phy-mode logic. This is only needed on >>> some SoCs (V4H multiple boards), and currently we can't test on them. >>> As this have been a hot topic in review drop it for now so we can >>> support V4H single boards which we can test and lets work on the delay >>> on-top when we can test it. >>> >>> * Changes since v4 >>> - Enable GPOUT_RDM and GPOUT_TDM delays depending on phy-mode. >>> >>> * Changes since v3 >>> - Add description to commit message why PM operations are used instead >>> of explicit management of the clock. >>> >>> * Changes since v2 >>> - Drop extra call to ether_setup(), already called by >>> alloc_etherdev_mqs(). >>> - Remove dependency on MII. >>> - Improve error paths when requestion IRQs. >>> - Correct the interpretation of which phy-mode to apply delays for, and >>> mask the phy-mode passed to the PHY if MAC adds delays. >>> >>> * Changes since v1 >>> - Remove C45 read/write functions, the phy-core can do that for us. >>> - Rework the driver to register the MDIO bus at probe time instead of at >>> ndo_open(). >>> - Rework rtsn_rx() to take advantages of the improved design proposed >>> and upstreamed for R-Car RAVB driver Rx code-path. >>> - Use napi_complete_done() instead of napi_complete(). >>> - Update commit message to list that the driver supports Rx CSUM >>> offload. >>> - Drop unneeded __iowmb() left from development as well as a stray ;. >>> - Consider all RGMII modes. >>> - Move the phy_stop() call to mirror where phy_start() is called. >>> - Forward IOCTLS from rtsn_do_ioctl() to PHY using ndo_eth_ioctl() >>> - Align variable names to match core for IOCTLS functions. >>> - Make sure DMA mask and PTP is registered before registering the ndev. >>> - Document why the RTSN driver needs to be able to apply delays >>> - Add checks for which phy-modes the MAC driver can apply delays >>> - Use snprintf() instead of sprintf() >>> >>> * Changes since RFC >>> - Fix issues in MDIO communication. >>> - Use a dedicated OF node for the MDIO bus. >>> --- >>> MAINTAINERS | 8 + >>> drivers/net/ethernet/renesas/Kconfig | 10 + >>> drivers/net/ethernet/renesas/Makefile | 2 + >>> drivers/net/ethernet/renesas/rtsn.c | 1381 +++++++++++++++++++++++++ >>> drivers/net/ethernet/renesas/rtsn.h | 464 +++++++++ >>> 5 files changed, 1865 insertions(+) >>> create mode 100644 drivers/net/ethernet/renesas/rtsn.c >>> create mode 100644 drivers/net/ethernet/renesas/rtsn.h >> >> [snip] >> >>> +static int rtsn_rx(struct net_device *ndev, int budget) >>> +{ >>> + struct rtsn_private *priv = netdev_priv(ndev); >>> + unsigned int ndescriptors; >>> + unsigned int rx_packets; >>> + unsigned int i; >>> + bool get_ts; >>> + >>> + get_ts = priv->ptp_priv->tstamp_rx_ctrl & >>> + RCAR_GEN4_RXTSTAMP_TYPE_V2_L2_EVENT; >>> + >>> + ndescriptors = priv->dirty_rx + priv->num_rx_ring - priv->cur_rx; >>> + rx_packets = 0; >>> + for (i = 0; i < ndescriptors; i++) { >>> + const unsigned int entry = priv->cur_rx % priv->num_rx_ring; >>> + struct rtsn_ext_ts_desc *desc = &priv->rx_ring[entry]; >>> + struct sk_buff *skb; >>> + dma_addr_t dma_addr; >>> + u16 pkt_len; >>> + >>> + /* Stop processing descriptors if budget is consumed. */ >>> + if (rx_packets >= budget) >>> + break; >>> + >>> + /* Stop processing descriptors on first empty. */ >>> + if ((desc->die_dt & DT_MASK) == DT_FEMPTY) >>> + break; >>> + >>> + dma_rmb(); >>> + pkt_len = le16_to_cpu(desc->info_ds) & RX_DS; >>> + >>> + skb = priv->rx_skb[entry]; >>> + priv->rx_skb[entry] = NULL; >>> + dma_addr = le32_to_cpu(desc->dptr); >>> + dma_unmap_single(ndev->dev.parent, dma_addr, PKT_BUF_SZ, >>> + DMA_FROM_DEVICE); >>> + >>> + /* Get timestamp if enabled. */ >>> + if (get_ts) { >>> + struct skb_shared_hwtstamps *shhwtstamps; >>> + struct timespec64 ts; >>> + >>> + shhwtstamps = skb_hwtstamps(skb); >>> + memset(shhwtstamps, 0, sizeof(*shhwtstamps)); >>> + >>> + ts.tv_sec = (u64)le32_to_cpu(desc->ts_sec); >>> + ts.tv_nsec = le32_to_cpu(desc->ts_nsec & cpu_to_le32(0x3fffffff)); >>> + >>> + shhwtstamps->hwtstamp = timespec64_to_ktime(ts); >>> + } >>> + >>> + skb_put(skb, pkt_len); >>> + skb->protocol = eth_type_trans(skb, ndev); >>> + netif_receive_skb(skb); >>> + >>> + /* Update statistics. */ >>> + priv->stats.rx_packets++; >> >> I think it's better to do `priv->stats.rx_packets += rx_packets` once at >> the end of this function instead of repeatedly incrementing another >> variable on the hot path. > > We could do that, but I opted for this as it makes more sens to update > rx_packets and rx_bytes in the same location IMHO. > Ok, I have no strong opinion here. >> >>> + priv->stats.rx_bytes += pkt_len; >>> + >>> + /* Update counters. */ >>> + priv->cur_rx++; >>> + rx_packets++; >>> + } >>> + >>> + /* Refill the RX ring buffers */ >>> + for (; priv->cur_rx - priv->dirty_rx > 0; priv->dirty_rx++) { >>> + const unsigned int entry = priv->dirty_rx % priv->num_rx_ring; >>> + struct rtsn_ext_ts_desc *desc = &priv->rx_ring[entry]; >>> + struct sk_buff *skb; >>> + dma_addr_t dma_addr; >>> + >>> + desc->info_ds = cpu_to_le16(PKT_BUF_SZ); >>> + >>> + if (!priv->rx_skb[entry]) { >>> + skb = napi_alloc_skb(&priv->napi, >>> + PKT_BUF_SZ + RTSN_ALIGN - 1); >>> + if (!skb) >>> + break; >>> + skb_reserve(skb, NET_IP_ALIGN); >>> + dma_addr = dma_map_single(ndev->dev.parent, skb->data, >>> + le16_to_cpu(desc->info_ds), >>> + DMA_FROM_DEVICE); >>> + if (dma_mapping_error(ndev->dev.parent, dma_addr)) >>> + desc->info_ds = cpu_to_le16(0); >>> + desc->dptr = cpu_to_le32(dma_addr); >>> + skb_checksum_none_assert(skb); >>> + priv->rx_skb[entry] = skb; >>> + } >>> + >>> + dma_wmb(); >>> + desc->die_dt = DT_FEMPTY | D_DIE; >>> + } >>> + >>> + priv->rx_ring[priv->num_rx_ring].die_dt = DT_LINK; >>> + >>> + return rx_packets; >>> +} >> >> [snip] >> >>> +static netdev_tx_t rtsn_start_xmit(struct sk_buff *skb, struct net_device *ndev) >>> +{ >>> + struct rtsn_private *priv = netdev_priv(ndev); >>> + struct rtsn_ext_desc *desc; >>> + int ret = NETDEV_TX_OK; >>> + unsigned long flags; >>> + dma_addr_t dma_addr; >>> + int entry; >>> + >>> + spin_lock_irqsave(&priv->lock, flags); >>> + >>> + if (priv->cur_tx - priv->dirty_tx > priv->num_tx_ring) { >>> + netif_stop_subqueue(ndev, 0); >>> + ret = NETDEV_TX_BUSY; >>> + goto out; >>> + } >>> + >>> + if (skb_put_padto(skb, ETH_ZLEN)) >>> + goto out; >>> + >>> + dma_addr = dma_map_single(ndev->dev.parent, skb->data, skb->len, >>> + DMA_TO_DEVICE); >>> + if (dma_mapping_error(ndev->dev.parent, dma_addr)) { >>> + dev_kfree_skb_any(skb); >>> + goto out; >>> + } >>> + >>> + entry = priv->cur_tx % priv->num_tx_ring; >>> + priv->tx_skb[entry] = skb; >>> + desc = &priv->tx_ring[entry]; >>> + desc->dptr = cpu_to_le32(dma_addr); >>> + desc->info_ds = cpu_to_le16(skb->len); >> >> Should we check against the maximum TX frame size supported by the >> hardware here? >> >> Whatever we do here, we should also do in the ravb driver as that makes >> a similar cpu_to_le16() call to fill the DS field with no check that the >> HW actually supports transmitting a frame of the given size. > > Compared to RAVB the RTSN driver do not support splitting a packet over > multiple descriptors, so the max frame size adhering to the MTU will > always fit using a single descriptor. > My concern here is with pathological or malicious packets. For example, you can use stacked VLANS (QinQ, QinQinQ, etc) to expand the size of the TX frame for a given MTU since the bytes used by the extra VLAN tags are not counted as payload bytes. At least with the RZ/G2L family, no verification has been performed on sending packets larger than 1526 bytes to my knowledge. Even using only one TX descriptor, I was able to completely lock up the GbEth IP by pushing a 2kB frame into the TX queue. So I do think it is worth adding some checks here. Thanks, -- Paul Barker
Attachment:
OpenPGP_0x27F4B3459F002257.asc
Description: OpenPGP public key
Attachment:
OpenPGP_signature.asc
Description: OpenPGP digital signature