Hi Serge, Thanks for the feedback. On Thu, Oct 10, 2024 at 4:55 PM Serge Semin <fancer.lancer@xxxxxxxxx> wrote: > > On Mon, Sep 16, 2024 at 04:32:33PM GMT, Jitendra Vegiraju wrote: > > Hi Serge, > > > > On Tue, Sep 10, 2024 at 12:25 PM Serge Semin <fancer.lancer@xxxxxxxxx> wrote: > > > > > > > +static u32 decode_vdma_count(u32 regval) > > > > +{ > > > > + /* compressed encoding for vdma count > > > > + * regval: VDMA count > > > > + * 0-15 : 1 - 16 > > > > + * 16-19 : 20, 24, 28, 32 > > > > + * 20-23 : 40, 48, 56, 64 > > > > + * 24-27 : 80, 96, 112, 128 > > > > + */ > > > > + if (regval < 16) > > > > + return regval + 1; > > > > + return (4 << ((regval - 16) / 4)) * ((regval % 4) + 5); > > > > > > The shortest code isn't always the best one. This one gives me a > > > headache in trying to decipher whether it really matches to what is > > > described in the comment. What about just: > > > > > > if (regval < 16) /* Direct mapping */ > > > return regval + 1; > > > else if (regval < 20) /* 20, 24, 28, 32 */ > > > return 20 + (regval - 16) * 4; > > > else if (regval < 24) /* 40, 48, 56, 64 */ > > > return 40 + (regval - 20) * 8; > > > else if (regval < 28) /* 80, 96, 112, 128 */ > > > return 80 + (regval - 24) * 16; > > > > > > ? > > Couldn't agree more :) > > Thanks, I will replace it with your code, which is definitely more readable. > > > > > > > > > +} > > > > + > > > > +static void dw25gmac_read_hdma_limits(void __iomem *ioaddr, > > > > + struct stmmac_hdma_cfg *hdma) > > > > +{ > > > > + u32 hw_cap; > > > > + > > > > + /* Get VDMA/PDMA counts from HW */ > > > > + hw_cap = readl(ioaddr + XGMAC_HW_FEATURE2); > > > > > > > > > > + hdma->tx_vdmas = decode_vdma_count(FIELD_GET(XXVGMAC_HWFEAT_VDMA_TXCNT, > > > > + hw_cap)); > > > > + hdma->rx_vdmas = decode_vdma_count(FIELD_GET(XXVGMAC_HWFEAT_VDMA_RXCNT, > > > > + hw_cap)); > > > > + hdma->tx_pdmas = FIELD_GET(XGMAC_HWFEAT_TXQCNT, hw_cap) + 1; > > > > + hdma->rx_pdmas = FIELD_GET(XGMAC_HWFEAT_RXQCNT, hw_cap) + 1; > > > > > > Hmm, these are the Tx/Rx DMA-channels and Tx/Rx MTL-queues count > > > fields. Can't you just use the > > > dma_features::{number_tx_channel,number_tx_queues} and > > > dma_features::{number_rx_channel,number_rx_queues} fields to store the > > > retrieved data? > > > > > > Moreover why not to add the code above to the dwxgmac2_get_hw_feature() method? > > > > > Thanks, I missed the reuse of existing fields. > > > However, since the VDMA count has a slightly bigger bitmask, we need to extract > > VDMA channel count as per DW25GMAC spec. > > Instead of duplicating dwxgmac2_get_hw_feature(), should we add wrapper for > > dw25gmac, something like the following? > > Yeah, and there is the encoded fields value. Your suggestion sounds > reasonable. > > > dw25gmac_get_hw_feature(ioaddr, dma_cap) > > { > > u32 hw_cap; > > > int rc; > > s/rc/ret > > + newline > > > rc = dwxgmac2_get_hw_feature(ioaddr, dma_cap); > > newline > > > /* Get VDMA counts from HW */ > > hw_cap = readl(ioaddr + XGMAC_HW_FEATURE2); > > dma_cap->num_tx_channels = > > decode_vdma_count(FIELD_GET(XXVGMAC_HWFEAT_VDMA_TXCNT, > > hw_cap)); > > dma_cap->num_rx_channels = > > decode_vdma_count(FIELD_GET(XXVGMAC_HWFEAT_VDMA_RXCNT, > > hw_cap)); > > newline > Ack > > return rc; > > } > > > > > > +} > > > > + > > > > +int dw25gmac_hdma_cfg_init(struct stmmac_priv *priv) > > > > +{ > > > > + struct plat_stmmacenet_data *plat = priv->plat; > > > > + struct device *dev = priv->device; > > > > + struct stmmac_hdma_cfg *hdma; > > > > + int i; > > > > + > > > > + hdma = devm_kzalloc(dev, > > > > + sizeof(*plat->dma_cfg->hdma_cfg), > > > > + GFP_KERNEL); > > > > + if (!hdma) > > > > + return -ENOMEM; > > > > + > > > > + dw25gmac_read_hdma_limits(priv->ioaddr, hdma); > > > > + > > > > + hdma->tvdma_tc = devm_kzalloc(dev, > > > > + sizeof(*hdma->tvdma_tc) * hdma->tx_vdmas, > > > > + GFP_KERNEL); > > > > + if (!hdma->tvdma_tc) > > > > + return -ENOMEM; > > > > + > > > > + hdma->rvdma_tc = devm_kzalloc(dev, > > > > + sizeof(*hdma->rvdma_tc) * hdma->rx_vdmas, > > > > + GFP_KERNEL); > > > > + if (!hdma->rvdma_tc) > > > > + return -ENOMEM; > > > > + > > > > + hdma->tpdma_tc = devm_kzalloc(dev, > > > > + sizeof(*hdma->tpdma_tc) * hdma->tx_pdmas, > > > > + GFP_KERNEL); > > > > + if (!hdma->tpdma_tc) > > > > + return -ENOMEM; > > > > + > > > > + hdma->rpdma_tc = devm_kzalloc(dev, > > > > + sizeof(*hdma->rpdma_tc) * hdma->rx_pdmas, > > > > + GFP_KERNEL); > > > > + if (!hdma->rpdma_tc) > > > > + return -ENOMEM; > > > > + > > > > > > > + /* Initialize one-to-one mapping for each of the used queues */ > > > > + for (i = 0; i < plat->tx_queues_to_use; i++) { > > > > + hdma->tvdma_tc[i] = i; > > > > + hdma->tpdma_tc[i] = i; > > > > + } > > > > + for (i = 0; i < plat->rx_queues_to_use; i++) { > > > > + hdma->rvdma_tc[i] = i; > > > > + hdma->rpdma_tc[i] = i; > > > > + } > > > > > > So the Traffic Class ID is initialized for the > > > tx_queues_to_use/rx_queues_to_use number of channels only, right? What > > > about the Virtual and Physical DMA-channels with numbers greater than > > > these values? > > > > > > You have brought up a question that applies to remaining comments in > > this file as well. > > How the VDMA/PDMA mapping is used depends on the device/glue driver. > > For example in > > our SoC the remaining VDMAs are meant to be used with SRIOV virtual > > functions and not > > all of them are available for physical function. > > Since additional VDMAs/PDMAs remain unused in hardware I let them stay at their > > default values. No traffic is expected to be mapped to unused V/PDMAs. > > I couldn't think of a reason for it to be an issue from a driver perspective. > > Please let me know, if I am missing something or we need to address a > > use case with bigger scope. > > The responses for following comments also depend on what approach we take here. > > If they are unused, then I suggest to follow the KISS principle. See > my last comment for details. > > > > > > > + plat->dma_cfg->hdma_cfg = hdma; > > > > + > > > > + return 0; > > > > +} > > > > + > > > > + > > > > +void dw25gmac_dma_init(void __iomem *ioaddr, > > > > + struct stmmac_dma_cfg *dma_cfg) > > > > +{ > > > > + u32 value; > > > > + u32 i; > > > > + > > > > + value = readl(ioaddr + XGMAC_DMA_SYSBUS_MODE); > > > > + value &= ~(XGMAC_AAL | XGMAC_EAME); > > > > + if (dma_cfg->aal) > > > > + value |= XGMAC_AAL; > > > > + if (dma_cfg->eame) > > > > + value |= XGMAC_EAME; > > > > + writel(value, ioaddr + XGMAC_DMA_SYSBUS_MODE); > > > > + > > > > + for (i = 0; i < dma_cfg->hdma_cfg->tx_vdmas; i++) { > > > > + value = rd_dma_ch_ind(ioaddr, MODE_TXDESCCTRL, i); > > > > + value &= ~XXVGMAC_TXDCSZ; > > > > + value |= FIELD_PREP(XXVGMAC_TXDCSZ, > > > > + XXVGMAC_TXDCSZ_256BYTES); > > > > + value &= ~XXVGMAC_TDPS; > > > > + value |= FIELD_PREP(XXVGMAC_TDPS, XXVGMAC_TDPS_HALF); > > > > + wr_dma_ch_ind(ioaddr, MODE_TXDESCCTRL, i, value); > > > > + } > > > > + > > > > + for (i = 0; i < dma_cfg->hdma_cfg->rx_vdmas; i++) { > > > > + value = rd_dma_ch_ind(ioaddr, MODE_RXDESCCTRL, i); > > > > + value &= ~XXVGMAC_RXDCSZ; > > > > + value |= FIELD_PREP(XXVGMAC_RXDCSZ, > > > > + XXVGMAC_RXDCSZ_256BYTES); > > > > + value &= ~XXVGMAC_RDPS; > > > > + value |= FIELD_PREP(XXVGMAC_TDPS, XXVGMAC_RDPS_HALF); > > > > + wr_dma_ch_ind(ioaddr, MODE_RXDESCCTRL, i, value); > > > > + } > > > > + > > > > > > > + for (i = 0; i < dma_cfg->hdma_cfg->tx_pdmas; i++) { > > > > + value = rd_dma_ch_ind(ioaddr, MODE_TXEXTCFG, i); > > > > + value &= ~(XXVGMAC_TXPBL | XXVGMAC_TPBLX8_MODE); > > > > + if (dma_cfg->pblx8) > > > > + value |= XXVGMAC_TPBLX8_MODE; > > > > + value |= FIELD_PREP(XXVGMAC_TXPBL, dma_cfg->pbl); > > > > + wr_dma_ch_ind(ioaddr, MODE_TXEXTCFG, i, value); > > > > + xgmac4_tp2tc_map(ioaddr, i, dma_cfg->hdma_cfg->tpdma_tc[i]); > > > > + } > > > > + > > > > + for (i = 0; i < dma_cfg->hdma_cfg->rx_pdmas; i++) { > > > > + value = rd_dma_ch_ind(ioaddr, MODE_RXEXTCFG, i); > > > > + value &= ~(XXVGMAC_RXPBL | XXVGMAC_RPBLX8_MODE); > > > > + if (dma_cfg->pblx8) > > > > + value |= XXVGMAC_RPBLX8_MODE; > > > > + value |= FIELD_PREP(XXVGMAC_RXPBL, dma_cfg->pbl); > > > > + wr_dma_ch_ind(ioaddr, MODE_RXEXTCFG, i, value); > > > > + xgmac4_rp2tc_map(ioaddr, i, dma_cfg->hdma_cfg->rpdma_tc[i]); > > > > > > What if tx_pdmas doesn't match plat_stmmacenet_data::tx_queues_to_use > > > and rx_pdmas doesn't match to plat_stmmacenet_data::rx_queues_to_use? > > > > > > If they don't then you'll get out of the initialized tpdma_tc/rpdma_tc > > > fields and these channels will be pre-initialized with the zero TC. Is > > > that what expected? I doubt so. > > > > > As mentioned in the previous response the remaining resources are unused > > and no traffic is mapped to those resources. > > > > > > + } > > > > +} > > > > + > > > > > > > +void dw25gmac_dma_init_tx_chan(struct stmmac_priv *priv, > > > > + void __iomem *ioaddr, > > > > + struct stmmac_dma_cfg *dma_cfg, > > > > + dma_addr_t dma_addr, u32 chan) > > > > +{ > > > > + u32 value; > > > > + > > > > > > > + value = readl(ioaddr + XGMAC_DMA_CH_TX_CONTROL(chan)); > > > > + value &= ~XXVGMAC_TVDMA2TCMP; > > > > + value |= FIELD_PREP(XXVGMAC_TVDMA2TCMP, > > > > + dma_cfg->hdma_cfg->tvdma_tc[chan]); > > > > + writel(value, ioaddr + XGMAC_DMA_CH_TX_CONTROL(chan)); > > > > > > Please note this will have only first > > > plat_stmmacenet_data::{tx_queues_to_use,rx_queues_to_use} VDMA > > > channels initialized. Don't you have much more than just 4 channels? > > > > > Yes, there are 32 VDMA channels on this device. In our application the > > additional channels are partitioned for use with SRIOV virtual functions. > > Similar to PDMA comment above, the additional VDMAs are not enabled, > > and left in default state. > > My thinking is, when another 25gmac device comes along that requires a > > different mapping we may need to add the ability to set the mapping in > > glue driver. > > We can support this by adding a check in dw25gmac_setup() > > @@ -1708,8 +1708,10 @@ int dw25gmac_setup(struct stmmac_priv *priv) > > mac->mii.clk_csr_shift = 19; > > mac->mii.clk_csr_mask = GENMASK(21, 19); > > > > - /* Allocate and initialize hdma mapping */ > > - return dw25gmac_hdma_cfg_init(priv); > > + /* Allocate and initialize hdma mapping, if not done by glue driver. */ > > + if (!priv->plat->dma_cfg->hdma_cfg) > > + return dw25gmac_hdma_cfg_init(priv); > > + return 0; > > } > > So the use-case is actually hypothetical. Then I don't see a point in > adding the stmmac_hdma_cfg structure. See my last comment for details. > Yes, It's better to not over-complicate the patch for the hypothetical case. I will remove the new struct in next update. > > > > > > + > > > > + writel(upper_32_bits(dma_addr), > > > > + ioaddr + XGMAC_DMA_CH_TxDESC_HADDR(chan)); > > > > + writel(lower_32_bits(dma_addr), > > > > + ioaddr + XGMAC_DMA_CH_TxDESC_LADDR(chan)); > > > > +} > > > > + > > > > +void dw25gmac_dma_init_rx_chan(struct stmmac_priv *priv, > > > > + void __iomem *ioaddr, > > > > + struct stmmac_dma_cfg *dma_cfg, > > > > + dma_addr_t dma_addr, u32 chan) > > > > +{ > > > > + u32 value; > > > > + > > > > > > > + value = readl(ioaddr + XGMAC_DMA_CH_RX_CONTROL(chan)); > > > > + value &= ~XXVGMAC_RVDMA2TCMP; > > > > + value |= FIELD_PREP(XXVGMAC_RVDMA2TCMP, > > > > + dma_cfg->hdma_cfg->rvdma_tc[chan]); > > > > + writel(value, ioaddr + XGMAC_DMA_CH_RX_CONTROL(chan)); > > > > > > The same question. > > > > > > > + > > > > + writel(upper_32_bits(dma_addr), > > > > + ioaddr + XGMAC_DMA_CH_RxDESC_HADDR(chan)); > > > > + writel(lower_32_bits(dma_addr), > > > > + ioaddr + XGMAC_DMA_CH_RxDESC_LADDR(chan)); > > > > +} > > > > > > These methods are called for each > > > plat_stmmacenet_data::{tx_queues_to_use,rx_queues_to_use} > > > DMA-channel/Queue. The static mapping means you'll have each > > > PDMA/Queue assigned a static traffic class ID corresponding to the > > > channel ID. Meanwhile the VDMA channels are supposed to be initialized > > > with the TC ID corresponding to the matching PDMA ID. > > > > > > The TC ID in this case is passed as the DMA/Queue channel ID. Then the > > > Tx/Rx DMA-channels init methods can be converted to: > > > > > > dw25gmac_dma_init_Xx_chan(chan) > > > { > > > /* Map each chan-th VDMA to the single chan PDMA by assigning > > > * the static TC ID. > > > */ > > > for (i = chan; i < Xx_vdmas; i += (Xx_vdmas / Xx_queues_to_use)) { > > > /* Initialize VDMA channels */ > > > XXVGMAC_TVDMA2TCMP = chan; > > > } > > > > > > /* Assign the static TC ID to the specified PDMA channel */ > > > xgmac4_rp2tc_map(chan, chan) > > > } > > > > > > , where X={t,r}. > > > > > > Thus you can redistribute the loops implemented in dw25gmac_dma_init() > > > to the respective Tx/Rx DMA-channel init methods. > > > > > > Am I missing something? > > I think your visualization of HDMA may be going beyond the application > > I understand. > > We are allocating a VDMA for each of the TX/RX channels. The use of > > additional VDMAs > > depends on how the device is partitioned for virtualization. > > In the non-SRIOV case the remaining VDMAs will remain unused. > > Please let me know if I missed your question. > > So you say that you need a 1:1 mapping between > First-VDMAs:PDMAs/Queues, and there are only > tx_queues_to_use/rx_queues_to_use pairs utilized. Right? If so I don't > really see a need in implementing the overcomplicated solution you > suggest, especially seeing the core driver already supports an > infrastructure for the DMA-Queue managing: > 1. Rx path: dwxgmac2_map_mtl_to_dma() - set VDMA-Rx-Queue TC > dwxgmac2_rx_queue_enable() > 2. Tx path: dwxgmac2_dma_tx_mode() - set VDMA-Tx-Queue TC > > In the first case the mapping can be changed via the MTL DT-nodes. In > the second case the mapping is one-on-one static. Thus you'll be able > to keep the dw25gmac_dma_init_tx_chan()/dw25gmac_dma_init_rx_chan() > method simple initializing the PBL/Descriptor/etc-related stuff only, > as it's done for the DW QoS Eth and XGMAC/XLGMAC IP-cores. You won't > need to introduce a new structure stmmac_hdma_cfg, especially either > seeing it is anyway redundant for your use-case. > > BTW could you clarify: > > 1. is the TCes specified for the VDMA/PDMA-queue mapping the same as > the TCs assigned to the Tx-Queues in the MTL_TxQ0_Operation_Mode > register? If they aren't, then what is the relation between them? > In register documentation for HDMA XGMAC IP, the Q2TCMAP field in MTL_TxQ0_Operation_Mode is marked as *reserved" and is ignored. The VDMA->TC->PDMA mapping is used for DMA scheduling. This static mapping can be done in dw25gmac_dma_init_tx_chan() and dw25gmac_dma_init_rx_chan(). > 2. Similarly, if there is a TC-based Rx VDMA/Queue mapping, then > what's the function of the MTL_RxQ_DMA_MAP* registers? In the RX direction MTL_RxQ_DMA_MAP* decides the VDMA channel for a given RXQ. The TC for VDMA channel is decided by VDMA/TC mapping. Then TC to PDMA mapping is used to pick PDMA for actual DMA operation. > > -Serge(y) > > > > > > > -Serge() > > > > > > > [...]