On 05/12/2024 10:57, Jon Hunter wrote:
> On 04/12/2024 18:18, Thierry Reding wrote:
>> On Wed, Dec 04, 2024 at 05:45:43PM +0000, Russell King (Oracle) wrote:
>>> On Wed, Dec 04, 2024 at 05:02:19PM +0000, Jon Hunter wrote:
>>>> Hi Russell,
>>>>
>>>> On 04/12/2024 16:39, Russell King (Oracle) wrote:
>>>>> On Wed, Dec 04, 2024 at 04:58:34PM +0100, Thierry Reding wrote:
>>>>>> This doesn't match the location from earlier, but at least there's
>>>>>> something afoot here that needs fixing. I suppose this could simply
>>>>>> be hiding any subsequent errors, so once this is fixed we might see
>>>>>> other similar issues.
>>>>>
>>>>> Well, having a quick look at this, the first thing which stands out is:
>>>>>
>>>>> In stmmac_tx_clean(), we have:
>>>>>
>>>>>	if (likely(tx_q->tx_skbuff_dma[entry].buf &&
>>>>>		   tx_q->tx_skbuff_dma[entry].buf_type != STMMAC_TXBUF_T_XDP_TX)) {
>>>>>		if (tx_q->tx_skbuff_dma[entry].map_as_page)
>>>>>			dma_unmap_page(priv->device,
>>>>>				       tx_q->tx_skbuff_dma[entry].buf,
>>>>>				       tx_q->tx_skbuff_dma[entry].len,
>>>>>				       DMA_TO_DEVICE);
>>>>>		else
>>>>>			dma_unmap_single(priv->device,
>>>>>					 tx_q->tx_skbuff_dma[entry].buf,
>>>>>					 tx_q->tx_skbuff_dma[entry].len,
>>>>>					 DMA_TO_DEVICE);
>>>>>		tx_q->tx_skbuff_dma[entry].buf = 0;
>>>>>		tx_q->tx_skbuff_dma[entry].len = 0;
>>>>>		tx_q->tx_skbuff_dma[entry].map_as_page = false;
>>>>>	}
>>>>>
>>>>> So, tx_skbuff_dma[entry].buf is expected to point appropriately to
>>>>> the DMA region.
>>>>>
>>>>> Now if we look at stmmac_tso_xmit():
>>>>>
>>>>>	des = dma_map_single(priv->device, skb->data, skb_headlen(skb),
>>>>>			     DMA_TO_DEVICE);
>>>>>	if (dma_mapping_error(priv->device, des))
>>>>>		goto dma_map_err;
>>>>>
>>>>>	if (priv->dma_cap.addr64 <= 32) {
>>>>>		...
>>>>>	} else {
>>>>>		...
>>>>>		des += proto_hdr_len;
>>>>>		...
>>>>>	}
>>>>>
>>>>>	tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = des;
>>>>>	tx_q->tx_skbuff_dma[tx_q->cur_tx].len = skb_headlen(skb);
>>>>>	tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = false;
>>>>>	tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
>>>>>
>>>>> This will result in stmmac_tx_clean() calling dma_unmap_single()
>>>>> using "des" and "skb_headlen(skb)" as the buffer start and length.
>>>>>
>>>>> One of the requirements of the DMA mapping API is that the DMA
>>>>> handle returned by the map operation will be passed into the unmap
>>>>> function. Not something that was offset.
>>>>> The length will also be the same.
>>>>>
>>>>> We can clearly see above that there is a case where the DMA handle
>>>>> has been offset by proto_hdr_len, and when this is so, the value
>>>>> that is passed into the unmap operation no longer matches this
>>>>> requirement.
>>>>>
>>>>> So, a question to the reporter - what is the value of
>>>>> priv->dma_cap.addr64 in your failing case? You should see the value
>>>>> in the "Using %d/%d bits DMA host/device width" kernel message.
>>>>
>>>> It is ...
>>>>
>>>> dwc-eth-dwmac 2490000.ethernet: Using 40/40 bits DMA host/device width
>>>
>>> So yes, "des" is being offset, which will upset the unmap operation.
>>> Please try the following patch, thanks:
>>>
>>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>>> index 9b262cdad60b..c81ea8cdfe6e 100644
>>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>>> @@ -4192,8 +4192,8 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
>>>  	struct stmmac_txq_stats *txq_stats;
>>>  	struct stmmac_tx_queue *tx_q;
>>>  	u32 pay_len, mss, queue;
>>> +	dma_addr_t tso_des, des;
>>>  	u8 proto_hdr_len, hdr;
>>> -	dma_addr_t des;
>>>  	bool set_ic;
>>>  	int i;
>>> @@ -4289,14 +4289,15 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
>>>  
>>>  		/* If needed take extra descriptors to fill the remaining payload */
>>>  		tmp_pay_len = pay_len - TSO_MAX_BUFF_SIZE;
>>> +		tso_des = des;
>>>  	} else {
>>>  		stmmac_set_desc_addr(priv, first, des);
>>>  		tmp_pay_len = pay_len;
>>> -		des += proto_hdr_len;
>>> +		tso_des = des + proto_hdr_len;
>>>  		pay_len = 0;
>>>  	}
>>>  
>>> -	stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue);
>>> +	stmmac_tso_allocator(priv, tso_des, tmp_pay_len, (nfrags == 0), queue);
>>>  
>>>  	/* In case two or more DMA transmit descriptors are allocated for this
>>>  	 * non-paged SKB data, the DMA buffer address should be saved to
>>
>> I see, that makes sense. Looks like this has been broken for a few
>> years (since commit 34c15202896d ("net: stmmac: Fix the problem of
>> tso_xmit")) and Furong's patch ended up exposing it.
>> Anyway, this seems to fix it for me. I can usually trigger the issue
>> within one or two iperf runs; with your patch I haven't seen it break
>> after a dozen or so runs. It may be good to have Jon's test results as
>> well, but it looks good so far.
>
> I have been running tests on my side and so far so good too. I have not
> seen any more mapping failure cases.
>
> Russell, if you are planning to send a fix for this, please add my ...
>
> Tested-by: Jon Hunter <jonathanh@xxxxxxxxxx>
Nevermind, I see Furong already sent a fix.

Jon

-- 
nvpublic