On Sun, May 15, 2016 at 11:19:53AM +0200, Francois Romieu wrote:
>
> static void arc_emac_tx_clean(struct net_device *ndev)
> {
> [...]
>         for (i = 0; i < TX_BD_NUM; i++) {
>                 unsigned int *txbd_dirty = &priv->txbd_dirty;
>                 struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
>                 struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];
>                 struct sk_buff *skb = tx_buff->skb;
>                 unsigned int info = le32_to_cpu(txbd->info);
>
>                 if ((info & FOR_EMAC) || !txbd->data || !skb)
>                         break;
>                         ^^^^^
>
> -> the "break" statement prevents reading all txbds. At most one extra
> descriptor is read and this driver isn't in the Mpps business.

You are right, I forgot the break statement.

> > I tried your advice, Tx throughput can only reach 5.52MB/s.
>
> Even with the original code above ?

Yes, I left tx_clean() unmodified and took your advice below.
I tested it again just now, and this time throughput does reach 9.8MB/s.
Maybe the CPU was not idle last time.

I still have a question: is it possible that tx_clean() runs between
"priv->tx_buff[*txbd_curr].skb = skb" and dma_wmb()?

--- a/drivers/net/ethernet/arc/emac_main.c
+++ b/drivers/net/ethernet/arc/emac_main.c
@@ -685,13 +685,15 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
 	wmb();
 
 	skb_tx_timestamp(skb);
+	priv->tx_buff[*txbd_curr].skb = skb;
+
+	dma_wmb();
 
 	*info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
 
 	/* Make sure info word is set */
 	wmb();
 
-	priv->tx_buff[*txbd_curr].skb = skb;
 
 	/* Increment index to point to the next BD */
 	*txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
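To make the window in the question concrete, here is a minimal single-threaded sketch, not driver code. It replays the stores of the patched arc_emac_tx() one at a time on a hypothetical descriptor model (struct model_slot, tx_clean_stops() and replay_patched_tx() are illustrative names, not kernel APIs) and evaluates the break condition from the arc_emac_tx_clean() loop after each store. It assumes FOR_EMAC is bit 31, that the slot was left all-zero by a previous clean, and it models neither real concurrency nor the barriers themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: FOR_EMAC is the descriptor ownership bit, modelled as bit 31.
 * The real driver also ORs in FIRST_OR_LAST_MASK; it is omitted here. */
#define FOR_EMAC (1u << 31)

/* Hypothetical stand-in for struct arc_emac_bd plus tx_buff[i].skb. */
struct model_slot {
	uint32_t info;   /* txbd->info, already in CPU byte order */
	uint32_t data;   /* txbd->data */
	void *skb;       /* tx_buff->skb */
};

/* Same test as the arc_emac_tx_clean() loop: true means "break here". */
static bool tx_clean_stops(const struct model_slot *s)
{
	return (s->info & FOR_EMAC) || !s->data || !s->skb;
}

/* Replays the patched arc_emac_tx() stores in program order and records,
 * after each store, whether tx_clean() would stop at this slot. */
static void replay_patched_tx(bool stops[3])
{
	static int fake_skb;                   /* any non-NULL pointer will do */
	struct model_slot s = { 0, 0, NULL };  /* assumed: zeroed by a previous clean */

	/* Document the starting-state assumption. */
	assert(s.info == 0 && s.data == 0 && s.skb == NULL);

	s.data = 0x1000;          /* hypothetical DMA address, written before wmb() */
	stops[0] = tx_clean_stops(&s);

	s.skb = &fake_skb;        /* patched order: skb stored before dma_wmb() */
	stops[1] = tx_clean_stops(&s);

	s.info = FOR_EMAC | 60u;  /* info word hands the descriptor to the EMAC */
	stops[2] = tx_clean_stops(&s);
}
```

Calling replay_patched_tx() fills stops with {true, false, true}. In the middle state, after the skb store but before the info word carries FOR_EMAC, none of the three break conditions hold, so a tx_clean() that examined this slot at that moment would go on to reclaim it. Whether tx_clean() can actually reach the slot then depends on txbd_dirty and on how the NAPI poll runs relative to the xmit path, which this sketch does not try to answer.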