Lino Sanfilippo <LinoSanfilippo at gmx.de> :
[...]
> what about the (only compile tested) code below?

I may have misunderstood some parts but it nonetheless seems broken.

> The smp_wmb() in tx function combined with the smp_rmb() in tx_clean ensures
> that the CPU running tx_clean sees consistent values for info, data and skb
> (thus no need to check for validity of all three values any more).
> The mb() fulfills several tasks:
> 1. makes sure that DMA writes to descriptor are completed before the HW is
> informed.

"DMA writes" == "CPU writes" ?

> 2. On multi processor systems: ensures that txbd_curr is updated (this is paired
> with the smp_mb() at the end of tx_clean).

Smells like using barrier side-effects to control smp coherency. It isn't
the recommended style.

> 3. Ensure we see the most recent value for tx_dirty. With this we do not have to
> recheck after we stopped the tx queue.
>
>
> --- a/drivers/net/ethernet/arc/emac_main.c
> +++ b/drivers/net/ethernet/arc/emac_main.c
> @@ -162,8 +162,13 @@ static void arc_emac_tx_clean(struct net_device *ndev)
>  		struct sk_buff *skb = tx_buff->skb;
>  		unsigned int info = le32_to_cpu(txbd->info);
>
> -		if ((info & FOR_EMAC) || !txbd->data || !skb)
> +		if (info & FOR_EMAC) {
> +			/* Make sure we see consistent values for info, skb
> +			 * and data.
> +			 */
> +			smp_rmb();
>  			break;
> +		}

?

smp_rmb should appear before the variables you want coherency for.
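
To make the ordering point concrete, here is a minimal userspace sketch,
not driver code. It assumes C11 fences as rough stand-ins for smp_wmb()
and smp_rmb(), and struct slot, slot_publish() and slot_consume() are
hypothetical names. The reader loads the flag word first, then issues the
read barrier, and only then loads the fields the barrier is meant to
cover. A barrier placed in the early-exit branch orders nothing useful.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor; the field names only echo the driver's. */
struct slot {
	uint32_t data;          /* payload, written before publishing */
	void *skb;              /* payload, written before publishing */
	_Atomic uint32_t info;  /* nonzero once the payload is published */
};

/* Writer: payload first, write barrier, then the flag word. */
static void slot_publish(struct slot *s, uint32_t data, void *skb,
			 uint32_t info)
{
	s->data = data;
	s->skb = skb;
	atomic_thread_fence(memory_order_release); /* stand-in for smp_wmb() */
	atomic_store_explicit(&s->info, info, memory_order_relaxed);
}

/* Reader: flag word first, read barrier, then the payload. */
static bool slot_consume(struct slot *s, uint32_t *data, void **skb)
{
	uint32_t info = atomic_load_explicit(&s->info, memory_order_relaxed);

	if (!info)
		return false; /* nothing published, nothing to order */

	atomic_thread_fence(memory_order_acquire); /* stand-in for smp_rmb() */
	*data = s->data;
	*skb = s->skb;
	return true;
}
```

The sketch only covers CPU to CPU ordering. It says nothing about
ordering against the device, which is what wmb() is for.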
>
>  		if (unlikely(info & (DROP | DEFR | LTCL | UFLO))) {
>  			stats->tx_errors++;
> @@ -679,36 +684,33 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
>  	dma_unmap_addr_set(&priv->tx_buff[*txbd_curr], addr, addr);
>  	dma_unmap_len_set(&priv->tx_buff[*txbd_curr], len, len);
>
> -	priv->txbd[*txbd_curr].data = cpu_to_le32(addr);
>
> -	/* Make sure pointer to data buffer is set */
> -	wmb();
> +	priv->txbd[*txbd_curr].data = cpu_to_le32(addr);
> +	priv->tx_buff[*txbd_curr].skb = skb;
>
> -	skb_tx_timestamp(skb);
> +	/* Make sure info is set after data and skb with respect to
> +	 * other tx_clean().
> +	 */
> +	smp_wmb();
>
>  	*info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);

Afaik smp_wmb() does not imply wmb(). So priv->txbd[*txbd_curr].data and
*info (aka priv->txbd[*txbd_curr].info) are not necessarily written in
an orderly manner.

>
> -	/* Make sure info word is set */
> -	wmb();
> -
> -	priv->tx_buff[*txbd_curr].skb = skb;
> -
>  	/* Increment index to point to the next BD */
>  	*txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;

With this change it's possible that tx_clean() reads new value for
tx_curr and old value (0) for *info.

>
> -	/* Ensure that tx_clean() sees the new txbd_curr before
> +	/* 1.Ensure that tx_clean() sees the new txbd_curr before
>  	 * checking the queue status. This prevents an unneeded wake
>  	 * of the queue in tx_clean().
> +	 * 2.Ensure that all values are written to RAM and to DMA
> +	 * before hardware is informed.

(I am not sure what "DMA" is supposed to mean here.)

> +	 * 3.Ensure we see the most recent value for tx_dirty.
>  	 */
> -	smp_mb();
> +	mb();
>
> -	if (!arc_emac_tx_avail(priv)) {
> +	if (!arc_emac_tx_avail(priv))
>  		netif_stop_queue(ndev);
> -		/* Refresh tx_dirty */
> -		smp_mb();
> -		if (arc_emac_tx_avail(priv))
> -			netif_start_queue(ndev);
> -	}

Xmit thread                        | Clean thread

mb();

arc_emac_tx_avail() test with old
tx_dirty - tx_clean has not issued
any mb yet - and new tx_curr

                                     smp_mb();
                                     if (netif_queue_stopped(ndev) && ...
                                             netif_wake_queue(ndev);

netif_stop_queue()

-> queue stopped.

You can't remove the revalidation step.

arc_emac_tx_avail() is essentially pessimistic. Even if arc_emac_tx_avail()
was "right", there would be a tx_clean window between arc_emac_tx_avail()
and netif_stop_queue().

> +
> +	skb_tx_timestamp(skb);

You don't want to issue skb_tx_timestamp after releasing control of the
descriptor (*info = ...): skb may be long gone.

-- 
Ueimor
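
The stop/recheck window can be replayed deterministically with a small
single-threaded sketch. Everything in it is an assumption made for
illustration: struct ring, RING_SIZE and the helper names are
hypothetical, memory barriers are left out because nothing runs
concurrently, and the racing clean pass is injected through a callback
at the point where it would slip in between the availability test and
the stop.

```c
#include <stdbool.h>

#define RING_SIZE 4u /* hypothetical ring size */

struct ring {
	unsigned int curr;  /* next slot the xmit path fills */
	unsigned int dirty; /* next slot the clean path reclaims */
	bool stopped;       /* models the netif queue state */
};

/* Free slots; one slot stays unused to tell a full ring from an empty one. */
static unsigned int ring_avail(const struct ring *r)
{
	return (r->dirty + RING_SIZE - r->curr - 1) % RING_SIZE;
}

/* Clean path: reclaim everything, wake the queue only if it is stopped. */
static void ring_clean(struct ring *r)
{
	r->dirty = r->curr;
	if (r->stopped && ring_avail(r))
		r->stopped = false; /* netif_wake_queue() analogue */
}

/*
 * Tail of the xmit path. racing_clean, when non-NULL, runs in the window
 * between the availability test and the stop.
 */
static void xmit_tail(struct ring *r, bool recheck,
		      void (*racing_clean)(struct ring *))
{
	if (!ring_avail(r)) {
		if (racing_clean)
			racing_clean(r); /* sees a running queue: no wake */
		r->stopped = true;       /* netif_stop_queue() analogue */
		if (recheck && ring_avail(r))
			r->stopped = false; /* netif_start_queue() analogue */
	}
}
```

Starting from a full ring (curr = 3, dirty = 0), calling xmit_tail()
with recheck set to false leaves the queue stopped although three slots
are free and no completion is pending to wake it. With recheck set to
true the queue is running again on return.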