Re: [PATCH v4 06/11] net: ethernet: mtk-eth-mac: new driver

Arnd Bergmann <arnd@xxxxxxxx> · Wed, 20 May 2020 16:37:23 +0200

On Wed, May 20, 2020 at 1:25 PM Bartosz Golaszewski <brgl@xxxxxxxx> wrote:
>
> From: Bartosz Golaszewski <bgolaszewski@xxxxxxxxxxxx>
>
> This adds the driver for the MediaTek Ethernet MAC used on the MT8* SoC
> family. For now we only support full-duplex.
>
> Signed-off-by: Bartosz Golaszewski <bgolaszewski@xxxxxxxxxxxx>

Looks much better, thanks for addressing my feedback. A few more things
about this version:

> ---
>  drivers/net/ethernet/mediatek/Kconfig       |    6 +
>  drivers/net/ethernet/mediatek/Makefile      |    1 +
>  drivers/net/ethernet/mediatek/mtk_eth_mac.c | 1668 +++++++++++++++++++
>  3 files changed, 1675 insertions(+)
>  create mode 100644 drivers/net/ethernet/mediatek/mtk_eth_mac.c
>
> diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig
> index 5079b8090f16..5c3793076765 100644
> --- a/drivers/net/ethernet/mediatek/Kconfig
> +++ b/drivers/net/ethernet/mediatek/Kconfig
> @@ -14,4 +14,10 @@ config NET_MEDIATEK_SOC
>           This driver supports the gigabit ethernet MACs in the
>           MediaTek SoC family.
>
> +config NET_MEDIATEK_MAC
> +       tristate "MediaTek Ethernet MAC support"
> +       select PHYLIB
> +       help
> +         This driver supports the ethernet IP on MediaTek MT85** SoCs.

I just noticed how the naming of NET_MEDIATEK_MAC and NET_MEDIATEK_SOC
for two different drivers doing the same thing is really confusing.

Maybe someone can come up with a better name, such as one
based on the soc it first showed up in.

> +       struct mtk_mac_ring_desc *desc = &ring->descs[ring->head];
> +       unsigned int status;
> +
> +       status = desc->status;
> +
> +       ring->skbs[ring->head] = desc_data->skb;
> +       ring->dma_addrs[ring->head] = desc_data->dma_addr;
> +       desc->data_ptr = desc_data->dma_addr;
> +
> +       status |= desc_data->len;
> +       if (flags)
> +               status |= flags;
> +       desc->status = status;
> +
> +       /* Flush previous modifications before ownership change. */
> +       dma_wmb();
> +       desc->status &= ~MTK_MAC_DESC_BIT_COWN;

You still do the read-modify-write on the word here, which is
expensive on uncached memory. You have read the value already,
so better use an assignment rather than &=, or (better)
READ_ONCE() and WRITE_ONCE() to prevent the compiler
from adding further accesses.

> +static void mtk_mac_lock(struct mtk_mac_priv *priv)
> +{
> +       spin_lock_bh(&priv->lock);
> +}
> +
> +static void mtk_mac_unlock(struct mtk_mac_priv *priv)
> +{
> +       spin_unlock_bh(&priv->lock);
> +}

I think open-coding the locks would make this more readable,
and let you use spin_lock() instead of spin_lock_bh() in
those functions that are already in softirq context.

> +static void mtk_mac_intr_enable_tx(struct mtk_mac_priv *priv)
> +{
> +       regmap_update_bits(priv->regs, MTK_MAC_REG_INT_MASK,
> +                          MTK_MAC_BIT_INT_STS_TNTC, 0);
> +}
> +static void mtk_mac_intr_enable_rx(struct mtk_mac_priv *priv)
> +{
> +       regmap_update_bits(priv->regs, MTK_MAC_REG_INT_MASK,
> +                          MTK_MAC_BIT_INT_STS_FNRC, 0);
> +}

These imply reading the irq mask register and then writing it again,
which is much more expensive than just writing it. It's also not
atomic since the regmap does not use a lock.

I don't think you actually need to enable/disable rx and tx separately,
but if you do, then writing to the Ack register as I suggested instead
of updating the mask would let you do this.

> +/* All processing for TX and RX happens in the napi poll callback. */
> +static irqreturn_t mtk_mac_handle_irq(int irq, void *data)
> +{
> +       struct mtk_mac_priv *priv;
> +       struct net_device *ndev;
> +       bool need_napi = false;
> +       unsigned int status;
> +
> +       ndev = data;
> +       priv = netdev_priv(ndev);
> +
> +       if (netif_running(ndev)) {
> +               status = mtk_mac_intr_read(priv);
> +
> +               if (status & MTK_MAC_BIT_INT_STS_TNTC) {
> +                       mtk_mac_intr_disable_tx(priv);
> +                       need_napi = true;
> +               }
> +
> +               if (status & MTK_MAC_BIT_INT_STS_FNRC) {
> +                       mtk_mac_intr_disable_rx(priv);
> +                       need_napi = true;
> +               }

I think you mixed up the rx and tx bits here: when you get
an rx interrupt, that one is already blocked until it gets
acked and you just need to disable tx until the end of the
poll function.

However, I suspect that the overhead of turning them off
is higher than what  you can save, and simply ignoring
the mask with

if (status & (MTK_MAC_BIT_INT_STS_FNRC | MTK_MAC_BIT_INT_STS_TNTC))
        napi_schedule(&priv->napi);

would be simpler and faster.

 +               /* One of the counters reached 0x8000000 - update stats and
> +                * reset all counters.
> +                */
> +               if (unlikely(status & MTK_MAC_REG_INT_STS_MIB_CNT_TH)) {
> +                       mtk_mac_intr_disable_stats(priv);
> +                       schedule_work(&priv->stats_work);
> +               }
> + befor
> +               mtk_mac_intr_ack_all(priv);

The ack here needs to be dropped, otherwise you can get further
interrupts before the bottom half has had a chance to run.

You might be lucky because you had already disabled the individual
bits earlier, but I don't think that was intentional here.

> +static int mtk_mac_netdev_start_xmit(struct sk_buff *skb,
> +                                    struct net_device *ndev)
> +{
> +       struct mtk_mac_priv *priv = netdev_priv(ndev);
> +       struct mtk_mac_ring *ring = &priv->tx_ring;
> +       struct device *dev = mtk_mac_get_dev(priv);
> +       struct mtk_mac_ring_desc_data desc_data;
> +
> +       desc_data.dma_addr = mtk_mac_dma_map_tx(priv, skb);
> +       if (dma_mapping_error(dev, desc_data.dma_addr))
> +               goto err_drop_packet;
> +
> +       desc_data.skb = skb;
> +       desc_data.len = skb->len;
> +
> +       mtk_mac_lock(priv);
> +
> +       mtk_mac_ring_push_head_tx(ring, &desc_data);
> +
> +       netdev_sent_queue(ndev, skb->len);
> +
> +       if (mtk_mac_ring_full(ring))
> +               netif_stop_queue(ndev);
> +
> +       mtk_mac_unlock(priv);
> +
> +       mtk_mac_dma_resume_tx(priv);

mtk_mac_dma_resume_tx() is an expensive read-modify-write
on an mmio register, so it would make sense to defer it based
on netdev_xmit_more(). (I had missed this in the previous
review)

> +static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
> +{
> +       struct mtk_mac_ring *ring = &priv->tx_ring;
> +       struct net_device *ndev = priv->ndev;
> +       int ret, pkts_compl, bytes_compl;
> +       bool wake = false;
> +
> +       mtk_mac_lock(priv);
> +
> +       for (pkts_compl = 0, bytes_compl = 0;;
> +            pkts_compl++, bytes_compl += ret, wake = true) {
> +               if (!mtk_mac_ring_descs_available(ring))
> +                       break;
> +
> +               ret = mtk_mac_tx_complete_one(priv);
> +               if (ret < 0)
> +                       break;
> +       }
> +
> +       netdev_completed_queue(ndev, pkts_compl, bytes_compl);
> +
> +       if (wake && netif_queue_stopped(ndev))
> +               netif_wake_queue(ndev);
> +
> +       mtk_mac_intr_enable_tx(priv);

No need to ack the interrupt here if napi is still active. Just
ack both rx and tx when calling napi_complete().

Some drivers actually use the napi budget for both rx and tx:
if you have more than 'budget' completed tx frames, return
early from this function and skip the napi_complete even
when less than 'budget' rx frames have arrived.

This way you get more fairness between devices and
can run for longer with irqs disabled as long as either rx
or tx is busy.

         Arnd