On Fri, May 15, 2020 at 2:04 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
>
> On Fri, May 15, 2020 at 9:11 AM Bartosz Golaszewski <brgl@xxxxxxxx> wrote:
> >
> > On Thu, May 14, 2020 at 6:19 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > >
> > > On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski <brgl@xxxxxxxx> wrote:
> > > > +static unsigned int mtk_mac_intr_read_and_clear(struct mtk_mac_priv *priv)
> > > > +{
> > > > +	unsigned int val;
> > > > +
> > > > +	regmap_read(priv->regs, MTK_MAC_REG_INT_STS, &val);
> > > > +	regmap_write(priv->regs, MTK_MAC_REG_INT_STS, val);
> > > > +
> > > > +	return val;
> > > > +}
> > >
> > > Do you actually need to read the register? That is usually a relatively
> > > expensive operation, so if possible try to just clear the bits when
> > > you don't care which bits were set.
> > >
> >
> > I do care, I'm afraid. The returned value is being used in the napi
> > poll callback to see which ring to process.
>
> I suppose the other callers are not performance critical.
>
> For the rx and tx processing, it should be better to just always look at
> the queue directly and ignore the irq status, in particular when you
> are already in polling mode: suppose you receive ten frames at once
> and only process five but clear the irq flag.
>
> When the poll function is called again, you still need to process the
> others, but I would assume that the status tells you that nothing
> new has arrived so you don't process them until the next interrupt.
>
> For the statistics, I assume you do need to look at the irq status,
> but this doesn't have to be done in the poll function. How about
> something like:
>
> - in hardirq context, read the irq status word
> - if rx or tx irq pending, call napi_schedule
> - if stats irq pending, schedule a work function
> - in napi poll, process both queues until empty or
>   budget exhausted
> - if packet processing completed in poll function,
>   ack the irq and check again, call napi_complete
> - in work function, handle stats irq, then ack it
>

I see your point. I'll try to come up with something and send a new
version on Monday.
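Just so we're talking about the same thing, this is roughly the shape I
have in mind for the hardirq part - an untested sketch only; the
MTK_MAC_BIT_INT_STS_* masks, the napi and stats_work members and the
handler name are placeholders I'm making up here, not final names:

static irqreturn_t mtk_mac_interrupt(int irq, void *data)
{
	struct mtk_mac_priv *priv = data;
	unsigned int status;

	/* Read and ack the status word once, in hardirq context. */
	status = mtk_mac_intr_read_and_clear(priv);

	/* RX/TX completion: defer to napi, which polls the rings
	 * directly instead of trusting the status bits. */
	if (status & (MTK_MAC_BIT_INT_STS_TX | MTK_MAC_BIT_INT_STS_RX))
		napi_schedule(&priv->napi);

	/* Stats don't need to be handled in the hot path. */
	if (status & MTK_MAC_BIT_INT_STS_MIB)
		schedule_work(&priv->stats_work);

	return IRQ_HANDLED;
}

The poll callback would then process both rings until they're empty or
the budget is exhausted, and only ack/re-enable the ring interrupts
around napi_complete().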
> > > > +static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
> > > > +{
> > > > +	struct mtk_mac_ring *ring = &priv->tx_ring;
> > > > +	struct net_device *ndev = priv->ndev;
> > > > +	int ret;
> > > > +
> > > > +	for (;;) {
> > > > +		mtk_mac_lock(priv);
> > > > +
> > > > +		if (!mtk_mac_ring_descs_available(ring)) {
> > > > +			mtk_mac_unlock(priv);
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		ret = mtk_mac_tx_complete_one(priv);
> > > > +		if (ret) {
> > > > +			mtk_mac_unlock(priv);
> > > > +			break;
> > > > +		}
> > > > +
> > > > +		if (netif_queue_stopped(ndev))
> > > > +			netif_wake_queue(ndev);
> > > > +
> > > > +		mtk_mac_unlock(priv);
> > > > +	}
> > > > +}
> > >
> > > It looks like most of the stuff inside of the loop can be pulled out
> > > and only done once here.
> > >
> >
> > I did that in one of the previous submissions but it was pointed out
> > to me that a parallel TX path may fill up the queue before I wake it.
>
> Right, I see you plugged that hole, however the way you hold the
> spinlock across the expensive DMA management but then give it
> up in each loop iteration feels like this is not the most efficient
> way.
>

Maybe my thinking is wrong here, but I assumed that with a spinlock
it's better to give other threads the chance to run in between each
iteration. I didn't benchmark it though.

> The easy way would be to just hold the lock across the entire
> loop and then be sure you do it right. Alternatively you could
> minimize the locking and only do the wakeup after you do the final
> update to the tail pointer, at which point you know the queue is not
> full because you have just freed up at least one entry.
>

Makes sense, I'll see what I can do.

Bartosz
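PS: for the second variant (wake up only after the final tail update),
I imagine something along these lines - again only an untested sketch
that reuses the existing mtk_mac_* helpers; whether the wakeup needs an
extra ring-full check is exactly the part I still have to think through:

static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
{
	struct mtk_mac_ring *ring = &priv->tx_ring;
	struct net_device *ndev = priv->ndev;
	unsigned int released = 0;

	for (;;) {
		mtk_mac_lock(priv);

		if (!mtk_mac_ring_descs_available(ring) ||
		    mtk_mac_tx_complete_one(priv)) {
			mtk_mac_unlock(priv);
			break;
		}

		released++;
		mtk_mac_unlock(priv);
	}

	/* Wake the queue once, only after the final update to the tail
	 * pointer: at that point we know we have just freed up at
	 * least one descriptor. */
	if (released && netif_queue_stopped(ndev))
		netif_wake_queue(ndev);
}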