On Fri, May 15, 2020 at 3:32 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
>
> On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski <brgl@xxxxxxxx> wrote:
> > +static int mtk_mac_ring_pop_tail(struct mtk_mac_ring *ring,
> > +                                 struct mtk_mac_ring_desc_data *desc_data)
>
> I took another look at this function because of your comment on the locking
> of the descriptor updates, which seemed suspicious as the device side does
> not actually use the locks to access them.
>
> > +{
> > +       struct mtk_mac_ring_desc *desc = &ring->descs[ring->tail];
> > +       unsigned int status;
> > +
> > +       /* Let the device release the descriptor. */
> > +       dma_rmb();
> > +       status = desc->status;
> > +       if (!(status & MTK_MAC_DESC_BIT_COWN))
> > +               return -1;
>
> The dma_rmb() seems odd here, as I don't see which prior read
> is being protected by this.
>
> > +       desc_data->len = status & MTK_MAC_DESC_MSK_LEN;
> > +       desc_data->flags = status & ~MTK_MAC_DESC_MSK_LEN;
> > +       desc_data->dma_addr = ring->dma_addrs[ring->tail];
> > +       desc_data->skb = ring->skbs[ring->tail];
> > +
> > +       desc->data_ptr = 0;
> > +       desc->status = MTK_MAC_DESC_BIT_COWN;
> > +       if (status & MTK_MAC_DESC_BIT_EOR)
> > +               desc->status |= MTK_MAC_DESC_BIT_EOR;
> > +
> > +       /* Flush writes to descriptor memory. */
> > +       dma_wmb();
>
> The comment and the barrier here seem odd as well. I would have expected
> a barrier after the update to the data pointer, and only a single store
> but no read of the status flag instead of the read-modify-write,
> something like:
>
>       desc->data_ptr = 0;
>       dma_wmb(); /* make pointer update visible before status update */
>       desc->status = MTK_MAC_DESC_BIT_COWN | (status & MTK_MAC_DESC_BIT_EOR);
>
> > +       ring->tail = (ring->tail + 1) % MTK_MAC_RING_NUM_DESCS;
> > +       ring->count--;
>
> I would get rid of the 'count' here, as it duplicates the information
> that is already known from the difference between head and tail, and you
> can't update it atomically without holding a lock around the access to
> the ring.
> The way I'd do this is to have the head and tail pointers
> in separate cache lines, and then use READ_ONCE/WRITE_ONCE
> and smp barriers to access them, with each one updated on one
> thread but read by the other.

Your previous solution seems much more reliable though. For instance, in
the above: when we're doing the TX cleanup (we got the TX-ready irq and
we're iterating over the descriptors until we know there are no more
packets scheduled (count == 0) or we encounter one that's still owned by
the DMA engine), a parallel TX path can schedule new packets to be sent,
and I don't see how we can atomically check the count (understood as the
difference between head and tail) and then run a new iteration (where
we'd modify the head or tail) without risking the other path getting in
the way. We'd always have to check the descriptor itself.

I experimented a bit with this and couldn't come up with anything that
would pass any stress test.

On the other hand, spin_lock_bh() works fine, and I like your approach
from the previous e-mail - except for the stats-update work: we could
potentially lose some statistics when updating them in process context
with the RX/TX paths running in parallel in napi context, but that
would be rare enough to overlook.

I hope v4 will be good enough even with spinlocks. :)

Bart
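For reference, this is roughly the shape of the lockless single-producer/single-consumer ring I understood you to be suggesting. This is only a userspace sketch, not the driver code: it uses C11 atomics in place of READ_ONCE/WRITE_ONCE plus smp barriers, all names are made up, and the indices are free-running (slot = index % size) so that the occupancy is just head - tail with no separate 'count' field:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SPSC_RING_NUM_DESCS 8u /* hypothetical ring size, power of two */

/*
 * In the real driver, head and tail would live in separate cache lines.
 * head is written only by the TX (producer) path; tail is written only
 * by the cleanup (consumer) path; each side reads the other's index.
 */
struct spsc_ring {
	_Atomic unsigned int head; /* free-running, producer-owned */
	_Atomic unsigned int tail; /* free-running, consumer-owned */
};

/* Occupancy derived from head - tail; unsigned wrap-around keeps this correct. */
static unsigned int spsc_ring_count(struct spsc_ring *ring)
{
	unsigned int head = atomic_load_explicit(&ring->head, memory_order_acquire);
	unsigned int tail = atomic_load_explicit(&ring->tail, memory_order_acquire);

	return head - tail;
}

/* Producer side: claim one slot, then publish the new head. */
static bool spsc_ring_push(struct spsc_ring *ring)
{
	unsigned int head = atomic_load_explicit(&ring->head, memory_order_relaxed);

	if (spsc_ring_count(ring) == SPSC_RING_NUM_DESCS)
		return false; /* ring full */

	/* descriptor at head % SPSC_RING_NUM_DESCS would be filled here */
	atomic_store_explicit(&ring->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: consume one slot, then publish the new tail. */
static bool spsc_ring_pop(struct spsc_ring *ring)
{
	unsigned int tail = atomic_load_explicit(&ring->tail, memory_order_relaxed);

	if (spsc_ring_count(ring) == 0)
		return false; /* ring empty */

	/* descriptor at tail % SPSC_RING_NUM_DESCS would be reclaimed here */
	atomic_store_explicit(&ring->tail, tail + 1, memory_order_release);
	return true;
}
```

Note that this only makes the individual index updates safe; it doesn't make the check-the-count-then-iterate sequence in the TX cleanup atomic against a concurrent producer, which is exactly the problem described above - the consumer still has to re-check the descriptor ownership bit on each iteration.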