On Thu, Dec 14, 2023 at 07:11:12PM +0100, Ronald Wahl wrote: > From: Ronald Wahl <ronald.wahl@xxxxxxxxxxx> > > There is a bug in the ks8851 Ethernet driver that more data is written > to the hardware TX buffer than actually available. This is caused by > wrong accounting of the free TX buffer space. > > The driver maintains a tx_space variable that represents the TX buffer > space that is deemed to be free. The ks8851_start_xmit_spi() function > adds an SKB to a queue if tx_space is large enough and reduces tx_space > by the amount of buffer space it will later need in the TX buffer and > then schedules a work item. If there is not enough space then the TX > queue is stopped. > > The worker function ks8851_tx_work() dequeues all the SKBs and writes > the data into the hardware TX buffer. The last packet will trigger an > interrupt after it was send. Here it is assumed that all data fits into > the TX buffer. > > In the interrupt routine (which runs asynchronously because it is a > threaded interrupt) tx_space is updated with the current value from the > hardware. Also the TX queue is woken up again. > > Now it could happen that after data was sent to the hardware and before > handling the TX interrupt new data is queued in ks8851_start_xmit_spi() > when the TX buffer space had still some space left. When the interrupt > is actually handled tx_space is updated from the hardware but now we > already have new SKBs queued that have not been written to the hardware > TX buffer yet. Since tx_space has been overwritten by the value from the > hardware the space is not accounted for. > > Now we have more data queued then buffer space available in the hardware > and ks8851_tx_work() will potentially overrun the hardware TX buffer. In > many cases it will still work because often the buffer is written out > fast enough so that no overrun occurs but for example if the peer > throttles us via flow control then an overrun may happen. > > This can be fixed in different ways. The most simple way would be to set > tx_space to 0 before writing data to the hardware TX buffer preventing > the queuing of more SKBs until the TX interrupt has been handled. I have > chosen a slightly more efficient (and still rather simple) way and > track the amount of data that is already queued and not yet written to > the hardware. When new SKBs are to be queued the already queued amount > of data is honoured when checking free TX buffer space. > > I tested this with a setup of two linked KS8851 running iperf3 between > the two in bidirectional mode. Before the fix I got a stall after some > minutes. With the fix I saw now issues anymore after hours. > > Fixes: 3ba81f3ece3c ("net: Micrel KS8851 SPI network driver") > Cc: "David S. Miller" <davem@xxxxxxxxxxxxx> > Cc: Eric Dumazet <edumazet@xxxxxxxxxx> > Cc: Jakub Kicinski <kuba@xxxxxxxxxx> > Cc: Paolo Abeni <pabeni@xxxxxxxxxx> > Cc: Ben Dooks <ben.dooks@xxxxxxxxxxxxxxx> > Cc: Tristram Ha <Tristram.Ha@xxxxxxxxxxxxx> > Cc: netdev@xxxxxxxxxxxxxxx > Cc: stable@xxxxxxxxxxxxxxx # 5.10+ > Signed-off-by: Ronald Wahl <ronald.wahl@xxxxxxxxxxx> > --- > V3: - Add missing kdoc of structure fields > - Avoid potential NULL pointer dereference > - Fix stack variable declaration order > > V2: - Added Fixes: tag (issue actually present from the beginning) > - cosmetics reported by checkpatch Thanks for the updates. This change looks good to me, and I agree that the problem was introduced in the cited commit. Reviewed-by: Simon Horman <horms@xxxxxxxxxx>