> It could be the performance increase comes from two places:
>
> 1) Spending time and bus bandwidth dealing with the buffer overflow
> interrupt
>
> 2) Printing out the serial port.
>
> Please could you benchmark a few things:
>
> 1) Remove the printk("Receive buffer overflow error"), but otherwise
> keep the code the same. That will give us an idea how much the serial
> port matters.
>
> 2) Disable only the RX buffer overflow interrupt
>
> 3) Disable all error interrupts.
>

Test setup
- Server side - PC with a lan8670 usb eval board running an http server
  that serves a 10MB binary blob. The http server is just
  python3 -m http.server
- Client side - iMX8mn board (quad core arm64) with lan8650 mac-phy
  (25MHz spi) running curl to download the blob from the server and
  writing to a ramfs (ddr4 1.xGHz, so should not be a bottleneck)

Below is a collection of samples from different test runs. All of the test
runs include a minor patch for hw reset (else nothing works for me).
Using curl is not the most scientific approach here, but iperf has not
exposed any problems with overflows for me earlier. So I'm sticking with
curl since it's easy and definitely gets the overflows.
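For anyone wanting to repeat this, a rough sketch of how the curl runs
could be scripted and reduced to min/avg/max. This is not the exact
procedure used for the numbers in this mail. It assumes curl's
--write-out variable speed_download (bytes per second), discards the
download to /dev/null rather than writing it to a ramfs, and the function
names and the URL in the usage comment are made up for illustration.

```python
import statistics
import subprocess


def download_speed(url):
    """Download `url` once with curl and return the average speed in bytes/s."""
    out = subprocess.run(
        # -s: quiet, -o /dev/null: discard the body (assumption, not a ramfs),
        # -w: print only curl's measured average download speed.
        ["curl", "-s", "-o", "/dev/null", "-w", "%{speed_download}", url],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return float(out)


def summarize(speeds):
    """Reduce a list of per-run speeds to the min/avg/max columns."""
    return {
        "min": min(speeds),
        "avg": statistics.mean(speeds),
        "max": max(speeds),
        "samples": len(speeds),
    }


def run_benchmark(url, samples=5):
    """Run `samples` sequential downloads of `url` and summarize them."""
    return summarize([download_speed(url) for _ in range(samples)])


# Hypothetical usage, with the server address replaced by your own:
#   print(run_benchmark("http://192.0.2.1:8000/blob.bin"))
```

The rx dropped column is not covered by this; it would have to be read
separately from the interface statistics before and after each run.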

n | name     | min  | avg  | max  | rx dropped | samples
1 | no mod   | 827K | 846K | 891K | 945        | 5
2 | no log   | 711K | 726K | 744K | 562        | 5
3 | less irq | 815K | 833K | 846K | N/A        | 5
4 | no irq   | 914K | 924K | 931K | N/A        | 5
5 | simple   | 857K | 868K | 879K | 615        | 5

Description of each scenario

1 - no mod
This just runs a hw reset to get the chip working (described in earlier
posts).

2 - no log
This scenario just removes the logging when handling the irq state.

--- a/drivers/net/ethernet/oa_tc6.c
+++ b/drivers/net/ethernet/oa_tc6.c
@@ -688,8 +688,6 @@ static int oa_tc6_process_extended_status(struct oa_tc6 *tc6)
 	if (FIELD_GET(STATUS0_RX_BUFFER_OVERFLOW_ERROR, value)) {
 		tc6->rx_buf_overflow = true;
 		oa_tc6_cleanup_ongoing_rx_skb(tc6);
-		net_err_ratelimited("%s: Receive buffer overflow error\n",
-				    tc6->netdev->name);
 		return -EAGAIN;
 	}
 	if (FIELD_GET(STATUS0_TX_PROTOCOL_ERROR, value)) {

3 - less irq
This scenario disables the overflow interrupt but keeps the others.

--- a/drivers/net/ethernet/oa_tc6.c
+++ b/drivers/net/ethernet/oa_tc6.c
@@ -625,10 +625,10 @@ static int oa_tc6_unmask_macphy_error_interrupts(struct oa_tc6 *tc6)
 		return ret;

 	regval &= ~(INT_MASK0_TX_PROTOCOL_ERR_MASK |
-		    INT_MASK0_RX_BUFFER_OVERFLOW_ERR_MASK |
 		    INT_MASK0_LOSS_OF_FRAME_ERR_MASK |
 		    INT_MASK0_HEADER_ERR_MASK);
+	regval |= INT_MASK0_RX_BUFFER_OVERFLOW_ERR_MASK;

 	return oa_tc6_write_register(tc6, OA_TC6_REG_INT_MASK0, regval);
 }

4 - no irq
This scenario disables all imask0 interrupts with the following change.

diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c
index 9f17f3712137..88a9c6ccb37a 100644
--- a/drivers/net/ethernet/oa_tc6.c
+++ b/drivers/net/ethernet/oa_tc6.c
@@ -624,12 +624,7 @@ static int oa_tc6_unmask_macphy_error_interrupts(struct oa_tc6 *tc6)
 	if (ret)
 		return ret;

-	regval &= ~(INT_MASK0_TX_PROTOCOL_ERR_MASK |
-		    INT_MASK0_RX_BUFFER_OVERFLOW_ERR_MASK |
-		    INT_MASK0_LOSS_OF_FRAME_ERR_MASK |
-		    INT_MASK0_HEADER_ERR_MASK);
-
-	return oa_tc6_write_register(tc6, OA_TC6_REG_INT_MASK0, regval);
+	return oa_tc6_write_register(tc6, OA_TC6_REG_INT_MASK0, (u32)-1);
 }

 static int oa_tc6_enable_data_transfer(struct oa_tc6 *tc6)

5 - simple
This keeps the interrupt but does not log or drop the socket buffer on irq.
Moving the rx dropped increment here makes it more of an irq counter I
guess, so maybe not relevant.

diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c
index 9f17f3712137..cbc20a725ad0 100644
--- a/drivers/net/ethernet/oa_tc6.c
+++ b/drivers/net/ethernet/oa_tc6.c
@@ -687,10 +687,7 @@ static int oa_tc6_process_extended_status(struct oa_tc6 *tc6)

 	if (FIELD_GET(STATUS0_RX_BUFFER_OVERFLOW_ERROR, value)) {
 		tc6->rx_buf_overflow = true;
-		oa_tc6_cleanup_ongoing_rx_skb(tc6);
-		net_err_ratelimited("%s: Receive buffer overflow error\n",
-				    tc6->netdev->name);
-		return -EAGAIN;
+		tc6->netdev->stats.rx_dropped++;
 	}
 	if (FIELD_GET(STATUS0_TX_PROTOCOL_ERROR, value)) {
 		netdev_err(tc6->netdev, "Transmit protocol error\n");

- postamble -

Removing the logging made things considerably slower (726K avg in
scenario 2 vs 846K avg in scenario 1), which probably indicates that
there is timing-dependent behaviour in the driver.

I have a hard time explaining why there is a throughput difference
between scenarios 3 and 4, since I did not see any logs indicating that
any of the other interrupts fired. Maybe the irq handling adds some
unexpected context switching overhead.

My recommendation going forward would be to disable the rx buffer
overflow interrupt and remove any code related to handling it.

R