> The RXBOE is basically a warning, indicating that for some reason the SPI host is not fast enough in retrieving frames from the device.
> In my experience, this is typically caused by excessive latency of the SPI driver, or if you have an unoptimized network driver for the MACPHY.
>
> Thanks,
> Piergiorgio

> > [ 285.482275] LAN865X Rev.B0 Internal Phy spi0.0:00: attached PHY driver (mii_bus:phy_addr=spi0.0:00, irq=POLL)
> > [ 285.534760] lan865x spi0.0 eth1: Link is Up - 10Mbps/Half - flow control off
> > [ 341.466221] eth1: Receive buffer overflow error
> > [ 345.730222] eth1: Receive buffer overflow error
> > [ 345.891126] eth1: Receive buffer overflow error
> > [ 346.074220] eth1: Receive buffer overflow error
>
> Generally we only log real errors. Is a receive buffer overflow a real error? I would say not. But it would be good to count it.
>
> Andrew

Hi,

I've been busy throwing stuff at the wall until something sticks. I've managed to narrow a few things down.

First and foremost, when running a periodic udp6 multicast in the background I don't get a hang in the driver, or it becomes considerably harder to provoke.

When I make sure that the bespoke Ferroamp UDP server is not started (it just joins a mcast group, sends a less-than-MTU packet every ~500ms and listens for incoming multicast messages in the same group), it becomes very simple to get to a live-lock.

My steps for reproducing are: set an IPv4 address on both ends of the link, then run the following script on both ends, using the other end's IP as the argument.

    #!/bin/env python3
    import socket
    import sys

    if __name__ == '__main__':
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # Flood the peer (argv[1]) with 2048-byte UDP datagrams on port 4040.
        while True:
            sock.sendto(b'0'*2048, (sys.argv[1], 4040))

Neither end opens a listening socket. I get to the live-lock in 10s or less.

I've enabled some debugging options but so far nothing seems to hit.
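For anyone who wants to try the "background multicast" case without the Ferroamp server, a rough stand-in could look like the sketch below. It is not that server, only an approximation of the behaviour described above: join a udp6 multicast group, send a small packet roughly every 500ms, and drain whatever arrives on the group. The group address, port and payload size are made-up values, and the interface name is expected as the first argument.

```python
#!/usr/bin/env python3
"""Rough stand-in for a periodic udp6 multicast sender/listener.

GROUP, PORT and PAYLOAD are hypothetical values chosen for illustration.
"""
import select
import socket
import struct
import sys
import time

GROUP = "ff02::1234"   # hypothetical link-local multicast group
PORT = 4041            # hypothetical port
PAYLOAD = b"0" * 512   # small packet, well below a 1500-byte MTU
PERIOD = 0.5           # seconds between packets


def make_mreq(group, ifindex):
    """Build an ipv6_mreq: 16-byte group address followed by the ifindex."""
    return socket.inet_pton(socket.AF_INET6, group) + struct.pack("@I", ifindex)


def run(ifname):
    ifindex = socket.if_nametoindex(ifname)
    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("::", PORT))
    # Join the group on the given interface and send through it as well.
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP,
                    make_mreq(GROUP, ifindex))
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_IF, ifindex)

    next_send = time.monotonic()
    while True:
        # Wait for incoming data, but no longer than until the next send.
        timeout = max(0.0, next_send - time.monotonic())
        readable, _, _ = select.select([sock], [], [], timeout)
        if readable:
            sock.recvfrom(2048)  # drain and discard
        if time.monotonic() >= next_send:
            sock.sendto(PAYLOAD, (GROUP, PORT, 0, ifindex))
            next_send += PERIOD


if __name__ == "__main__" and len(sys.argv) == 2:
    run(sys.argv[1])
```

Running it with the interface name (e.g. eth1) on both ends before starting the flood script should give a similar kind of periodic background traffic.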
What I've been able to conclude is that there is still SPI communication, the macphy interrupt is still pulled low, and the CPU does the ack so that it's reset to inactive. But from there it seems no data is passed up the network stack.

Some symptoms are:
* net_device stats are no longer incremented
* can't ping
* can't connect to sockets on the board etc.
* CPU usage jumps to and stays at 100% for the worker thread

The worker thread is released by the irq handler and it does some of the expected work, but not all.

I'm adding some instrumentation to the code in an effort to figure out where things break apart. It might be possible to catch it in gdb as well, but I think you only get one try as the timing will be pretty borked after the first break.

R
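The first symptom (net_device stats no longer incrementing) can be watched from userspace while the reproducer runs. Below is a minimal sketch that polls the standard sysfs counters once a second and flags any sample where none of them moved. The choice of counters and the one-second interval are arbitrary, and a flagged sample only means something while traffic is actually being sent.

```python
#!/usr/bin/env python3
"""Poll net_device counters from sysfs and flag when they stop moving."""
import os
import sys
import time

# Counters to sample; an arbitrary subset of the files under statistics/.
COUNTERS = ("rx_packets", "tx_packets", "rx_dropped", "rx_errors")


def read_counters(iface, root="/sys/class/net"):
    """Return {counter_name: value} read from <root>/<iface>/statistics/."""
    stats_dir = os.path.join(root, iface, "statistics")
    values = {}
    for name in COUNTERS:
        with open(os.path.join(stats_dir, name)) as f:
            values[name] = int(f.read())
    return values


def is_stalled(prev, cur):
    """True when no sampled counter changed between two samples."""
    return all(cur[name] == prev[name] for name in COUNTERS)


def watch(iface, interval=1.0):
    """Print one line per sample, marking samples with no counter movement."""
    prev = read_counters(iface)
    while True:
        time.sleep(interval)
        cur = read_counters(iface)
        flag = "  <-- no counters moved" if is_stalled(prev, cur) else ""
        print(f"{time.strftime('%H:%M:%S')} {cur}{flag}", flush=True)
        prev = cur


if __name__ == "__main__" and len(sys.argv) == 2:
    watch(sys.argv[1])
```

Started with the interface name as its only argument (eth1 in the log above), the timestamp of the first flagged line gives a rough idea of when the live-lock began.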