On 11/27/23 12:04 PM, Claudiu wrote: > From: Claudiu Beznea <claudiu.beznea.uj@xxxxxxxxxxxxxx> > > On RZ/G3S SMARC Carrier II board having RGMII connections b/w Ethernet > MACs and PHYs it has been discovered that doing unbind/bind for ravb > driver in a loop leads to wrong speed and duplex for Ethernet links and > broken connectivity (the connectivity cannot be restored even with > bringing interface down/up). Before doing unbind/bind the Ethernet > interfaces were configured though systemd. The sh instructions used to > do unbind/bind were: > > $ cd /sys/bus/platform/drivers/ravb/ > $ while :; do echo 11c30000.ethernet > unbind ; \ > echo 11c30000.ethernet > bind; done > > It has been discovered that there is a race b/w IOCTLs initialized by > systemd at the response of success binding and the > "ravb_write(ndev, CCC_OPC_RESET, CCC)" instruction in ravb_remove() as s/instruction/call/, perhaps? > follows: > > 1/ as a result of bind success the user space open/configures the > interfaces tough an IOCTL; the following stack trace has been > identified on RZ/G3S: > > Call trace: > dump_backtrace+0x9c/0x100 > show_stack+0x20/0x38 > dump_stack_lvl+0x48/0x60 > dump_stack+0x18/0x28 > ravb_open+0x70/0xa58 > __dev_open+0xf4/0x1e8 > __dev_change_flags+0x198/0x218 > dev_change_flags+0x2c/0x80 > devinet_ioctl+0x640/0x708 > inet_ioctl+0x1e4/0x200 > sock_do_ioctl+0x50/0x108 > sock_ioctl+0x240/0x358 > __arm64_sys_ioctl+0xb0/0x100 > invoke_syscall+0x50/0x128 > el0_svc_common.constprop.0+0xc8/0xf0 > do_el0_svc+0x24/0x38 > el0_svc+0x34/0xb8 > el0t_64_sync_handler+0xc0/0xc8 > el0t_64_sync+0x190/0x198 > > 2/ this call may execute concurrently with ravb_remove() as the > unbind/bind operation was executed in a loop > 3/ if the operation mode is changed to RESET (though Through? > ravb_write(ndev, CCC_OPC_RESET, CCC) instruction in ravb_remove()) s/instruction/call/, perhaps? > while the above ravb_open() is in progress it may lead to MAC > (or PHY, or MAC-PHY connection, the right point hasn't been identified > at the moment) to be broken, thus the Ethernet connectivity fails to > restore. > > The simple fix for this is to move ravb_write(ndev, CCC_OPC_RESET, CCC)) > after unregister_netdev() to avoid resetting the controller while the > netdev interface is still registered. > > To avoid future issues in ravb_remove(), the patch follows the proper order > of operations in ravb_remove(): reverse order compared with ravb_probe(). > This avoids described races as the IOCTLs as well as unregister_netdev() > (called now at the beginning of ravb_remove()) calls rtnl_lock() before > continuing and IOCTLs check (though devinet_ioctl()) if device is still > registered just after taking the lock: > > int devinet_ioctl(struct net *net, unsigned int cmd, struct ifreq *ifr) > { > // ... > > rtnl_lock(); > > ret = -ENODEV; > dev = __dev_get_by_name(net, ifr->ifr_name); > if (!dev) > goto done; > > // ... > done: > rtnl_unlock(); > out: > return ret; > } > > Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") > Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@xxxxxxxxxxxxxx> Reviewed-by: Sergey Shtylyov <s.shtylyov@xxxxxx> [...] Sorry for overlooking this race (and other bugs) when prepping the driver for upstream! MBR, Sergey