On 11/15/23 5:26 AM, Yoshihiro Shimoda wrote: > Fix races between ravb_tx_timeout_work() and functions of net_device_ops > and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that > since ravb_close() is under the rtnl lock and calls cancel_work_sync(), > ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock > may happen in ravb_tx_timeout_work() like below: > > CPU0 CPU1 > ravb_tx_timeout() > schedule_work() > ... > __dev_close_many() > // Under rtnl lock > ravb_close() > cancel_work_sync() > // Waiting > ravb_tx_timeout_work() > rtnl_lock() > // This is possible to cause a deadlock > > And, if rtnl_trylock() fails and the netif is still running, > rescheduling the work with 1 msec delayed. So, using > schedule_delayed_work() instead of schedule_work(). > > Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") > Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@xxxxxxxxxxx> > Reviewed-by: Sergey Shtylyov <s.shtylyov@xxxxxx> Hm, I haven't reviewed this version... :-) > Reviewed-by: Simon Horman <horms@xxxxxxxxxx> [...] > diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h > index e0f8276cffed..e9bb8ee3ba2d 100644 > --- a/drivers/net/ethernet/renesas/ravb.h > +++ b/drivers/net/ethernet/renesas/ravb.h > @@ -1081,7 +1081,7 @@ struct ravb_private { > u32 cur_tx[NUM_TX_QUEUE]; > u32 dirty_tx[NUM_TX_QUEUE]; > struct napi_struct napi[NUM_RX_QUEUE]; > - struct work_struct work; > + struct delayed_work work; Not sure this is justified... Then again, what do I know about workqueues? Not much... :-) [...] > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > index c70cff80cc99..ca7db8a5b412 100644 > --- a/drivers/net/ethernet/renesas/ravb_main.c > +++ b/drivers/net/ethernet/renesas/ravb_main.c > @@ -1863,17 +1863,24 @@ static void ravb_tx_timeout(struct net_device *ndev, unsigned int txqueue) > /* tx_errors count up */ > ndev->stats.tx_errors++; > > - schedule_work(&priv->work); > + schedule_delayed_work(&priv->work, 0); > } > > static void ravb_tx_timeout_work(struct work_struct *work) > { > - struct ravb_private *priv = container_of(work, struct ravb_private, > + struct delayed_work *dwork = to_delayed_work(work); > + struct ravb_private *priv = container_of(dwork, struct ravb_private, > work); > const struct ravb_hw_info *info = priv->info; > struct net_device *ndev = priv->ndev; > int error; > > + if (!rtnl_trylock()) { > + if (netif_running(ndev)) > + schedule_delayed_work(&priv->work, msecs_to_jiffies(10)); The delay is rather arbitrary. Why not e.g. 1 ms? [...] MBR, Sergey