Hello, > From: Sergey Shtylyov, Sent: Thursday, November 16, 2023 3:37 AM > > On 11/15/23 5:26 AM, Yoshihiro Shimoda wrote: > > > Fix races between ravb_tx_timeout_work() and functions of net_device_ops > > and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that > > since ravb_close() is under the rtnl lock and calls cancel_work_sync(), > > ravb_tx_timeout_work() should calls rtnl_trylock(). Otherwise, a deadlock > > may happen in ravb_tx_timeout_work() like below: > > > > CPU0 CPU1 > > ravb_tx_timeout() > > schedule_work() > > ... > > __dev_close_many() > > // Under rtnl lock > > ravb_close() > > cancel_work_sync() > > // Waiting > > ravb_tx_timeout_work() > > rtnl_lock() > > // This is possible to cause a deadlock > > > > And, if rtnl_trylock() fails and the netif is still running, > > rescheduling the work with 1 msec delayed. So, using > > schedule_delayed_work() instead of schedule_work(). > > > > Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") > > Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@xxxxxxxxxxx> > > Reviewed-by: Sergey Shtylyov <s.shtylyov@xxxxxx> > > Hm, I haven't reviewed this version... :-) Oops, I should have dropped the tag... > > Reviewed-by: Simon Horman <horms@xxxxxxxxxx> > [...] > > diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h > > index e0f8276cffed..e9bb8ee3ba2d 100644 > > --- a/drivers/net/ethernet/renesas/ravb.h > > +++ b/drivers/net/ethernet/renesas/ravb.h > > @@ -1081,7 +1081,7 @@ struct ravb_private { > > u32 cur_tx[NUM_TX_QUEUE]; > > u32 dirty_tx[NUM_TX_QUEUE]; > > struct napi_struct napi[NUM_RX_QUEUE]; > > - struct work_struct work; > > + struct delayed_work work; > > Not sure this is justified... > Then again, what do I know about workqueues? Not much... :-) I thought that the schedule_work() called the work function immediately. So, I thought call*ing the schedule_work() from the work function caused endless loop. However, it is not true. The schedule_work() just inserts a work queue, and then the kernel calls the work function later. So, changing from work_struct to delayed_work is not needed for fixing this issue, I think now. However, I have another concern about rescheduling this work by schedule_work() here because it's possible to cause high CPU load while the rtnl_lock() is held. So, I think we should call a sleep function like usleep_range(1000, 2000) for instance before schedule_work(). But, what do you think? > [...] > > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > > index c70cff80cc99..ca7db8a5b412 100644 > > --- a/drivers/net/ethernet/renesas/ravb_main.c > > +++ b/drivers/net/ethernet/renesas/ravb_main.c > > @@ -1863,17 +1863,24 @@ static void ravb_tx_timeout(struct net_device *ndev, unsigned int txqueue) > > /* tx_errors count up */ > > ndev->stats.tx_errors++; > > > > - schedule_work(&priv->work); > > + schedule_delayed_work(&priv->work, 0); > > } > > > > static void ravb_tx_timeout_work(struct work_struct *work) > > { > > - struct ravb_private *priv = container_of(work, struct ravb_private, > > + struct delayed_work *dwork = to_delayed_work(work); > > + struct ravb_private *priv = container_of(dwork, struct ravb_private, > > work); > > const struct ravb_hw_info *info = priv->info; > > struct net_device *ndev = priv->ndev; > > int error; > > > > + if (!rtnl_trylock()) { > > + if (netif_running(ndev)) > > + schedule_delayed_work(&priv->work, msecs_to_jiffies(10)); > > The delay is rather arbitrary. Why not e.g. 1 ms? I think that 1 ms is enough. Best regards, Yoshihiro Shimoda > [...] > > MBR, Sergey