On Sat, May 27, 2023 at 07:57:04AM +0200, Oleksij Rempel wrote:
> Hi Fedor,
>
> On Fri, May 26, 2023 at 09:50:26PM +0300, Fedor Pchelkin wrote:
> > Hi Oleksij,
> >
> > thanks for the reply!
> >
> > On Fri, May 26, 2023 at 08:15:00PM +0200, Oleksij Rempel wrote:
> > > Hi Fedor,
> > >
> > > On Fri, May 26, 2023 at 08:19:10PM +0300, Fedor Pchelkin wrote:
> > >
> > > Thank you for your investigation. How about this change?
> > > --- a/net/can/j1939/main.c
> > > +++ b/net/can/j1939/main.c
> > > @@ -285,8 +285,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
> > >  		 */
> > >  		kref_get(&priv_new->rx_kref);
> > >  		spin_unlock(&j1939_netdev_lock);
> > > -		dev_put(ndev);
> > > -		kfree(priv);
> > > +		j1939_priv_put(priv);
> >
> > I don't think that's good because the priv which is directly freed here is
> > still local to the thread, and parallel threads don't have any access to
> > it. j1939_priv_create() has allocated a fresh priv and called dev_hold(),
> > so dev_put() and kfree() here are okay.
> >
> > >  		return priv_new;
> > >  	}
> > >  	j1939_priv_set(ndev, priv);
> > > @@ -300,8 +299,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
> > >
> > >  out_priv_put:
> > >  	j1939_priv_set(ndev, NULL);
> > > -	dev_put(ndev);
> > > -	kfree(priv);
> > > +	j1939_priv_put(priv);
> > >
> > >  	return ERR_PTR(ret);
> > >  }
> > >
> > > If I see it correctly, the problem is kfree(), which is called without respecting
> > > the ref counting. If CPU1 has priv_new, the refcount is increased, so the priv will
> > > not be freed at this place.
> >
> > With your suggestion, I think it doesn't work correctly if
> > j1939_can_rx_register() fails and we go to out_priv_put. The priv is kept,
> > but the parallel thread which may have already grabbed it thinks that
> > j1939_can_rx_register() has succeeded when actually it hasn't.
> > Moreover, j1939_priv_set() sets it to NULL on the error path, so the priv
> > can no longer be accessed from ndev.
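To make the race easier to follow, here is a minimal userspace sketch of the serialization idea, assuming pthreads as a stand-in for a kernel mutex. `struct priv`, `netdev_start()`, `rx_register()`, `dev_priv` and the `fail` flag are hypothetical and are not the j1939 code; the refcount is a plain int protected by the mutex rather than a kref. Holding one lock across lookup, allocation and the fallible registration step means a concurrent caller can only take a reference on a priv whose registration has finished, and a priv whose registration failed is freed before any other thread can see it.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-device private data
 * (NOT the real struct j1939_priv). */
struct priv {
	int refcount;   /* protected by dev_lock in this sketch */
	int registered; /* set only after the fallible step succeeds */
};

static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;
static struct priv *dev_priv; /* hypothetical "attached to ndev" slot */

/* Hypothetical fallible step standing in for j1939_can_rx_register();
 * 'fail' is a fault-injection flag for illustration only. */
static int rx_register(struct priv *p, int fail)
{
	if (fail)
		return -1;
	p->registered = 1;
	return 0;
}

/* Lookup-or-create with the mutex held across registration: a concurrent
 * caller either finds a fully registered priv or finds none at all. */
static struct priv *netdev_start(int fail)
{
	struct priv *p;

	pthread_mutex_lock(&dev_lock);
	if (dev_priv) {
		/* Someone was faster: take a reference on their priv. */
		dev_priv->refcount++;
		p = dev_priv;
		pthread_mutex_unlock(&dev_lock);
		return p;
	}

	p = calloc(1, sizeof(*p));
	if (!p) {
		pthread_mutex_unlock(&dev_lock);
		return NULL;
	}
	p->refcount = 1;

	if (rx_register(p, fail)) {
		/* Never published, so no other thread can hold a reference:
		 * a plain free() is safe here. */
		pthread_mutex_unlock(&dev_lock);
		free(p);
		return NULL;
	}

	dev_priv = p; /* publish only after registration succeeded */
	pthread_mutex_unlock(&dev_lock);
	return p;
}
```

The trade-off is that registration now runs under the lock, so in the kernel the lock must be one that may sleep if the registration path can sleep; that is why a spinlock alone does not fit this pattern.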
> >
> > I also considered alternatives where we don't have to serialize access
> > to j1939_can_rx_register() and subsequently introduce a mutex. But with
> > the current j1939_netdev_start() implementation I can't see how to fix the
> > racy bug without it.
>
> Ok, it makes sense.
>
> I'll try to do some testing next week. If I forget, please feel
> free to ping me.

Got it, thank you.

>
> Regards,
> Oleksij
> -- 
> Pengutronix e.K.                 |                             |
> Steuerwalder Str. 21             | http://www.pengutronix.de/  |
> 31137 Hildesheim, Germany        | Phone: +49-5121-206917-0    |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |