On Thu. 30 Mars 2023 at 15:49, Peter Hong <peter_hong@xxxxxxxxxxxxx> wrote: > Hi Vincent, > > Vincent MAILHOL 於 2023/3/28 下午 12:49 寫道: > >>>> +static int f81604_set_reset_mode(struct net_device *netdev) > >>>> +{ > >>>> + struct f81604_port_priv *priv = netdev_priv(netdev); > >>>> + int status, i; > >>>> + u8 tmp; > >>>> + > >>>> + /* disable interrupts */ > >>>> + status = f81604_set_sja1000_register(priv->dev, netdev->dev_id, > >>>> + SJA1000_IER, IRQ_OFF); > >>>> + if (status) > >>>> + return status; > >>>> + > >>>> + for (i = 0; i < F81604_SET_DEVICE_RETRY; i++) { > >>> Thanks for removing F81604_USB_MAX_RETRY. > >>> > >>> Yet, I still would like to understand why you need one hundred tries? > >>> Is this some paranoiac safenet? Or does the device really need so many > >>> attempts to operate reliably? If those are needed, I would like to > >>> understand the root cause. > >> This section is copy from sja1000.c. In my test, the operation/reset may > >> retry 1 times. > >> I'll reduce it from 100 to 10 times. > > Is it because the device is not ready? Does this only appear at > > startup or at random? > > I'm using ip link up/down to test open/close(). It's may not ready so fast. > but the maximum retry is only 1 for test 10000 times. Ack, thanks for the explanation. > >>>> +static int f81604_set_termination(struct net_device *netdev, u16 term) > >>>> +{ > >>>> + struct f81604_port_priv *port_priv = netdev_priv(netdev); > >>>> + struct f81604_priv *priv; > >>>> + u8 mask, data = 0; > >>>> + int r; > >>>> + > >>>> + priv = usb_get_intfdata(port_priv->intf); > >>>> + > >>>> + if (netdev->dev_id == 0) > >>>> + mask = F81604_CAN0_TERM; > >>>> + else > >>>> + mask = F81604_CAN1_TERM; > >>>> + > >>>> + if (term == F81604_TERMINATION_ENABLED) > >>>> + data = mask; > >>>> + > >>>> + mutex_lock(&priv->mutex); > >>> Did you witness a race condition? > >>> > >>> As far as I know, this call back is only called while the network > >>> stack big kernel lock (a.k.a. rtnl_lock) is being hold. > >>> If you have doubt, try adding a: > >>> > >>> ASSERT_RTNL() > >>> > >>> If this assert works, then another mutex is not needed. > >> It had added ASSERT_RTNL() into f81604_set_termination(). It only assert > >> in f81604_probe() -> f81604_set_termination(), not called via ip command: > >> ip link set dev can0 type can termination 120 > >> ip link set dev can0 type can termination 0 > >> > >> so I'll still use mutex on here. > > Sorry, do you mean that the assert throws warnings for f81604_probe() > > -> f81604_set_termination() but that it is OK (no warning) for ip > > command? > > > > I did not see that you called f81604_set_termination() internally. > > Indeed, rtnl_lock is not held in probe(). But I think it is still OK. > > In f81604_probe() you call f81604_set_termination() before > > register_candev(). If the device is not yet registered, > > f81604_set_termination() can not yet be called via ip command. Can you > > describe more precisely where you think there is a concurrency issue? > > I still do not see it. > > Sorry, I had inverse the mean of ASSERT_RTNL(). It like you said. > f81604_probe() not held rtnl_lock. > ip set terminator will held rtnl_lock. > > Due to f81604_set_termination() will called by f81604_probe() to > initialize, it may need mutex in > situation as following: > > User is setting can0 terminator when f81604_probe() complete generate > can0 and generating can1. > So IMO, the mutex may needed. Hmm, I am still not a fan of setting a mutex for a single concurrency issue which can only happen during probing. What about this: static int __f81604_set_termination(struct net_device *netdev, u16 term) { struct f81604_port_priv *port_priv = netdev_priv(netdev); u8 mask, data = 0; if (netdev->dev_id == 0) mask = F81604_CAN0_TERM; else mask = F81604_CAN1_TERM; if (term == F81604_TERMINATION_ENABLED) data = mask; return f81604_mask_set_register(port_priv->dev, F81604_TERMINATOR_REG, mask, data); } static int f81604_set_termination(struct net_device *netdev, u16 term) { ASSERT_RTNL(); return __f81604_set_termination(struct net_device *netdev, u16 term); } static int f81604_init_termination(struct f81604_priv *priv) { int i, ret; for (i = 0; i < ARRAY_SIZE(f81604_priv->netdev); i++) { ret = __f81604_set_termination(f81604_priv->netdev[i], F81604_TERMINATION_DISABLED); if (ret) return ret; } } static int f81604_probe(struct usb_interface *intf, const struct usb_device_id *id) { /* ... */ err = f81604_init_termination(priv); if (err) goto failure_cleanup; for (i = 0; i < ARRAY_SIZE(f81604_priv->netdev); i++) { /* ... */ } /* ... */ } Initialise all resistors with __f81604_set_termination() in probe() before registering any network device. Use f81604_set_termination() which has the lock assert elsewhere. Also, looking at your probe() function, in label clean_candev:, if the second can channel fails its initialization, you do not clean the first can channel. I suggest adding a f81604_init_netdev() and handling the netdev issue and cleanup in that function. > >>>> + port_priv->can.do_get_berr_counter = f81604_get_berr_counter; > >>>> + port_priv->can.ctrlmode_supported = > >>>> + CAN_CTRLMODE_LISTENONLY | CAN_CTRLMODE_3_SAMPLES | > >>>> + CAN_CTRLMODE_ONE_SHOT | CAN_CTRLMODE_BERR_REPORTING | > >>>> + CAN_CTRLMODE_CC_LEN8_DLC | CAN_CTRLMODE_PRESUME_ACK; > >>> Did you test the CAN_CTRLMODE_CC_LEN8_DLC feature? Did you confirm > >>> that you can send and receive DLC greater than 8? > >> Sorry, I had misunderstand the define. This device is only support 0~8 > >> data length, > > ^^^^^^^^^^^ > > > > Data length or Data Length Code (DLC)? Classical CAN maximum data > > length is 8 but maximum DLC is 15 (and DLC 8 to 15 mean a data length > > of 8). > > > > This device can't support DLC > 8. It's only support 0~8. Ack.