Re: [PATCH V3] can: usb: f81604: add Fintek F81604 support

Vincent MAILHOL <mailhol.vincent@xxxxxxxxxx> · Thu, 30 Mar 2023 22:11:22 +0900

On Thu. 30 Mars 2023 at 15:49, Peter Hong <peter_hong@xxxxxxxxxxxxx> wrote:
> Hi Vincent,
>
> Vincent MAILHOL 於 2023/3/28 下午 12:49 寫道:
> >>>> +static int f81604_set_reset_mode(struct net_device *netdev)
> >>>> +{
> >>>> +       struct f81604_port_priv *priv = netdev_priv(netdev);
> >>>> +       int status, i;
> >>>> +       u8 tmp;
> >>>> +
> >>>> +       /* disable interrupts */
> >>>> +       status = f81604_set_sja1000_register(priv->dev, netdev->dev_id,
> >>>> +                                            SJA1000_IER, IRQ_OFF);
> >>>> +       if (status)
> >>>> +               return status;
> >>>> +
> >>>> +       for (i = 0; i < F81604_SET_DEVICE_RETRY; i++) {
> >>> Thanks for removing F81604_USB_MAX_RETRY.
> >>>
> >>> Yet, I still would like to understand why you need one hundred tries?
> >>> Is this some paranoiac safenet? Or does the device really need so many
> >>> attempts to operate reliably? If those are needed, I would like to
> >>> understand the root cause.
> >> This section is copy from sja1000.c. In my test, the operation/reset may
> >> retry 1 times.
> >> I'll reduce it from 100 to 10 times.
> > Is it because the device is not ready? Does this only appear at
> > startup or at random?
>
> I'm using ip link up/down to test open/close(). It's may not ready so fast.
> but the maximum retry is only 1 for test 10000 times.

Ack, thanks for the explanation.

> >>>> +static int f81604_set_termination(struct net_device *netdev, u16 term)
> >>>> +{
> >>>> +       struct f81604_port_priv *port_priv = netdev_priv(netdev);
> >>>> +       struct f81604_priv *priv;
> >>>> +       u8 mask, data = 0;
> >>>> +       int r;
> >>>> +
> >>>> +       priv = usb_get_intfdata(port_priv->intf);
> >>>> +
> >>>> +       if (netdev->dev_id == 0)
> >>>> +               mask = F81604_CAN0_TERM;
> >>>> +       else
> >>>> +               mask = F81604_CAN1_TERM;
> >>>> +
> >>>> +       if (term == F81604_TERMINATION_ENABLED)
> >>>> +               data = mask;
> >>>> +
> >>>> +       mutex_lock(&priv->mutex);
> >>> Did you witness a race condition?
> >>>
> >>> As far as I know, this call back is only called while the network
> >>> stack big kernel lock (a.k.a. rtnl_lock) is being hold.
> >>> If you have doubt, try adding a:
> >>>
> >>>     ASSERT_RTNL()
> >>>
> >>> If this assert works, then another mutex is not needed.
> >> It had added ASSERT_RTNL() into f81604_set_termination(). It only assert
> >> in f81604_probe() -> f81604_set_termination(), not called via ip command:
> >>       ip link set dev can0 type can termination 120
> >>       ip link set dev can0 type can termination 0
> >>
> >> so I'll still use mutex on here.
> > Sorry, do you mean that the assert throws warnings for f81604_probe()
> > -> f81604_set_termination() but that it is OK (no warning) for ip
> > command?
> >
> > I did not see that you called f81604_set_termination() internally.
> > Indeed, rtnl_lock is not held in probe(). But I think it is still OK.
> > In f81604_probe() you call f81604_set_termination() before
> > register_candev(). If the device is not yet registered,
> > f81604_set_termination() can not yet be called via ip command. Can you
> > describe more precisely where you think there is a concurrency issue?
> > I still do not see it.
>
> Sorry, I had inverse the mean of ASSERT_RTNL(). It like you said.
>      f81604_probe() not held rtnl_lock.
>      ip set terminator will held rtnl_lock.
>
> Due to f81604_set_termination() will called by f81604_probe() to
> initialize, it may need mutex in
> situation as following:
>
> User is setting can0 terminator when f81604_probe() complete generate
> can0 and generating can1.
> So IMO, the mutex may needed.

Hmm, I am still not a fan of setting a mutex for a single concurrency
issue which can only happen during probing.

What about this:

  static int __f81604_set_termination(struct net_device *netdev, u16 term)
  {
          struct f81604_port_priv *port_priv = netdev_priv(netdev);
          u8 mask, data = 0;

          if (netdev->dev_id == 0)
                  mask = F81604_CAN0_TERM;
          else
                  mask = F81604_CAN1_TERM;

          if (term == F81604_TERMINATION_ENABLED)
                  data = mask;

          return f81604_mask_set_register(port_priv->dev, F81604_TERMINATOR_REG,
                                          mask, data);
  }

  static int f81604_set_termination(struct net_device *netdev, u16 term)
  {
          ASSERT_RTNL();

          return __f81604_set_termination(struct net_device *netdev, u16 term);
  }

  static int f81604_init_termination(struct f81604_priv *priv)
  {
          int i, ret;

          for (i = 0; i < ARRAY_SIZE(f81604_priv->netdev); i++) {
                  ret = __f81604_set_termination(f81604_priv->netdev[i],
                                                 F81604_TERMINATION_DISABLED);
                  if (ret)
                          return ret;
          }
  }

  static int f81604_probe(struct usb_interface *intf,
                          const struct usb_device_id *id)
  {
          /* ... */

          err = f81604_init_termination(priv);
          if (err)
                  goto failure_cleanup;

          for (i = 0; i < ARRAY_SIZE(f81604_priv->netdev); i++) {
                  /* ... */
          }

          /* ... */
  }

Initialise all resistors with __f81604_set_termination() in probe()
before registering any network device. Use f81604_set_termination()
which has the lock assert elsewhere.

Also, looking at your probe() function, in label clean_candev:, if the
second can channel fails its initialization, you do not clean the
first can channel. I suggest adding a f81604_init_netdev() and
handling the netdev issue and cleanup in that function.

> >>>> +               port_priv->can.do_get_berr_counter = f81604_get_berr_counter;
> >>>> +               port_priv->can.ctrlmode_supported =
> >>>> +                       CAN_CTRLMODE_LISTENONLY | CAN_CTRLMODE_3_SAMPLES |
> >>>> +                       CAN_CTRLMODE_ONE_SHOT | CAN_CTRLMODE_BERR_REPORTING |
> >>>> +                       CAN_CTRLMODE_CC_LEN8_DLC | CAN_CTRLMODE_PRESUME_ACK;
> >>> Did you test the CAN_CTRLMODE_CC_LEN8_DLC feature? Did you confirm
> >>> that you can send and receive DLC greater than 8?
> >> Sorry, I had misunderstand the define. This device is only support 0~8
> >> data length,
> >    ^^^^^^^^^^^
> >
> > Data length or Data Length Code (DLC)? Classical CAN maximum data
> > length is 8 but maximum DLC is 15 (and DLC 8 to 15 mean a data length
> > of 8).
> >
>
> This device can't support DLC > 8. It's only support 0~8.

Ack.