Re: [PATCH v13 01/11] can: etas_es58x: add core support for ETAS ES58X CAN USB interfaces

Marc Kleine-Budde <mkl@xxxxxxxxxxxxxx> · Mon, 29 Mar 2021 11:18:42 +0200

On 20.03.2021 19:33:02, Vincent MAILHOL wrote:
> > > +/**
> > > + * es58x_set_skb_timestamp() - Set the hardware timestamp of an skb.
> > > + * @netdev: CAN network device.
> > > + * @skb: socket buffer of a CAN message.
> > > + * @timestamp: Timestamp received from an ES58X device.
> > > + *
> > > + * Used for both received and loopback messages.
> > > + *
> > > + * Return: zero on success, -EFAULT if @skb is NULL.
> >
> > That's not correct.
> >
> > Please make sure to call it with skb != NULL and make it a void function.
> 
> Ack.
> All calls to that function already checked that the skb was not NULL.
> I will make the function void.
> 
> > Does it make sense for you to use of struct cyclecounter/struct timecounter.
> > Have a look at:
> >
> > https://lore.kernel.org/linux-can/20210304160328.2752293-6-mkl@xxxxxxxxxxxxxx/
> >
> > As your device already provides a u64 timestamp you don't need the worker.
> 
> I saw your patch but because my device already uses a 64 bit
> counter, I did not see a benefit to do so.
> 
> Which problem would the struct cyclecounter/struct timecounter
> solve for my driver?

It probably doesn't matter.

> 
> > > + */
> > > +static int es58x_set_skb_timestamp(struct net_device *netdev,
> > > +                                struct sk_buff *skb, u64 timestamp)
> > > +{
> > > +     struct es58x_device *es58x_dev = es58x_priv(netdev)->es58x_dev;
> > > +     struct skb_shared_hwtstamps *hwts;
> > > +
> > > +     hwts = skb_hwtstamps(skb);
> > > +     /* Ignoring overflow (overflow on 64 bits timestamp with nano
> > > +      * second precision would occur after more than 500 years).
> > > +      */
> > > +     hwts->hwtstamp = ns_to_ktime(es58x_timestamp_to_ns(timestamp) +
> > > +                                  es58x_dev->realtime_diff_ns);
> > > +
> > > +     return 0;
> > > +}
> > > +
> > > +/**
> > > + * es58x_rx_timestamp() - Handle a received timestamp.
> > > + * @es58x_dev: ES58X device.
> > > + * @timestamp: Timestamp received from a ES58X device.
> > > + *
> > > + * Calculate the difference between the ES58X device and the kernel
> > > + * internal clocks. This difference will be later used as an offset to
> > > + * convert the timestamps of RX and loopback messages to match the
> > > + * kernel system time (e.g. convert to UNIX time).
> >
> > I'm not sure if we want to have the timestamp based on the kernel system time.
> > The problem I see is that your clock is not synchronized to your system clock
> > and will probably drift, so you might accumulate an offset.
> >
> > When looking at candump output, a timestamp that more or less current time gives
> > a false sense of correctness, but if the timestamp is short after 01.0.1970 you
> > don't expect it to be correct.
> >
> > But this is a different problem.....
> 
> Here, we are discussing tastes and colors. I am perfectly aware
> that the two quartz used by the device and the kernel are not in
> sync and will drift away from each other. However, when looking
> at my traces, I like to have a hint of when an event occurs. In
> the worst case, the quartz might drift away up to four seconds a day,
> but the date, the hours and the minutes stay accurate.
> 
> So it is a discussion of making it convenient versus making it dummy proof.
> 
> If there is a strong consensus not to base the hardware timestamp
> on the kernel, then I will remove it. But if I have the choice, I
> prefer to keep it as it is.

I've changed the mcp251xfd timestamp to sync to the kernel time, too :)

[...]

> > > +/**
> > > + * es58x_rx_err_msg() - Handle a received CAN event or error message.
> > > + * @netdev: CAN network device.
> > > + * @error: Error code.
> > > + * @event: Event code.
> > > + * @timestamp: Timestamp received from a ES58X device.
> > > + *
> > > + * Handle the errors and events received by the ES58X device, create
> > > + * a CAN error skb and post it.
> > > + *
> > > + * In some rare cases the devices might get stucked alternating between
> > > + * CAN_STATE_ERROR_PASSIVE and CAN_STATE_ERROR_WARNING. To prevent
> > > + * this behavior, we force a bus off state if the device goes in
> > > + * CAN_STATE_ERROR_WARNING for ES58X_MAX_CONSECUTIVE_WARN consecutive
> > > + * times with no successful transmission or reception in between.
> > > + *
> > > + * Once the device is in bus off state, the only way to restart it is
> > > + * through the drivers/net/can/dev.c:can_restart() function. The
> > > + * device is technically capable to recover by itself under certain
> > > + * circumstances, however, allowing self recovery would create
> > > + * complex race conditions with drivers/net/can/dev.c:can_restart()
> > > + * and thus was not implemented. To activate automatic restart, please
> > > + * set the restart-ms parameter (e.g. ip link set can0 type can
> > > + * restart-ms 100).
> > > + *
> > > + * If the bus is really instable, this function would try to send a
> > > + * lot of log messages. Those are rate limited (i.e. you will see
> > > + * messages such as "net_ratelimit: XXX callbacks suppressed" in
> > > + * dmesg).
> > > + *
> > > + * Return: zero on success, errno when any error occurs.
> > > + */
> > > +int es58x_rx_err_msg(struct net_device *netdev, enum es58x_err error,
> > > +                  enum es58x_event event, u64 timestamp)
> > > +{
[...]
> > > +     switch (event) {
> > > +     case ES58X_EVENT_OK:    /* 0: No event */
> > > +             break;
> > > +
> > > +     case ES58X_EVENT_CRTL_ACTIVE:
> > > +             if (can->state == CAN_STATE_BUS_OFF) {
> > > +                     netdev_err(netdev,
> > > +                                "%s: state transition: BUS OFF -> ACTIVE\n",
> > > +                                __func__);
> >
> > Your device should not go autoamtically from bus off to active. Only when it's
> > restarted by the kernel (or the user).
> 
> Agree, this is why it is an error message.
> 
> CAN controllers are allowed to automatically recover under very
> specific conditions (c.f. ISO 11898-1, section 10.9.4 "Bus
> integration state": "Nodes that are in bus-off state shall
> re-enter the bus integration state immediately after detecting
> the idle condition if the bus-off recovery condition is not yet
> met").

On Linux the controllers should not recover automatically, but
controlled by the kernel as configured by restart-ms. So shut down the
controller in case of a bus off and wait for the kernel to recover via
priv->can.do_set_mode(CAN_MODE_START).

> Aside from the idle condition, I do not think that the nodes are
> allowed to self-recover.

ACK

> But because the can_restart() function does not use locks for the
> echo_skb[], I treat this scenario by printing an error message and
> ignoring it.

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |
Attachment:
signature.asc

Description: PGP signature