Re: [PATCH v2] can: kvaser_usb: Add support for Kvaser CAN/USB devices

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Oliver,

On 08/08/2012 08:14 AM, Olivier Sobrie wrote:
> Hi Wolfgang,
> 
> On Tue, Aug 07, 2012 at 08:26:38AM +0200, Wolfgang Grandegger wrote:
>> On 08/06/2012 07:21 AM, Olivier Sobrie wrote:
>>> This driver provides support for several Kvaser CAN/USB devices.
>>> Such kind of devices supports up to three can network interfaces.
>>
>> s/can/CAN/
>>
>>> It has been tested with a Kvaser USB Leaf Light (one network interface)
>>> connected to a pch_can interface.
>>> The firmware version of the Kvaser device was 2.5.205.
>>>
>>> List of Kvaser devices supported by the driver:
>>>   - Kvaser Leaf prototype (P010v2 and v3)
>>>   - Kvaser Leaf Light (P010v3)
>>>   - Kvaser Leaf Professional HS
>>>   - Kvaser Leaf SemiPro HS
>>>   - Kvaser Leaf Professional LS
>>>   - Kvaser Leaf Professional SWC
>>>   - Kvaser Leaf Professional LIN
>>>   - Kvaser Leaf SemiPro LS
>>>   - Kvaser Leaf SemiPro SWC
>>>   - Kvaser Memorator II, Prototype
>>>   - Kvaser Memorator II HS/HS
>>>   - Kvaser USBcan Professional HS/HS
>>>   - Kvaser Leaf Light GI
>>>   - Kvaser Leaf Professional HS (OBD-II connector)
>>>   - Kvaser Memorator Professional HS/LS
>>>   - Kvaser Leaf Light "China"
>>>   - Kvaser BlackBird SemiPro
>>>   - Kvaser OEM Mercury
>>>   - Kvaser OEM Leaf
>>>   - Kvaser USBcan R
>>
>> Impressive list! What CAN controllers are used inside the devices? SJA1000?
> 
> I took this list from the Kvaser driver. However I only have a Kvaser
> Leaf Light device thus I'm not sure it will work with other ones.
> If you prefer I can only let Kvaser Leaf Light instead of the full list.
> In my device it looks to be a Renesas M16C controller.

OK. Checking the manual, if available, could help to understand how the
firmware handles bus errors and state changes.

>>> Signed-off-by: Olivier Sobrie <olivier@xxxxxxxxx>
>>> ---
>>> Changes since v1:
>>>   - added copyrights
>>>   - kvaser_usb.h merged into kvaser.c
>>>   - added kvaser_usb_get_endpoints to find eindpoints instead of
>>>     hardcoding their address
>>>   - some cleanup and comestic changes
>>>   - fixed issues with errors handling
>>>   - fixed restart-ms == 0 case
>>>   - removed do_get_berr_counter method since the hardware doens't return
>>>     good values for txerr and rxerr.
>>>
>>> If someone in the linux-usb mailing can review it, it would be nice.
>>>
>>> Concerning the errors, it behaves like that now:
>>>
>>> 1) Short-circuit CAN-H and CAN-L and restart-ms = 0
>>>
>>> t0: short-circuit + 'cansend can1 123#112233'
>>>
>>>   can1  2000008C  [8] 00 0C 90 00 00 00 00 00   ERRORFRAME
>>> 	controller-problem{rx-error-warning,tx-error-warning}
>>> 	protocol-violation{{tx-recessive-bit-error,error-on-tx}{}}
>>> 	bus-error
>>>   can1  2000008C  [8] 00 30 90 00 00 00 00 00   ERRORFRAME
>>> 	controller-problem{rx-error-passive,tx-error-passive}
>>> 	protocol-violation{{tx-recessive-bit-error,error-on-tx}{}}
>>> 	bus-error
>>>   can1  200000C8  [8] 00 00 90 00 00 00 00 00   ERRORFRAME
>>> 	protocol-violation{{tx-recessive-bit-error,error-on-tx}{}}
>>> 	bus-off
>>> 	bus-error
>>>
>>> t1: remove short-circuit + 'ip link set can1 type can restart'
>>>
>>>   can1  20000100  [8] 00 00 00 00 00 00 00 00   ERRORFRAME
>>> 	restarted-after-bus-off
>>>   can1  20000004  [8] 00 0C 00 00 00 00 00 00   ERRORFRAME
>>> 	controller-problem{rx-error-warning,tx-error-warning}
>>
>> Why do we get the last error message? Maybe the firmware does it that
>> way (going down passive->warning->active).
> 
> It goes in that order: warning -> passive -> bus off -> warning
> -> passive -> ...

Just for curiosity? You don't see back to "error active"?

>>> 2) Short-circuit CAN-H and CAN-L and restart-ms = 100
>>>
>>> t0: short-circuit + cansend can1 123#112233
>>>
>>>   can1  2000008C  [8] 00 0C 90 00 00 00 00 00   ERRORFRAME
>>> 	controller-problem{rx-error-warning,tx-error-warning}
>>> 	protocol-violation{{tx-recessive-bit-error,error-on-tx}{}}
>>> 	bus-error
>>>   can1  2000008C  [8] 00 30 90 00 00 00 00 00   ERRORFRAME
>>> 	controller-problem{rx-error-passive,tx-error-passive}
>>> 	protocol-violation{{tx-recessive-bit-error,error-on-tx}{}}
>>> 	bus-error
>>>   can1  200000C8  [8] 00 00 90 00 00 00 00 00   ERRORFRAME
>>> 	protocol-violation{{tx-recessive-bit-error,error-on-tx}{}}
>>> 	bus-off
>>> 	bus-error
>>>   can1  2000018C  [8] 00 0C 90 00 00 00 00 00   ERRORFRAME
>>> 	controller-problem{rx-error-warning,tx-error-warning}
>>> 	protocol-violation{{tx-recessive-bit-error,error-on-tx}{}}
>>> 	bus-error
>>> 	restarted-after-bus-off
>>>   can1  2000008C  [8] 00 0C 90 00 00 00 00 00   ERRORFRAME
>>> 	controller-problem{rx-error-warning,tx-error-warning}
>>> 	protocol-violation{{tx-recessive-bit-error,error-on-tx}{}}
>>> 	bus-error
>>>   can1  2000008C  [8] 00 30 90 00 00 00 00 00   ERRORFRAME
>>> 	controller-problem{rx-error-passive,tx-error-passive}
>>> 	protocol-violation{{tx-recessive-bit-error,error-on-tx}{}}
>>> 	bus-error
>>>   can1  200000C8  [8] 00 00 90 00 00 00 00 00   ERRORFRAME
>>> 	protocol-violation{{tx-recessive-bit-error,error-on-tx}{}}
>>> 	bus-off
>>> 	bus-error
>>>   ...
>>>
>>>   can1  2000018C  [8] 00 0C 90 00 00 00 00 00   ERRORFRAME
>>> 	controller-problem{rx-error-warning,tx-error-warning}
>>> 	protocol-violation{{tx-recessive-bit-error,error-on-tx}{}}
>>> 	bus-error
>>> 	restarted-after-bus-off
>>>   can1  2000008C  [8] 00 0C 90 00 00 00 00 00   ERRORFRAME
>>> 	controller-problem{rx-error-warning,tx-error-warning}
>>> 	protocol-violation{{tx-recessive-bit-error,error-on-tx}{}}
>>> 	bus-error
>>>   can1  2000008C  [8] 00 30 90 00 00 00 00 00   ERRORFRAME
>>> 	controller-problem{rx-error-passive,tx-error-passive}
>>> 	protocol-violation{{tx-recessive-bit-error,error-on-tx}{}}
>>> 	bus-error
>>>
>>> t1: remove short-circuit
>>>
>>>   can1  123  [3] 11 22 33
>>>   can1  20000008  [8] 00 00 40 00 00 00 00 00   ERRORFRAME
>>> 	protocol-violation{{back-to-error-active}{}}
>>
>> The order is still inverted but likely the firmware is doing it that way.
> 
> Indeed the firmware does it that way: it sends the acknwledge of the
> frame beofre the state change. I can avoid that behavior by checking the
> state in the acknowledge frame and send the restart frame if the bus was
> off.

Well, if the firmware does it wrong, I would not really care. Also,
could you use timestamping to see if they come close together.

>>> 3) CAN-H and CAN-L disconnected
>>>
>>> t0: CAN-H and CAN-L disconnected + cansend can1 123#112233
>>>
>>>   can1  20000004  [8] 00 30 00 00 00 00 00 00   ERRORFRAME
>>> 	controller-problem{rx-error-passive,tx-error-passive}
>>>   can1  2000008C  [8] 00 30 80 1B 00 00 00 00   ERRORFRAME
>>> 	controller-problem{rx-error-passive,tx-error-passive}
>>> 	protocol-violation{{error-on-tx}{acknowledge-delimiter}}
>>> 	bus-error
>>>
>>> t1: CAN-H and CAN-L reconnected
>>>
>>>   can1  123  [3] 11 22 33
>>>   can1  20000004  [8] 00 30 00 00 00 00 00 00   ERRORFRAME
>>> 	controller-problem{rx-error-passive,tx-error-passive}
>>
>> Why do we get an error-passive message? Now I will have a closer look to
>> the code...
> 
> The firmware sends a CMD_CAN_ERROR_EVENT with the passive bit set...

Maybe the order is again inverted. Do they come at the same time
(visiable with candump -td).

>>> +static void kvaser_usb_rx_error(const struct kvaser_usb *dev,
>>> +				const struct kvaser_msg *msg)
>>> +{
>>> +	struct can_frame *cf;
>>> +	struct sk_buff *skb;
>>> +	struct net_device_stats *stats;
>>> +	struct kvaser_usb_net_priv *priv;
>>> +	unsigned int new_state;
>>> +	u8 channel, status;
>>> +
>>> +	if (msg->id == CMD_CAN_ERROR_EVENT) {
>>> +		channel = msg->u.error_event.channel;
>>> +		status =  msg->u.error_event.status;
>>> +	} else {
>>> +		channel = msg->u.chip_state_event.channel;
>>> +		status =  msg->u.chip_state_event.status;
>>> +	}
>>> +
>>> +	if (channel >= dev->nchannels) {
>>> +		dev_err(dev->udev->dev.parent,
>>> +			"Invalid channel number (%d)\n", channel);
>>> +		return;
>>> +	}
>>> +
>>> +	priv = dev->nets[channel];
>>> +	stats = &priv->netdev->stats;
>>> +
>>> +	if (status & M16C_STATE_BUS_RESET) {
>>> +		kvaser_usb_unlink_tx_urbs(priv);
>>> +		return;
>>> +	}
>>> +	skb = alloc_can_err_skb(priv->netdev, &cf);
>>> +	if (!skb) {
>>> +		stats->rx_dropped++;
>>> +		return;
>>
>> Cleanup? kvaser_usb_unlink_tx_urbs()?
> 
> If I get the M16C_STATE_BUS_RESET I'll not receive the ack frames anymore.
> I need to set the context->echo_index back to MAX_TX_URBS to not loose tx
> urbs.
> By the way I think a can_free_echo_skb() is missing...
> 
>>
>>> +	}
>>> +
>>> +	if (status & M16C_STATE_BUS_OFF) {
>>> +		cf->can_id |= CAN_ERR_BUSOFF;
>>> +
>>> +		if (!priv->can.restart_ms)
>>> +			kvaser_usb_simple_msg_async(priv, CMD_STOP_CHIP);
>>> +
>>> +		if (!priv->can.state != CAN_ERR_BUSOFF) {
>>> +			priv->can.can_stats.bus_off++;
>>> +			netif_carrier_off(priv->netdev);
>>> +		}
>>> +
>>> +		new_state = CAN_STATE_BUS_OFF;
>>> +	} else if (status & M16C_STATE_BUS_PASSIVE) {
>>> +		cf->can_id |= CAN_ERR_CRTL;
>>> +		cf->data[1] = CAN_ERR_CRTL_TX_PASSIVE |
>>> +			      CAN_ERR_CRTL_RX_PASSIVE;
>>
>> State changes should only be report when the state really changes.
>> Therefore it should go under the if block below.
> 
> Ok. Is it possible to get such sequence:
> can1  20000088  [8] 00 00 90 00 00 00 00 00   ERRORFRAME
> can1  20000088  [8] 00 10 90 00 00 00 00 00   ERRORFRAME
> can1  20000088  [8] 00 00 90 00 00 00 00 00   ERRORFRAME
> can1  20000088  [8] 00 00 90 00 00 00 00 00   ERRORFRAME
> can1  20000088  [8] 00 00 90 00 00 00 00 00   ERRORFRAME
> 
> 2 questions:
> 1) If the bus is still in CAN_ERR_CRTL_RX_PASSIVE after the second frame,
>    shouldn't we let the corresponding bit set, 0x10?

No, see below.

> 2) Can we send multiple times same frame wih same error bits set?

Yes, because that's what the hardware reports.

>>
>>> +		if (priv->can.state != CAN_STATE_ERROR_PASSIVE)
>>> +			priv->can.can_stats.error_passive++;
>>> +
>>> +		new_state = CAN_STATE_ERROR_PASSIVE;
>>> +	} else if (status & M16C_STATE_BUS_ERROR) {
>>
>> Hm, strange, a bus error is not a state change. You use here if...else
>> if... Isn't it possible that more than one bit is set.
> 
> Indeed it is possible to have multiple bits set.
> e.g. M16C_STATE_BUS_PASSIVE + M16C_STATE_BUS_ERROR or M16C_STATE_BUS_OFF + M16C_STATE_BUS_ERROR.

OK, that's the normal behaviour. Obviously they send bus errors together
with the *actual* state. The hardware does usuallly report bus errors
frequently while the error condition persists.

> What error should I report in case of M16C_STATE_BUS_ERROR?

To make that clear, I have added an (old) output from the SJA1000, which
is the defacto reference. Bus error reporting is enabled and no cable is
connected. Watch the TX error count increasing and how the state changes:

  $ ./candump -e 0xffff any
  can0  20000088  [8] 00 00 80 19 00 08 00 00   ERRORFRAME
               \             \  \-- ACK slot.
                \             \-- error occured on transmission
                 \-- Bus-error | Protocol violations (data[2], data[3]).

  can0  20000088  [8] 00 00 80 19 00 10 00 00   ERRORFRAME
  can0  20000088  [8] 00 00 80 19 00 18 00 00   ERRORFRAME
  can0  20000088  [8] 00 00 80 19 00 20 00 00   ERRORFRAME
  can0  20000088  [8] 00 00 80 19 00 28 00 00   ERRORFRAME
  can0  20000088  [8] 00 00 80 19 00 30 00 00   ERRORFRAME
  can0  20000088  [8] 00 00 80 19 00 38 00 00   ERRORFRAME
  can0  20000088  [8] 00 00 80 19 00 40 00 00   ERRORFRAME
  can0  20000088  [8] 00 00 80 19 00 48 00 00   ERRORFRAME
  can0  20000088  [8] 00 00 80 19 00 50 00 00   ERRORFRAME
  can0  20000088  [8] 00 00 80 19 00 58 00 00   ERRORFRAME
  can0  2000008C  [8] 00 08 80 19 00 60 00 00   ERRORFRAME
               \          \--  reached warning level for TX errors
                \-- | Controller problems (see data[1]).

  can0  20000088  [8] 00 00 80 19 00 68 00 00   ERRORFRAME
  can0  20000088  [8] 00 00 80 19 00 70 00 00   ERRORFRAME
  can0  20000088  [8] 00 00 80 19 00 78 00 00   ERRORFRAME
  can0  2000008C  [8] 00 20 80 19 00 80 00 00   ERRORFRAME
               \          \--   reached passive level for TX errors
                \-- | Controller problems (see data[1]).

  can0  20000088  [8] 00 00 80 19 00 80 00 00   ERRORFRAME
                                      \  \-- RXerror count
                                       \-- TXerror count

  can0  20000088  [8] 00 00 80 19 00 80 00 00   ERRORFRAME
  ...
  can0  20000088  [8] 00 00 80 19 00 80 00 00   ERRORFRAME


> 
>>
>>> +		cf->can_id |= CAN_ERR_CRTL;
>>> +		cf->data[1] = CAN_ERR_CRTL_TX_WARNING |
>>> +			      CAN_ERR_CRTL_RX_WARNING;
>>
>> See above.
>>
>>> +		if (priv->can.state != CAN_STATE_ERROR_WARNING)
>>> +			priv->can.can_stats.error_warning++;
>>> +
>>> +		new_state = CAN_STATE_ERROR_WARNING;
>>> +	} else {
>>> +		cf->can_id |= CAN_ERR_PROT;
>>> +		cf->data[2] = CAN_ERR_PROT_ACTIVE;
>>> +
>>> +		new_state = CAN_STATE_ERROR_ACTIVE;
>>> +	}
>>> +
>>> +	if (priv->can.restart_ms &&
>>> +	    (priv->can.state >= CAN_STATE_BUS_OFF) &&
>>> +	    (new_state < CAN_STATE_BUS_OFF)) {
>>> +		cf->can_id |= CAN_ERR_RESTARTED;
>>> +		priv->can.can_stats.restarts++;
>>> +		netif_carrier_on(priv->netdev);
>>> +	}
>>> +
>>> +	if (msg->id == CMD_CAN_ERROR_EVENT) {
>>> +		u8 error_factor = msg->u.error_event.error_factor;
>>> +
>>> +		priv->can.can_stats.bus_error++;
>>> +		stats->rx_errors++;
>>> +
>>> +		cf->can_id |= CAN_ERR_PROT | CAN_ERR_BUSERROR;
>>> +
>>> +		if (error_factor & M16C_EF_ACKE)
>>> +			cf->data[3] |= (CAN_ERR_PROT_LOC_ACK |
>>> +					CAN_ERR_PROT_LOC_ACK_DEL);
>>> +		if (error_factor & M16C_EF_CRCE)
>>> +			cf->data[3] |= (CAN_ERR_PROT_LOC_CRC_SEQ |
>>> +					CAN_ERR_PROT_LOC_CRC_DEL);
>>> +		if (error_factor & M16C_EF_FORME)
>>> +			cf->data[2] |= CAN_ERR_PROT_FORM;
>>> +		if (error_factor & M16C_EF_STFE)
>>> +			cf->data[2] |= CAN_ERR_PROT_STUFF;
>>> +		if (error_factor & M16C_EF_BITE0)
>>> +			cf->data[2] |= CAN_ERR_PROT_BIT0;
>>> +		if (error_factor & M16C_EF_BITE1)
>>> +			cf->data[2] |= CAN_ERR_PROT_BIT1;
>>> +		if (error_factor & M16C_EF_TRE)
>>> +			cf->data[2] |= CAN_ERR_PROT_TX;
>>> +	}
>>> +
>>> +	priv->can.state = new_state;
>>> +
>>> +	if (!memcmp(cf, &priv->cf_err_old, sizeof(*cf))) {
>>> +		kfree_skb(skb);
>>> +		return;
>>> +	}
>>
>> Hm, the firmware seems not to clear error conditions? Anyway, state
>> change and error reporting is magic on many CAN controllers. Just the
>> SJA1000 is doing it nicely.
> 
> I added it to prevent sending two times the same error. It happens that
> the firmware sends multiple times the same error message.

Can then be removed, I think, see above.

>>
>>> +	netif_rx(skb);
>>> +
>>> +	priv->cf_err_old = *cf;
>>> +
>>> +	stats->rx_packets++;
>>> +	stats->rx_bytes += cf->can_dlc;
>>> +}
>>> +
>>> +static void kvaser_usb_rx_can_msg(const struct kvaser_usb *dev,
>>> +				  const struct kvaser_msg *msg)
>>> +{
>>> +	struct kvaser_usb_net_priv *priv;
>>> +	struct can_frame *cf;
>>> +	struct sk_buff *skb;
>>> +	struct net_device_stats *stats;
>>> +	u8 channel = msg->u.rx_can.channel;
>>> +
>>> +	if (channel >= dev->nchannels) {
>>> +		dev_err(dev->udev->dev.parent,
>>> +			"Invalid channel number (%d)\n", channel);
>>> +		return;
>>> +	}
>>> +
>>> +	priv = dev->nets[channel];
>>> +	stats = &priv->netdev->stats;
>>> +
>>> +	skb = alloc_can_skb(priv->netdev, &cf);
>>> +	if (skb == NULL) {
>>
>> s/skb == NULL)/!skb/ ?
>>
>>> +		stats->tx_dropped++;
>>> +		return;
>>> +	}
>>> +
>>> +	cf->can_id = ((msg->u.rx_can.msg[0] & 0x1f) << 6) |
>>> +		     (msg->u.rx_can.msg[1] & 0x3f);
>>> +	cf->can_dlc = get_can_dlc(msg->u.rx_can.msg[5]);
>>> +
>>> +	if (msg->id == CMD_RX_EXT_MESSAGE) {
>>> +		cf->can_id <<= 18;
>>> +		cf->can_id |= ((msg->u.rx_can.msg[2] & 0x0f) << 14) |
>>> +			      ((msg->u.rx_can.msg[3] & 0xff) << 6) |
>>> +			      (msg->u.rx_can.msg[4] & 0x3f);
>>> +		cf->can_id |= CAN_EFF_FLAG;
>>> +	}
>>> +
>>> +	if (msg->u.rx_can.flag & MSG_FLAG_REMOTE_FRAME) {
>>> +		cf->can_id |= CAN_RTR_FLAG;
>>> +	} else if (msg->u.rx_can.flag & (MSG_FLAG_ERROR_FRAME |
>>> +					 MSG_FLAG_NERR)) {
>>> +		cf->can_id |= CAN_ERR_FLAG;
>>> +		cf->can_dlc = CAN_ERR_DLC;
>>> +		cf->data[1] = CAN_ERR_CRTL_UNSPEC;
>>
>> What is the meaning of such errors? A comment, netdev_err() or
>> netdev_dbg() would be nice.
> 
> I never reached this error with the hardware I've... I don't know what's
> the meaning of this flag... I will add a trace.

Then add a netdev_err().

Wolfgang.

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux