RE: [RESEND] [PATCH 1/1] can: m_can: Control tx and rx flow to avoid communication stall

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Marc,

Thank you for the review.

> -----Original Message-----
> From: Marc Kleine-Budde <mkl@xxxxxxxxxxxxxx>
> Sent: Wednesday, July 5, 2023 5:47 PM
> To: Kumari Pallavi <kumari.pallavi@xxxxxxxxx>
> Cc: rcsekar@xxxxxxxxxxx; Sangannavar, Mallikarjunappa
> <mallikarjunappa.sangannavar@xxxxxxxxx>; Nikula, Jarkko
> <jarkko.nikula@xxxxxxxxx>; linux-can@xxxxxxxxxxxxxxx;
> netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Thokala, Srikanth
> <srikanth.thokala@xxxxxxxxx>
> Subject: Re: [RESEND] [PATCH 1/1] can: m_can: Control tx and rx flow to avoid
> communication stall
> 
> On 23.06.2023 14:29:20, Kumari Pallavi wrote:
> > In bi-directional CAN transfer using M_CAN IP, with the frame gap
> > being set to '0', it leads to Protocol error in Arbitration phase
> > resulting in communication stall.
> 
> Is there a (public) erratum describing the problem?

No, there is no any public erratum available. I found this issue during stress test.

> 
> > Discussed with Bosch M_CAN IP team and the stall issue can only be
> > overcome by controlling the tx and rx packets flow as done by the
> > patch.
> 
> Please elaborate the suggested workaround.

Sure.

> 
> > Rx packets would also be serviced when there is a tx interrupt. The
> > solution has been tested extensively for more than 10 days, and no
> > issues has been observed.
> 
> Can you describe how your patch implements the workaround?
> 
> | Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
> | instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
> | to do frotz", as if you are giving orders to the codebase to change
> | its behaviour.
> 
> See:
> https://github.com/torvalds/linux/blob/master/Documentation/process/submit
> ting-patches.rst#describe-your-changes
> 

Sure, I will reformat in next version.

> > Setup that is used to reproduce the issue:
> >
> > +---------------------+		+----------------------+
> > |Intel ElkhartLake    |		|Intel ElkhartLake     |
> > |	+--------+    |		|	+--------+     |
> > |	|m_can 0 |    |<=======>|	|m_can 0 |     |
> > |	+--------+    |		|	+--------+     |
> > +---------------------+		+----------------------+
> >
> > Steps to be run on the two Elkhartlake HW:
> >
> > 1. ip link set can0 type can bitrate 1000000 2. ip link set can0
> > txqueuelen 2048 3. ip link set can0 up 4. cangen -g 0 can0 5. candump
> > can0
> >
> > cangen -g 0 can0 & candump can0 commands are used for transmit and
> > receive on both the m_can HW simultaneously where -g is the frame gap
> > between two frames.
> >
> > Signed-off-by: Kumari Pallavi <kumari.pallavi@xxxxxxxxx>
> > Signed-off-by: Srikanth Thokala <srikanth.thokala@xxxxxxxxx>
> > ---
> >  drivers/net/can/m_can/m_can.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/can/m_can/m_can.c
> > b/drivers/net/can/m_can/m_can.c index a5003435802b..94aa0ba89202
> > 100644
> > --- a/drivers/net/can/m_can/m_can.c
> > +++ b/drivers/net/can/m_can/m_can.c
> > @@ -1118,7 +1118,7 @@ static irqreturn_t m_can_isr(int irq, void *dev_id)
> >  			/* New TX FIFO Element arrived */
> >  			if (m_can_echo_tx_event(dev) != 0)
> >  				goto out_fail;
> > -
> 
> nitpick: please keep that empty line.

Sure.

> 
> > +			m_can_write(cdev, M_CAN_IE, IR_ALL_INT &
> ~(IR_TEFN));
> 
> - What's the purpose of  "()" around IR_TEFN?
> - You enable a lot of interrupts that have not been enabled before. Have
>   a look at m_can_chip_config() how the original register value for
>   M_CAN_IE is calculated.
> 

Sure, I will recheck and fix as needed.

> >  			if (netif_queue_stopped(dev) &&
> >  			    !m_can_tx_fifo_full(cdev))
> >  				netif_wake_queue(dev);
> > @@ -1787,6 +1787,7 @@ static netdev_tx_t m_can_start_xmit(struct sk_buff
> *skb,
> >  		}
> >  	} else {
> >  		cdev->tx_skb = skb;
> > +		m_can_write(cdev, M_CAN_IE, IR_ALL_INT & (IR_TEFN));
> 
> - What's the purpose of  "()" around IR_TEFN?
> - "IR_ALL_INT & (IR_TEFN)" is equal to IR_TEFN, isn't it?
> - This basically disables all other interrupts, is this what you want to
>   do?
> - What happens if the bus is busy with high prio CAN frames and you want
>   to send low prio ones? You will not get any RX-IRQ, this doesn't look
>   correct to me.
> 

Even though the RX interrupt is disabled (in IE), if there is an TX interrupt and the RF0N bit is set (in IR), the RX packet will still be serviced because the TX and RX share the same IRQ handler. 

> >  		return m_can_tx_handler(cdev);
> >  	}
> >
> > --
> > 2.17.1
> >
> >
> 
> Marc
> 
> --
> Pengutronix e.K.                 | Marc Kleine-Budde          |
> Embedded Linux                   | https://www.pengutronix.de |
> Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

Thanks, 
Pallavi




[Index of Archives]     [Automotive Discussions]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [CAN Bus]

  Powered by Linux